Closed by chassenr 3 years ago
Great that you could get the custom taxonomy running. Thank you for reporting this new issue. Can you upload the example?
I am using files that are about 500MB for the test that I mentioned. I would put them on a cloud and share the link here, or is there a better option to share files of that size?
Here is the Nextcloud link: https://cloud.marum.de/s/F4yJBnjdQ7grEad. I included the input fasta (library.fna), the mapping file (map.txt), the taxdump folder (taxonomy), and the temporary data generated by conterminator (tmp). The input files are formatted to be used with kraken2 to build a database, i.e. I am planning to run conterminator just before the kraken2-build command to identify contaminated contigs. I hope that the kraken2 requirements for the fasta header are not causing the error. As this is only a test, I kept the taxa grouping for the conterminator command simple, just comparing between Archaea (2), Bacteria (3), and Eukaryota (4), not taking Viruses (5) into account:
conterminator dna library.fna map.txt conterminator_out tmp --mask-lower-case 1 --ncbi-tax-dump "taxonomy/" --blacklist "5" --kingdoms "2,3,4"
(I also tried --kingdoms "(2||3),4", but got the same error).
Sorry for the late answer. I tried to rerun the example, and it seems that conterminator cannot find any contaminated sequence in your input. Because of this, extracting the frame crashes. Do you expect that there is contamination in the sample?
I arbitrarily selected a few genomes to try out the program, so I have no idea (yet) if there is contamination or not. This selection was apparently not the best. I wanted to get a feeling for run time and computational requirements. I will try it again with a larger selection of genomes. Thanks for looking into this.
Ah I see. If you concatenate example/dna.fas to library.fna and example/dna.mapping to your map.txt, then you should be good to go.
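The concatenation suggested above can be sketched in shell as follows. This is only a minimal sketch: the helper name append_file is made up for illustration, and the paths assume that the example folder of the conterminator repository sits next to library.fna and map.txt. The helper adds a newline first if the target file does not end with one, so that the first appended FASTA header or mapping row is not glued onto the last existing line.

```shell
# Hypothetical helper (not part of conterminator): append SRC to DST.
# If DST is non-empty and does not end with a newline, add one first,
# so the first appended line starts on a line of its own.
append_file() {
    src=$1
    dst=$2
    if [ -s "$dst" ] && [ -n "$(tail -c 1 "$dst")" ]; then
        printf '\n' >> "$dst"
    fi
    cat "$src" >> "$dst"
}
```

With the file names from this thread, the calls would be append_file example/dna.fas library.fna and append_file example/dna.mapping map.txt. Both calls modify the target files in place, so it may be worth keeping a copy of the originals.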
Hi @martin-steinegger, after fixing the taxdump issue by compiling conterminator from source, I now run into the following error:
I checked for the files contam_region_rev*, and while many of those existed, the file contam_region_rev.dbtype did not. I am not sure what went wrong. I was testing conterminator with a very small set of genomes (5 each viral (on blacklist), bacteria, archaea, eukaryotes). Attached is the full output: conterminator_log.txt
Thanks for your help!
Cheers, Christiane
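For anyone who wants to repeat the file check described in this post, a minimal shell sketch follows. The function name check_db_files is made up for illustration, and the search is recursive because the thread does not say where inside the tmp folder the contam_region_rev* files are written. It lists every file that starts with the given prefix and then reports whether the matching .dbtype file is present.

```shell
# Hypothetical helper (not part of conterminator): list all files named
# PREFIX* anywhere under DIR, then report whether PREFIX.dbtype exists.
# Returns 0 if the .dbtype file is present and 1 if it is missing.
check_db_files() {
    dir=$1
    prefix=$2
    find "$dir" -type f -name "${prefix}*" | sort
    if [ -n "$(find "$dir" -type f -name "${prefix}.dbtype")" ]; then
        echo "found: ${prefix}.dbtype"
        return 0
    fi
    echo "missing: ${prefix}.dbtype"
    return 1
}
```

With the names from this thread, the call would be check_db_files tmp contam_region_rev.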