steineggerlab / conterminator

Detection of incorrectly labeled sequences across kingdoms
GNU General Public License v3.0

Error: Extractframes died #12

Closed chassenr closed 3 years ago

chassenr commented 3 years ago

Hi @martin-steinegger, after fixing the taxdump issue by compiling conterminator from source, I now run into the following error:

World Size: 32 dbSize: 0
World Size: 32 dbSize: 0
World Size: 32 dbSize: 0
World Size: 32 dbSize: 0
World Size: 32 dbSize: 0
Segmentation fault (core dumped)
Error: Extractframes died

I checked for the files contam_region_rev* and, while many of those existed, the file contam_region_rev.dbtype did not. I am not sure what went wrong. I was testing conterminator with a very small set of genomes (5 each of viruses (on the blacklist), bacteria, archaea, and eukaryotes). Attached is the full output: conterminator_log.txt
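A check like the one described above could be scripted roughly as follows. This is only a sketch: the function name check_db_files is made up for illustration, and because the exact location of the contam_region_rev* files inside tmp is an assumption, the whole directory tree is searched.

```shell
# Sketch: list every file whose name starts with a given database prefix
# below a tmp directory, and report whether the matching .dbtype file exists.
# check_db_files is a hypothetical helper, not part of conterminator.
check_db_files() {
    local tmp_dir="$1"   # conterminator tmp directory
    local prefix="$2"    # e.g. contam_region_rev

    # Show all files that belong to this database prefix.
    find "$tmp_dir" -type f -name "${prefix}*" | sort

    # The .dbtype file is the one reported missing in this issue.
    if [ -n "$(find "$tmp_dir" -type f -name "${prefix}.dbtype")" ]; then
        echo "${prefix}.dbtype: present"
    else
        echo "${prefix}.dbtype: MISSING"
        return 1
    fi
}
```

It would be called as: check_db_files tmp contam_region_rev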

Thanks for your help!

Cheers, Christiane

martin-steinegger commented 3 years ago

Great that you could get the custom taxonomy running. Thank you for reporting this new issue. Can you upload the example?

chassenr commented 3 years ago

The files I am using for the test I mentioned are about 500 MB. I could put them on a cloud share and post the link here, or is there a better option for sharing files of that size?

chassenr commented 3 years ago

Here is the nextcloud link: https://cloud.marum.de/s/F4yJBnjdQ7grEad. I included the input fasta (library.fna), the mapping file (map.txt), the taxdump folder (taxonomy), and the temporary data generated by conterminator (tmp). The input files are formatted to be used with kraken2 to build a database, i.e. I am planning to run conterminator just before the kraken2-build command to identify contaminated contigs. I hope that the kraken2 requirements for the fasta header are not causing the error. As this is only a test, I kept the taxa grouping for the conterminator command simple, just comparing between Archaea (2), Bacteria (3), and Eukaryota (4), not taking Viruses (5) into account:

conterminator dna library.fna map.txt conterminator_out tmp --mask-lower-case 1 --ncbi-tax-dump "taxonomy/" --blacklist "5" --kingdoms "2,3,4"

(I also tried --kingdoms "(2||3),4", but got the same error.)

martin-steinegger commented 3 years ago

Sorry for the late answer. I reran the example, and it seems that conterminator cannot find any contaminated sequence in your input. Because of this, the frame extraction step crashes. Do you expect that there is contamination in the sample?

chassenr commented 3 years ago

I arbitrarily selected a few genomes to try out the program, so I have no idea (yet) if there is contamination or not. This selection was apparently not the best. I wanted to get a feeling for run time and computational requirements. I will try it again with a larger selection of genomes. Thanks for looking into this.

martin-steinegger commented 3 years ago

Ah, I see. If you concatenate example/dna.fas to library.fna and example/dna.mapping to your map.txt, then you should be good to go.
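The concatenation suggested above might look roughly like this. It is a sketch, not an official recipe: the helper name append_example is made up, the file names are the ones mentioned in this thread, and it assumes that the taxon IDs in example/dna.mapping are present in the taxonomy passed to --ncbi-tax-dump and that library.fna and map.txt end with a newline.

```shell
# Sketch: append the example data shipped with conterminator to an existing
# input, so that the run contains the sequences from the example set.
# append_example is a hypothetical helper, not part of conterminator.
append_example() {
    local example_dir="$1"   # directory holding dna.fas and dna.mapping
    local fasta="$2"         # e.g. library.fna
    local mapping="$3"       # e.g. map.txt

    # Stop early if any file is missing, to avoid a half-updated input.
    local f
    for f in "$example_dir/dna.fas" "$example_dir/dna.mapping" "$fasta" "$mapping"; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    done

    # Keep untouched copies of the originals before appending.
    cp "$fasta" "$fasta.orig" && cp "$mapping" "$mapping.orig" || return 1

    # Append the example sequences and their taxonomy mapping.
    cat "$example_dir/dna.fas" >> "$fasta" &&
    cat "$example_dir/dna.mapping" >> "$mapping"
}
```

From the directory containing library.fna and map.txt, it would be called as: append_example /path/to/conterminator/example library.fna map.txt. The original conterminator command can then be rerun unchanged, ideally with a fresh tmp directory.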