Open felipevzps opened 4 years ago
We currently predict contamination only for short sequences of length < 20 kb. The 20 kb can be in scaffolds or just single sequences. I assume you have just one long sequence?
@martin-steinegger Is there a way to indicate that contamination should be reported for longer sequences? I'm trying to reproduce the example between C. elegans and E. coli in your ms.
The _all report should contain all the local alignments with cross-kingdom hits (--kingdom). This could be used to filter for longer sequences. Can you find the C. elegans and E. coli hits in it? The format is as follows:
1.) Numeric identifier
2.) Sequence identifier
3.) Alignment start
4.) Alignment end
5.) Corrected contig length (length between flanking Ns)
6.) Total sequence length
7.) Kingdom (default: 0: Bacteria&Archaea, 1: Fungi, 2: Metazoa, 3: Viridiplantae, 4: Other Eukaryotes)
8.) Species name
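Until the length cutoff is configurable, one way to look at hits on longer sequences is to filter the _all report yourself. The following is only a minimal sketch: it assumes the report is tab-separated with one record per line and the eight columns in the order listed above, which is not confirmed in this thread. The names parse_all_report and long_contig_hits and the sample rows are made up for illustration and are not part of Conterminator.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    """One record of the _all report (column order assumed from the list above)."""
    numeric_id: str
    seq_id: str
    aln_start: int
    aln_end: int
    contig_len: int   # corrected contig length (length between flanking Ns)
    total_len: int
    kingdom: str
    species: str


def parse_all_report(handle: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per line; ASSUMES tab-separated fields, eight per record."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        yield Hit(fields[0], fields[1], int(fields[2]), int(fields[3]),
                  int(fields[4]), int(fields[5]), fields[6], fields[7])


def long_contig_hits(hits: Iterable[Hit], min_len: int = 20_000) -> List[Hit]:
    """Keep hits whose corrected contig length is at least min_len."""
    return [h for h in hits if h.contig_len >= min_len]


# Made-up rows for illustration only; real reports will differ.
SAMPLE = (
    "1\tseqA\t100\t900\t15000\t15000\t0\tEscherichia coli\n"
    "2\tseqB\t5000\t7000\t250000\t300000\t2\tCaenorhabditis elegans\n"
    "3\tseqC\t10\t500\t20000\t20000\t0\tEscherichia coli\n"
    "malformed line\n"
)

if __name__ == "__main__":
    for hit in long_contig_hits(parse_all_report(io.StringIO(SAMPLE))):
        print(hit.seq_id, hit.contig_len, hit.kingdom, hit.species)
```

To run it on a real report, pass an open file handle instead of the io.StringIO sample. If the real file uses a different delimiter or column order, the parsing step needs to be adjusted accordingly.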
There are indeed expected hits in the _all file. Is it possible to make the 20 kb filtering criterion an exposed parameter? This would also help document to users that such a criterion exists.
Yes, I agree. I had this on my todo list for quite some time. :( But currently I am quite flooded with work.
Hello!
I built a synthetic genome to check the outputs, and Conterminator failed to predict the inserted contaminants.
Info: Version: 1.c74b5. Organisms in this synthetic genome: Saccharum hybrid cultivar SP80-3280, Klebsiella pneumoniae, and Acinetobacter baumannii.
History: I inserted the complete A. baumannii and K. pneumoniae genomes into the sugarcane genome and created a Kraken mapping file. When I checked the mapping file, I could see the taxonomy IDs of the inserted sequences (A. baumannii ID = 470, K. pneumoniae ID = 573, and SP80-3280 ID = 193079).
Then I ran Conterminator with the following command:
conterminator dna synthetic_genome.fasta kraken_mapping_file.txt synthetic_genome_conterminator tmp
Results: The synthetic_genome_conterminator_conterm_prediction file is empty. The synthetic_genome_conterminator_all file does not contain any information about the inserted contaminants.
Data: synthetic_genome_conterminator_all.txt, kraken_mapping_file.txt. The genome file is too big, and the conterm_prediction file is empty.
Problem: My objective is to detect contamination in the sugarcane genome. Am I using Conterminator incorrectly, or is Conterminator failing to predict the contamination?