erelior closed this issue 4 years ago.
Hi @erelior, thanks for reaching out and providing the files. I had a short look into it, and apparently nucmer returns an empty output. So I took a deep dive into your data and noticed that the genomic content in your NC_000117.1.fna file is duplicated:
grep '>' NC_000117.1.fna
>NC_000117.1 Chlamydia trachomatis D/UW-3/CX chromosome, complete genome
>NC_000117.1 Chlamydia trachomatis D/UW-3/CX chromosome, complete genome
After removing the duplicated record from your genome file, nucmer is able to align its sequence as expected.
Could you please check all your DB FASTA files for such duplicated records, fix them, and rebuild the database? Please let me know if this helps.
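When there are many genome files, that check can be scripted. The following is a minimal sketch, not part of ReferenceSeeker. It assumes each record is identified by the first word of its header line (e.g. NC_000117.1) and that duplicated records are true copies. The helpers list_duplicate_ids and dedup_fasta are hypothetical. For every FASTA file passed as an argument, the script prints the duplicated IDs and writes a cleaned copy with the suffix .dedup.fna, keeping only the first occurrence of each record.

```shell
#!/usr/bin/env bash
# Sketch: detect and drop duplicated FASTA records before (re)importing genomes.
# Assumption: a record ID is the first word of its ">" header line, and records
# sharing an ID are identical copies (so keeping the first one loses nothing).
set -eu

# Print every record ID that occurs more than once in the given FASTA file.
list_duplicate_ids() {
    awk '/^>/ { print substr($1, 2) }' "$1" | sort | uniq -d
}

# Write a copy of FASTA file $1 to $2 that keeps only the first record per ID.
# Lines before the first header (if any) are dropped.
dedup_fasta() {
    awk '
        /^>/ { id = substr($1, 2); keep = !(id in seen); seen[id] = 1 }
        keep { print }
    ' "$1" > "$2"
}

# Check each FASTA file given on the command line.
for fasta in "$@"; do
    dups=$(list_duplicate_ids "$fasta")
    if [ -n "$dups" ]; then
        echo "$fasta: duplicated record IDs: $dups" >&2
        dedup_fasta "$fasta" "${fasta%.*}.dedup.fna"
    fi
done
```

For example, running it as bash check_fasta.sh db_genomes/*.fna (script and directory names are hypothetical) would write NC_000117.1.dedup.fna next to NC_000117.1.fna. The cleaned copies can then be imported again with referenceseeker_db import.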
Worked like a charm! @oschwengers, thank you so much!
Thanks for the feedback. You're welcome!
Hi! I created a custom bacterial database and imported some FASTA files into it, which seemed to work:
referenceseeker_db init --db DB
referenceseeker_db import --db DB --genome "$file" --status complete --organism "$organism" -t $taxid
(I got a "Successfully imported genome" message for all of them.)
However, running referenceseeker on query files (including genomes contained in the DB) returned no hits. For example,
referenceseeker -v DB e_coli.fna
returned only the header line:
ID Mash Distance ANI Con. DNA Taxonomy ID Assembly Status Organism
Setting the ANI and conserved DNA thresholds to 0 returned a Mash distance, but both ANI and conserved DNA were 0:
referenceseeker -v -a 0 -c 0 DB e_coli.fna
ID Mash Distance ANI Con. DNA Taxonomy ID Assembly Status Organism
NC_002695.2 0.01899 0.00 0.00 83334 complete Escerichia coli
(Changing the crg value gave the same result.)
I can't figure out why the ANI and conserved DNA values are 0 for FASTA files that I know are identical or similar to the reference genomes.
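This symptom can be spotted across many result tables by filtering for hits that have a Mash distance but zero ANI and zero conserved DNA. In this thread, that pattern turned out to mean the nucmer alignment step produced no output. The sketch below is a hypothetical helper (flag_unaligned_hits), not a ReferenceSeeker feature. It assumes the result table is tab-separated, uses the column order shown above, and has one header line starting with ID or #ID.

```shell
#!/usr/bin/env bash
# Sketch: list hits whose Mash distance was computed but whose ANI and
# conserved-DNA columns are both 0 (i.e. the alignment step produced nothing).
# Assumptions: tab-separated result table, column order
# ID, Mash Distance, ANI, Con. DNA, Taxonomy ID, Assembly Status, Organism,
# and a single header line whose first field is "ID" or "#ID".
set -eu

# Print "<ID><TAB><Mash distance>" for every hit with ANI == 0 and Con. DNA == 0.
flag_unaligned_hits() {
    awk -F'\t' '
        $1 ~ /^#?ID$/ { next }    # skip the header line
        NF >= 4 && $3 + 0 == 0 && $4 + 0 == 0 { print $1 "\t" $2 }
    ' "$1"
}

# Usage: flag_unaligned_hits.sh <result table saved from referenceseeker>
if [ "$#" -eq 1 ]; then
    flag_unaligned_hits "$1"
fi
```

Any ID this prints is worth checking in the DB, e.g. for duplicated FASTA records like the one found above.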
I am using ReferenceSeeker version 1.6, installed via Bioconda.
I attached one of the FASTA files from the DB and the DB files: referenceseeker_issue.zip
Thanks
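The DB setup described in this issue (one init, then one import per genome) can be wrapped in a loop. The following is a minimal sketch that uses only the referenceseeker_db options quoted in the issue (init and import with --db, --genome, --status, --organism, -t). Three parts of it are assumptions for illustration: the manifest format (one genome per line: FASTA path, organism name, and NCBI taxonomy ID, separated by tabs), the import_manifest function, and marking every genome as complete.

```shell
#!/usr/bin/env bash
# Sketch: build a custom ReferenceSeeker DB from a tab-separated manifest.
# Hypothetical manifest format, one genome per line:
#   <FASTA path> <TAB> <organism name> <TAB> <NCBI taxonomy ID>
# Only the referenceseeker_db commands/options quoted in the issue are used.
set -eu

TAB="$(printf '\t')"

# Import every genome listed in manifest $2 into DB $1.
import_manifest() {
    local db="$1" manifest="$2"
    local file="" organism="" taxid=""
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS="$TAB" read -r file organism taxid || [ -n "$file" ]; do
        if [ -z "$file" ]; then
            continue    # skip blank lines
        fi
        referenceseeker_db import --db "$db" --genome "$file" \
            --status complete --organism "$organism" -t "$taxid"
    done < "$manifest"
}

# Usage: build_db.sh <DB directory> <manifest.tsv>
if [ "$#" -eq 2 ]; then
    referenceseeker_db init --db "$1"
    import_manifest "$1" "$2"
fi
```

Keeping the organism name in its own tab-separated column means names containing spaces are passed to --organism unchanged.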