Closed by iaposto 3 years ago
The genome filter is deliberately restrictive by default: it only looks for genomes within a Mash distance of 0.002. You can raise this threshold to something like 0.05 if you want to be more permissive. The fasta file included in the archive shouldn't be there, and it is not needed for the tool to run. The blast database is all that is needed, and if you want the original fasta sequences you can regenerate them from the blast indexes. I will update the archive to remove the erroneous fasta file.
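For anyone who does want the sequences back, here is a minimal sketch of how they could be dumped from the blast indexes. It assumes the NCBI BLAST+ `blastdbcmd` tool is installed; the helper name `blastdbcmd_dump_command` and both paths are hypothetical placeholders, not something shipped with the tool.

```python
import shlex


def blastdbcmd_dump_command(db_prefix, out_fasta):
    """Build a blastdbcmd call that writes every sequence in a BLAST
    database to a FASTA file (FASTA is blastdbcmd's default output format)."""
    return [
        "blastdbcmd",
        "-db", db_prefix,   # path plus prefix of the database, no extension
        "-entry", "all",    # dump all entries rather than a single ID
        "-out", out_fasta,
    ]


# Hypothetical paths: replace with your own database prefix and output file.
cmd = blastdbcmd_dump_command("/path/to/db_prefix", "recovered.fasta")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`.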
Thanks for your reply. If the fasta file is not necessary to run the tool, which file should the -g flag point to so that the tool uses the prebuilt database? There are 7 .nsq files.
Sorry for the late reply. The -g flag just needs the path and prefix of the databases, so the command you listed above is all that is needed.
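If it helps, a quick way to check that a given path and prefix actually resolves to blast index files is sketched below. This is only an illustration: `blast_files_for_prefix` is a hypothetical helper, and the extension list assumes a standard nucleotide blast database (`.nsq`, `.nin`, `.nhr`, plus a `.nal` alias file when the database is split into several volumes).

```python
from pathlib import Path

# File extensions of a nucleotide BLAST database; .nal is the alias file
# that ties together a multi-volume database (e.g. prefix.00, prefix.01, ...).
BLAST_NUCL_EXTS = (".nsq", ".nin", ".nhr", ".nal")


def blast_files_for_prefix(prefix):
    """Return the sorted names of BLAST index files that share this prefix.

    `prefix` is the directory plus the common file-name stem, i.e. the
    value you would hand to the -g flag. An empty list means nothing
    matched, so the prefix is probably wrong.
    """
    p = Path(prefix)
    if not p.parent.is_dir():
        return []
    return sorted(
        f.name
        for f in p.parent.iterdir()
        if f.name.startswith(p.name + ".") and f.suffix in BLAST_NUCL_EXTS
    )


# Hypothetical prefix: replace with the value you pass to -g.
print(blast_files_for_prefix("/path/to/db_prefix"))
```

Several `.nsq` files under one prefix simply mean the database is stored in multiple volumes, so you still pass the shared prefix rather than any single file.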
Hello,
I downloaded the prebuilt database of closed Enterobacteriaceae genomes to use with the --genome_filter_db_prefix (-g) flag for reconstructing E. coli plasmids. As per the example in README.md I used "-g /2019-11-NCBI-Enterobacteriacea-Chromosomes/2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta" in my script. In the program's output I get: "Genome filter sequences provided" followed by "No close genome matches found". My questions are:
Thanks in advance, Ilias