Closed mhoban closed 6 months ago
Currently, when you run a query against multiple blast databases, results from all databases end up in the same results table. There's no definite indicator of where particular results came from.
Should we move blast searches into separate processes and then merge the results afterward? Does it make a difference? Maybe this should be tested.
e.g., something like
Channel.fromPath([params.blastDb]) |
blast |
collectFile(name: 'blast_results_merged.tsv', storeDir: 'blast')
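One way to make the origin of each hit explicit when merging is to prefix every row with a label for the database it came from. The following is only a minimal sketch in Python (not the pipeline's actual Nextflow code), assuming each per-database search writes a tab-separated results file; the function name `merge_blast_results` and the label-to-file mapping are hypothetical.

```python
def merge_blast_results(result_files, out_handle):
    """Concatenate tabular BLAST outputs, tagging each row with its source.

    result_files: dict mapping a database label (hypothetical, e.g. "nt")
                  to the path of that database's tab-separated results file.
    out_handle:   any writable text handle for the merged table.
    """
    for db_label, path in result_files.items():
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line:  # skip blank lines
                    # The first column of the merged table is the database label.
                    out_handle.write(f"{db_label}\t{line}\n")
```

With a source column in place, overlapping hits from different databases stay distinguishable after the merge.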
Ok, I updated the blast system. Instead of using the system $BLASTDB variable to infer the location of the nt database, we use $FLOW_BLAST to specify the full location of any database and use that as a default.
Now, blast searches are run against each supplied database separately and combined once the searches are complete.
There is also now a way to specify taxdb files if you want to; if you don't and they're missing, they get downloaded from the internet.
Since you can specify multiple blast databases, what do the results look like if you do? Do they overlap in some way? Should they be run as separate blast processes?
Actually, running them as separate processes might be better because you could potentially avoid name collisions.
Right now, for example, if your main blast db is in /drives/blast/nt and you pass a database that's also in a directory called 'blast' (no matter where that directory is), you'll get a name collision error.
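To illustrate the collision described above, here is a rough Python sketch that flags database paths whose parent directories share a name but live in different places. It assumes the collision is keyed on the parent directory's base name, as the example suggests, and that two databases in the very same directory do not collide; the function name `find_staging_collisions` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_staging_collisions(db_paths):
    """Return {directory name: [distinct parent dirs]} for colliding names.

    Assumption: two databases collide when their parent directories have the
    same base name (e.g. 'blast') but are different directories.
    """
    by_dir_name = defaultdict(set)
    for db in db_paths:
        parent = PurePosixPath(db).parent
        by_dir_name[parent.name].add(str(parent))
    # Keep only names that map to more than one distinct directory.
    return {
        name: sorted(parents)
        for name, parents in by_dir_name.items()
        if len(parents) > 1
    }
```

For instance, passing `/drives/blast/nt` together with a database under some other directory that is also named `blast` would report `blast` as a colliding name, which is the case that running each database as its own process could sidestep.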