Investigate results of blast searches against multiple databases #63

mhoban / rainbow_bridge

GNU General Public License v3.0

Closed mhoban closed 6 months ago

mhoban commented 6 months ago

Since you can specify multiple blast databases, what do the results look like if you do? Do they overlap in some way? Should they be run as separate blast processes?

Actually, running them as separate processes might be better because you could potentially avoid name collisions.

Right now, for example, if your main blast db is in /drives/blast/nt and you pass another database that's also in a directory called 'blast' (no matter where that directory is), you'll get a name collision error.
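To make the collision concrete, here is a minimal Python sketch (not the pipeline's code). It assumes a database is staged under the name of its parent directory, which is what the error described above suggests. The helper `find_staging_collisions` and the second path are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def find_staging_collisions(db_paths):
    """Return parent-directory names shared by more than one distinct directory.

    Assumption: each database is staged under the basename of its parent
    directory, so two different directories both called 'blast' would clash.
    """
    by_name = defaultdict(set)
    for db in db_paths:
        parent = PurePosixPath(db).parent
        by_name[parent.name].add(str(parent))
    # Keep only names that map to two or more different directories
    return {name: sorted(dirs) for name, dirs in by_name.items() if len(dirs) > 1}

# Hypothetical example: two databases in different directories named 'blast'
collisions = find_staging_collisions(["/drives/blast/nt", "/home/me/blast/custom"])
```

Two databases in the same directory don't count as a collision here, only different directories that share a name.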

mhoban commented 6 months ago

Currently, when you run a query against multiple blast databases, results from all databases end up in the same results table. There's no explicit indicator of which database a particular result came from.
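One way to keep track of where results came from would be to tag each row with its database when the tables are merged. This is only a sketch of the idea, assuming headerless tab-separated results with one hit per line. The function name `merge_with_source` and the rows are made up for illustration.

```python
import csv
import io

def merge_with_source(tables):
    """Merge tab-separated result tables, prepending the source database to each row.

    `tables` maps a database label to the text of its results table
    (assumed headerless, tab-separated, one hit per line).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for db_name, text in tables.items():
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            if row:  # skip blank lines
                writer.writerow([db_name] + row)
    return out.getvalue()

# Made-up rows, purely for illustration
merged = merge_with_source({
    "nt": "zotu1\tAB000001.1\t99.5\n",
    "custom": "zotu1\tlocal_seq_7\t100.0\n",
})
```

With a source column in place, hits for the same query from different databases stay distinguishable after merging.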

mhoban commented 6 months ago

Should we move blast searches into separate processes and then merge the results afterward? Does it make a difference? Maybe this should be tested.

e.g., something like

// run one blast task per database, then concatenate the per-database tables
Channel.fromPath([params.blastDb]) |
  blast |
  collectFile(name: 'blast_results_merged.tsv', storeDir: 'blast')

mhoban commented 6 months ago

Ok, I updated the blast system. Instead of using the system $BLASTDB variable to infer the location of the nt database, we now use $FLOW_BLAST to specify the full location of a database, and that is used as the default.
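Roughly, the default lookup could work like the Python sketch below. It is an illustration, not the pipeline's code, and it assumes $FLOW_BLAST is only consulted when no database is passed explicitly. The function name `resolve_blast_dbs` is hypothetical.

```python
import os

def resolve_blast_dbs(cli_dbs, env=os.environ):
    """Return the databases to search.

    Assumption: explicitly supplied databases win; $FLOW_BLAST (a full path
    to a database) is only used as a fallback when none are given.
    """
    dbs = list(cli_dbs)
    if not dbs:
        default = env.get("FLOW_BLAST")
        if default:
            dbs.append(default)
    return dbs
```

For example, `resolve_blast_dbs([], {"FLOW_BLAST": "/drives/blast/nt"})` falls back to the environment value, while any explicitly passed database is returned unchanged.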

Now, blast searches are run against each supplied database separately and the results are combined once the searches are complete.

There is also now a way to specify taxdb files if you want to. If you don't and they're missing, the pipeline gets them from the internet.
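The "are they missing?" check could look something like this Python sketch. It assumes the taxdb consists of NCBI's two standard files, taxdb.btd and taxdb.bti, and it leaves out the download step. The helper names are hypothetical.

```python
from pathlib import Path

# The two files that make up NCBI's BLAST taxdb
TAXDB_FILES = ("taxdb.btd", "taxdb.bti")

def missing_taxdb_files(taxdb_dir):
    """Return the taxdb files absent from `taxdb_dir` (all of them if no dir is given)."""
    if taxdb_dir is None:
        return list(TAXDB_FILES)
    return [name for name in TAXDB_FILES if not (Path(taxdb_dir) / name).is_file()]

def needs_download(taxdb_dir):
    """True when at least one taxdb file would have to be fetched."""
    return bool(missing_taxdb_files(taxdb_dir))
```

A download would only be triggered when `needs_download` returns True, so user-supplied files are used as-is.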