hoelzer-lab / hypro

Extend hypothetical prokka protein annotations using additional homology searches against larger databases
GNU General Public License v3.0

Nextflow: mmseqs2 handling #31

Closed. hoelzer closed this issue 3 years ago

hoelzer commented 3 years ago

@EvaFriederike I am not 100% sure yet how to handle the mmseqs2 step.

Currently, everything happens in one process. That process calls the mmseqs.sh script, which indexes the db and then performs the run.

I think that, because Nextflow generates a new tmp working dir every time, the indexing is run again and again.

I suggest separating this:

EvaFriederike commented 3 years ago

The mmseqs2 step is now split into one subworkflow and a process:

hoelzer commented 3 years ago

Why do we actually index the query and the target? :) Because we perform the mmseqs2 alignment in both directions?

EvaFriederike commented 3 years ago

For mmseqs2 to perform the search, both the query and the target FASTA files need to be converted into sequence databases.
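As a rough illustration of the point above, the sequence of MMseqs2 calls might look like the sketch below. It is not the repository's mmseqs.sh; the function names `run` and `mmseqs_search`, the `DRY_RUN` switch, and all file and database names are hypothetical placeholders. Only the `mmseqs createdb`, `mmseqs search`, and `mmseqs convertalis` subcommands are taken from MMseqs2 itself.

```shell
#!/usr/bin/env bash
# Sketch only: paths and database names are hypothetical placeholders.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Convert query and target FASTA files into MMseqs2 sequence databases,
# search the query against the target, and export a tabular result.
mmseqs_search() {
  local query_fasta="$1"
  local target_fasta="$2"
  local out_dir="$3"

  run mkdir -p "$out_dir/tmp"
  # Both inputs must be sequence databases before a search can run.
  run mmseqs createdb "$query_fasta" "$out_dir/queryDB"
  run mmseqs createdb "$target_fasta" "$out_dir/targetDB"
  run mmseqs search "$out_dir/queryDB" "$out_dir/targetDB" "$out_dir/resultDB" "$out_dir/tmp"
  # Convert the result database into a BLAST-like tab-separated file.
  run mmseqs convertalis "$out_dir/queryDB" "$out_dir/targetDB" "$out_dir/resultDB" "$out_dir/result.m8"
}
```

Calling `DRY_RUN=1 mmseqs_search proteins.faa uniprot.faa out` only prints the commands, which is handy for checking the order of the steps before running them for real.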

hoelzer commented 3 years ago

Ah, thanks for the info! I had in mind that, as with BLAST, only an indexed db of the target is needed.
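To connect this with the re-indexing concern raised at the start of the thread: in MMseqs2, `createdb` (needed for query and target) is separate from the optional `createindex` step, which precomputes a search index for the target only. One possible way to avoid repeating the target preparation is to build it once in a persistent directory and skip the step when the database already exists. The sketch below assumes this approach; the names `prepare_target`, `run`, `DRY_RUN`, and the directory layout are hypothetical, and it is not how the hypro workflow necessarily solves it.

```shell
#!/usr/bin/env bash
# Sketch only: the persistent directory layout is a hypothetical placeholder.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the target sequence database and its index once; reuse it afterwards.
prepare_target() {
  local target_fasta="$1"
  local db_dir="$2"
  local db="$db_dir/targetDB"

  # createdb writes a .dbtype file next to the database; use it as a marker.
  if [ -f "$db.dbtype" ]; then
    echo "reusing $db"
    return 0
  fi

  run mkdir -p "$db_dir/tmp"
  run mmseqs createdb "$target_fasta" "$db"
  # Optional: precompute the index so repeated searches start faster.
  run mmseqs createindex "$db" "$db_dir/tmp"
}
```

Within Nextflow, a similar effect could probably be achieved by giving the database-building process a `storeDir`, so that its outputs are kept outside the per-task working directory.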