jotech / gapseq

Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks
GNU General Public License v3.0

Diamond not used for the transport-find command #202

Closed: Porthmeus closed this issue 11 months ago

Porthmeus commented 11 months ago

Hi Jo, Silvio, I just noticed that gapseq recognizes .faa files and switches to diamond for the find command, but not for find-transport. Was that a deliberate decision, or has it simply not been implemented yet?

If you need help let me know.

BTW: @unaimed and a student of mine are currently implementing a snakemake pipeline in order to run it more efficiently on HPC systems and make it more portable.

Waschina commented 11 months ago

Hi Jan,

diamond is also not used in find: https://github.com/jotech/gapseq/blob/db2cd4dd7534f19b863387c0953483183272af4e/src/gapseq_find.sh#L528-L531

Note the hash before the diamond call.

We once considered diamond as an alternative and ran some tests. We found that, given how gapseq performs its searches, diamond would not improve runtime, at least not without changes that would be too invasive for the rest of the code.

Concerning the snakemake pipeline: Have a look at https://github.com/Waschina/gapsnake

Porthmeus commented 11 months ago

Cool, thanks. Then I misunderstood Johannes and simply failed to notice that the find command uses blastp.

I just looked through the code and now I can see why. For diamond to improve speed, you would need to concatenate the different query sequence files into a single query, which would speed things up a lot. But it would also require substantially reshaping the original code.
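To illustrate the batching idea above, here is a minimal Python sketch. It is not gapseq code, and the helpers `merge_queries` and `split_hits` are hypothetical. The idea is to tag each FASTA header with the name of the query set it came from, so that all sets can be searched as one combined query in a single diamond run, and the hits can then be routed back to their original set. The sketch assumes three things: hits arrive as tab-separated lines whose first column is the query ID (as in BLAST-style tabular output), that ID is the header up to its first whitespace, and the `|` separator is an arbitrary choice that must not occur in set names.

```python
"""Sketch: batch many small query FASTA sets into one search, then split the hits.

Hypothetical helpers for illustration only; not part of gapseq.
"""

SEP = "|"  # hypothetical separator between set name and original sequence ID


def merge_queries(query_sets: dict[str, str]) -> str:
    """Concatenate FASTA texts into one, prefixing each header with its set name."""
    out: list[str] = []
    for group, fasta in query_sets.items():
        if SEP in group:
            raise ValueError(f"set name {group!r} must not contain {SEP!r}")
        for line in fasta.splitlines():
            if line.startswith(">"):
                # Tag the header so each hit can be traced back to its source set.
                out.append(f">{group}{SEP}{line[1:]}")
            elif line.strip():
                out.append(line)  # sequence lines are copied unchanged
    return "\n".join(out) + "\n"


def split_hits(hit_lines: list[str]) -> dict[str, list[str]]:
    """Route tab-separated hit lines back to their set, restoring the original IDs."""
    per_group: dict[str, list[str]] = {}
    for line in hit_lines:
        if not line.strip():
            continue  # skip blank lines
        qseqid, rest = line.split("\t", 1)
        group, original_id = qseqid.split(SEP, 1)
        per_group.setdefault(group, []).append(f"{original_id}\t{rest}")
    return per_group


# Example:
# merge_queries({"pwyA": ">q1 desc\nMKV\n", "pwyB": ">q2\nMAL\n"})
#   == ">pwyA|q1 desc\nMKV\n>pwyB|q2\nMAL\n"
# split_hits(["pwyA|q1\tsubj1\t98.5", "pwyB|q2\tsubj7\t45.0"])
#   == {"pwyA": ["q1\tsubj1\t98.5"], "pwyB": ["q2\tsubj7\t45.0"]}
```

The combined file would replace many small per-pathway searches with one larger search. Diamond's speed advantage mainly appears at that scale. The harder part, as noted above, would presumably be adapting the downstream code that currently expects one result file per query set.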

Maybe we can organize a small hackathon on the topic at some point. Then I would gladly help implement it. But I will close the issue for now.