Closed Porthmeus closed 11 months ago
Hi Jan,
diamond is also not used in find:
https://github.com/jotech/gapseq/blob/db2cd4dd7534f19b863387c0953483183272af4e/src/gapseq_find.sh#L528-L531
Note the hash before the diamond call.
We once considered diamond as an alternative, ran some tests, and noticed that diamond wouldn't improve runtime given how gapseq performs the searches, at least not without invasive changes to the rest of the code.
Concerning the snakemake pipeline: Have a look at https://github.com/Waschina/gapsnake
Cool, thanks. Then I misunderstood Johannes and just failed to notice the usage of blastp in the find command.
I just looked through the code and I can see now why that is. For diamond to improve speed, you would need to concatenate the different query sequence files into one query, which would speed things up a lot. But it would also require a lot of reshaping of the original code.
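To make the concatenation idea concrete, here is a minimal sketch in Python. It is not how gapseq works today, just an illustration under assumptions: the function names, the file names, and the `|` separator are all hypothetical, and each query file is assumed to be plain FASTA with a unique file stem that does not itself contain the separator. Every header gets prefixed with the stem of the file it came from, so hits from a single combined search can be mapped back to the original query file.

```python
from pathlib import Path
from typing import Dict, Iterable


def concatenate_queries(query_files: Iterable[Path], combined: Path,
                        sep: str = "|") -> Dict[str, int]:
    """Merge several FASTA query files into one combined FASTA file.

    Each header is rewritten as ">{file stem}{sep}{original header}".
    Returns the number of sequences taken from each file, keyed by stem.
    Assumption: file stems are unique and do not contain `sep`.
    """
    counts: Dict[str, int] = {}
    with combined.open("w") as out:
        for path in query_files:
            tag = path.stem
            counts[tag] = 0
            with path.open() as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        # Tag the header with its source file
                        out.write(f">{tag}{sep}{line[1:]}\n")
                        counts[tag] += 1
                    else:
                        out.write(line + "\n")
    return counts


def source_of(query_id: str, sep: str = "|") -> str:
    """Recover the source file stem from a tagged query ID in the hit table."""
    return query_id.split(sep, 1)[0]
```

The combined file could then be searched in one call instead of one call per query file, and the query ID column of the tabular output split with `source_of` to regroup the hits. The reshaping mentioned above would mostly be in that regrouping step, since the downstream code currently expects one result per query file.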
Maybe we can organize a small hackathon on the topic at some point. Then I would gladly help out with implementing it. But I will close the issue for now.
Hi Jo, Silvio, I just noticed that gapseq is recognizing .faa files and switches to diamond for the find command, but not for the find-transport one. Was that a deliberate decision or did you simply not implement it yet? If you need help let me know.
BTW: @unaimed and a student of mine are currently implementing a snakemake pipeline in order to run gapseq more efficiently on HPC systems and make it more portable.