rega-cev / virulign

VIRULIGN: fast codon-correct alignment and annotation of viral genomes
GNU General Public License v2.0
31 stars 12 forks source link

Parallelize virulign #20

Closed mirand863 closed 2 years ago

mirand863 commented 3 years ago

This PR parallelizes Virulign with OpenMP and adds a --version parameter (version is extracted from git tag).

mirand863 commented 3 years ago
mirand863 commented 3 years ago

@plibin-vub I am wondering if it would make more sense to write the sequences names to a file for logging and display only the progress bar with something like image Otherwise it gets really messy with multiple threads printing the IDs at the same time + progress bar. What are your thoughts?

plibin commented 3 years ago

Hi Miranda,

I prefer to use the Unix philosophy, to use standard out and error, such that the tools can be chained to other command line tools.

There already is code now to show some kind of progress bar, see the code in “if (progress) {“. This code should be changed, when dealing with parallel computations, but the gist of it should still be the same I believe.

An option could be to log the sequences only when the progress bar is disabled (progress==false).

greets,

Pieter

On 4 Mar 2021, at 20:36, mirand863 notifications@github.com<mailto:notifications@github.com> wrote:

@plibin-vubhttps://github.com/plibin-vub I am wondering if it would make more sense to write the sequences to a file for logging and display only the progress bar with something like [image]https://user-images.githubusercontent.com/22525107/110019908-41712f80-7d29-11eb-8e42-e781257a4ab4.png Otherwise it gets really messy with multiple threads. What are your thoughts?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/rega-cev/virulign/pull/20#issuecomment-790876179, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABJOXAQX4IFYCANQHEO7K5TTB7ONRANCNFSM4YRSGEUA.

[ { "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/rega-cev/virulign/pull/20#issuecomment-790876179", "url": "https://github.com/rega-cev/virulign/pull/20#issuecomment-790876179", "name": "View Pull Request" }, "description": "View this Pull Request on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

mirand863 commented 3 years ago

Edit: I got it to work without the color, which is good enough for me

image

So I added a progress bar from a library called indicators that is already thread safe. However, there is a problem... It is supposed to look like this:

image

But instead it looks like this:

image

I think that virulign does something to the std::cout and std::cerr that is making this library behave this way, because the progress bar doesn't even show at all on std::cout (which is its default). Any clues about how to fix this?

mirand863 commented 3 years ago

@plibin-vub The PR is ready for review. I finished adding the parameter to set the number of threads, fixed the progress bar to work in parallel with the library indicators, the results vector is thread safe as well and OpenMP is disabled if not found. I think it is all good. Can you please review and merge? Otherwise, please let me know if anything else needs to be done before merging this PR.

mirand863 commented 3 years ago

Just a quick update... Today I created a bioconda package for virulign with the current version 1.0.2 https://anaconda.org/bioconda/virulign. If you accept this merge request and create a new tag with some release note it should update automatically on bioconda too :) Looking forward to be able to use the parallel version in the bioinformatics pipeline I am developing to track sars-cov-2 mutations. Virulign is very useful!

plibin commented 3 years ago

Hi Miranda,

sorry for the late reply, things have been quite busy.

Thanks for your work, and thanks for creating this bioconda package. I will have a look at it, and reference it on the virulign GitHub page.

For your changes with respect to the parallel virulign, how can I look at all the commits you’ve made?

I have to say, I’m a bit doubtful of including an MIT-licensed project in our otherwise GPL licensed code base. Perhaps it should be possible to opt out of this library, with some kind of compile option?

Thanks and kind regards,

Pieter

On 11 Mar 2021, at 22:14, mirand863 @.**@.>> wrote:

Just a quick update... Today I created a bioconda package for virulign with the current version 1.0.2 https://anaconda.org/bioconda/virulign. If you accept this merge request and create a new tag with some release note it should update automatically on bioconda too :)

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/rega-cev/virulign/pull/20#issuecomment-797053391, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABJOXAS7U6BZVBR53YMWGA3TDEXEXANCNFSM4YRSGEUA.

[ { @.": "http://schema.org", @.": "EmailMessage", "potentialAction": { @.": "ViewAction", "target": "https://github.com/rega-cev/virulign/pull/20#issuecomment-797053391", "url": "https://github.com/rega-cev/virulign/pull/20#issuecomment-797053391", "name": "View Pull Request" }, "description": "View this Pull Request on GitHub", "publisher": { @.": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

mirand863 commented 3 years ago

Hi Miranda, sorry for the late reply, things have been quite busy. Thanks for your work, and thanks for creating this bioconda package. I will have a look at it, and reference it on the virulign GitHub page. For your changes with respect to the parallel virulign, how can I look at all the commits you’ve made? I have to say, I’m a bit doubtful of including an MIT-licensed project in our otherwise GPL licensed code base. Perhaps it should be possible to opt out of this library, with some kind of compile option? Thanks and kind regards, Pieter On 11 Mar 2021, at 22:14, mirand863 @.**@.>> wrote: Just a quick update... Today I created a bioconda package for virulign with the current version 1.0.2 https://anaconda.org/bioconda/virulign. If you accept this merge request and create a new tag with some release note it should update automatically on bioconda too :) — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#20 (comment)>, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABJOXAS7U6BZVBR53YMWGA3TDEXEXANCNFSM4YRSGEUA. [ { @.": "http://schema.org", @.": "EmailMessage", "potentialAction": { @.": "ViewAction", "target": "#20 (comment)", "url": "#20 (comment)", "name": "View Pull Request" }, "description": "View this Pull Request on GitHub", "publisher": { @.": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

Hello Pieter,

No problems. I imagined you were busy.

I think the easier way to review the code is to click on the tab "files changed" in this pull request page to easily see the modifications that would be applied to the repository. Perhaps you can also try to run it locally after cloning my fork with git clone git@github.com:mirand863/virulign.git and changing to this branch with git checkout --track origin/parallelize.

If you want I can completely remove the progress bar library and leave the previous code, but it becomes unreadable when many threads are printing at the same time. I can also add the compiler parameter. It is up to you, but as far as I know the MIT license is compatible with GPL projects. https://www.tawesoft.co.uk/kb/article/mit-license-faq

Can I use MIT-licensed code in my GPL-licensed project?

Yes. The MIT License is GPL compatible.

It would only be incompatible the other way around.

Kind Regards, Fabio