iqbal-lab-org / make_prg

Code to create a PRG from a Multiple Sequence Alignment file

Version/fork confusion #32

Closed kdm9 closed 1 year ago

kdm9 commented 2 years ago

Hello folks

Thanks for what looks like a very promising method. I'm attempting to create a PanRG from aligned genes from orthofinder across a ~30 sample pangenome (i.e. 30 annotated genome assemblies), so I can use pandora to genotype variation in several hundred short read libraries. I'm following the toy example in @leoisl's fork as linked from another issue here, but the version of make_prg in conda doesn't support those CLI options. Additionally, it looks like the latest version in this repo (0.2.0) was released after version 0.3.0 from @leoisl's fork from July, and both forks seem to have ongoing development.

Which fork and version of make_prg should I use? Is there a guide or docs similar to @leoisl's toy example for the latest code in this repo? And does this repo support building the PRG for a series of MSAs at once, a la --input msas/ from @leoisl's code?

Best, Kevin

leoisl commented 2 years ago

Hello Kevin,

We are sorry for the confusion; indeed, we have two versions/forks of make_prg. This one is a bit more stable but lacks some features, and the one at https://github.com/leoisl/make_prg is less stable but includes some new features, like the update command, which was required for the pandora paper. We are in the process of refactoring a good part of the codebase and making https://github.com/leoisl/make_prg stable by adding lots of unit and integration tests (this development is being done on https://github.com/leoisl/make_prg/tree/update). We are almost finished: by the end of this week we should have a prerelease on https://github.com/leoisl/make_prg with a new version that fixes a lot of bugs and adds some changes. Then we will run a regression test to assess the impact of this new version on the paper results and, if the results look good, we will PR https://github.com/leoisl/make_prg into this repo and finally merge the two repos back into a single one.

Which fork and version of make_prg should I use?

If you don't mind waiting a little longer (e.g. ~1 week or less), once the regression tests show that the new version is OK, I can post a link to the precompiled binary here. If you are in a bit of a rush, at the end of this week we will have a prerelease with that binary available, but still without the regression test analysis. Or, if you want to use the make_prg version we used in the paper, please download it here: https://github.com/leoisl/make_prg/releases/download/v0.3.0/make_prg_0.3.0 , but be warned that we discovered bugs after the paper that are fixed in the upcoming version. Unfortunately, we won't be able to provide this new version in conda soon, only as a binary or as a pip-installable package. The conda version will come only after the revision is done and the new code is merged into this repo.

Is there a guide or docs similar to @leoisl's toy example for the latest code in this repo?

Unfortunately not, but that toy example will soon be merged back to this repo once the new version is ready.

And does this repo support building the PRG for a series of MSAs at once, a la --input msas/ from @leoisl's code?

No, the code in this repo only builds a PRG from a single MSA at a time.
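
Until multi-MSA input is available here, one possible workaround is to loop over the alignment files and call the tool once per file. The following is only a rough sketch of that idea, not something taken from either fork: the helper name `commands_for_msas`, the list of file suffixes, and above all the base command `make_prg from_msa` are assumptions, so check `make_prg --help` for the subcommand and arguments your installed version actually accepts. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
import tempfile
from pathlib import Path

# Assumed set of alignment file suffixes; adjust to match your own files.
MSA_SUFFIXES = {".fa", ".fasta", ".aln"}


def commands_for_msas(msa_dir, base_command=("make_prg", "from_msa")):
    """Return one argv list per MSA file found directly inside msa_dir.

    base_command is a placeholder (an assumption): replace it with whatever
    single-MSA invocation your make_prg version documents.
    """
    msa_paths = sorted(
        path
        for path in Path(msa_dir).iterdir()
        if path.is_file() and path.suffix in MSA_SUFFIXES
    )
    return [[*base_command, str(path)] for path in msa_paths]


# Demo on a throwaway directory with empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("OG0000001.fa", "OG0000002.fa", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_commands = commands_for_msas(tmp)
    for argv in demo_commands:
        # Printed only; pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Each returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems with unusual file names.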

cheers

kdm9 commented 2 years ago

Thanks @leoisl! I'll wait a week or so then will keenly try out the new merged and tested version. Let me know if you would like a beta tester or any other help.

Best, Kevin

leoisl commented 2 years ago

Dear @kdm9 ,

Sorry for the delay with this update. During code review, after our regression test, we found a bug that will delay the release to next week. We apologise for the delay.

Cheers

kdm9 commented 2 years ago

@leoisl No worries, much better delayed than buggy! Thanks, Kevin

leoisl commented 2 years ago

Dear @kdm9 ,

We are in the process of reviewing the code for this big update to make_prg. We broke the review process into 6 PRs, and 2 have already been reviewed. Unfortunately, most of our group will go on break from next week, so these reviews will take longer. For now, I can offer you a pre-release of this version, which is somewhat stable in the sense that the tests pass, but it has not yet been fully reviewed by other members. You can download this pre-release as a docker container by running docker pull leandroishilima/make_prg:1.0.0_pre_release or, if you prefer or don't have enough privileges to run docker, as a singularity container by running singularity pull docker://leandroishilima/make_prg:1.0.0_pre_release. Alternatively, you can install it with pip: pip install git+https://github.com/leoisl/make_prg@update_1_0_0_pre_release (MAFFT needs to be installed if you want to use make_prg update; with the containers, this dependency is already installed). As you are basically a test user, please reply here in case of any issues, whether with installing or executing. These will be solved ASAP so that any bugs are sorted out before we merge this code back into the main fork/branch. I will have limited work time from 24th December until 10th January, but will do my best to reply to you as soon as possible.

Cheers

kdm9 commented 2 years ago

Fantastic news @leoisl, thanks very much for making this available. I'll get it running over the weekend, and will let you know early next week if it seems to be functioning well.

Best, Kevin

kdm9 commented 2 years ago

Hello and Happy New Year folks!

@leoisl, I've successfully got make_prg running! There was just one bug I had to fix, which I notice you've actually already fixed in #35 (args.verbose wasn't defined due to how the parsers were set up). I was also going to suggest you switch to the more standard --multi-word-arg format for long options rather than --multi_word_arg, but I note that you already have in #35. Thanks!
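
For readers who hit a similar error in their own argparse-based tools, here is a minimal sketch of one common way `args.verbose` ends up undefined, plus one way to avoid it. This is not make_prg's actual parser code: the program name `tool`, the subcommands `build` and `update`, and the option `--min-match-length` are all made up for illustration. The sketch also shows that argparse maps a hyphenated long option such as `--min-match-length` to the attribute `min_match_length`, so switching to hyphens does not change how the values are read in code.

```python
import argparse


def build_broken_parser() -> argparse.ArgumentParser:
    """--verbose is registered on only one subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    build = sub.add_parser("build")
    build.add_argument("--verbose", action="store_true")
    sub.add_parser("update")  # no --verbose here, so args.verbose is never set
    return parser


def build_fixed_parser() -> argparse.ArgumentParser:
    """Shared options live on a parent parser that every subcommand inherits."""
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("-v", "--verbose", action="store_true")
    # Hypothetical option, used only to show the hyphen-to-underscore mapping.
    common.add_argument("--min-match-length", type=int, default=7)

    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("build", parents=[common])
    sub.add_parser("update", parents=[common])
    return parser


# With the broken layout, reading args.verbose after "update" would raise
# AttributeError because the attribute was never created.
broken_args = build_broken_parser().parse_args(["update"])
print(hasattr(broken_args, "verbose"))  # False

# With the parent parser, every subcommand defines the shared attributes.
fixed_args = build_fixed_parser().parse_args(["update", "--min-match-length", "5"])
print(fixed_args.verbose, fixed_args.min_match_length)  # False 5
```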

I'm now trying to use the resulting PRG file with pandora compare to genotype a larger set of short read libraries. I've hit a crash, though, and I'm not sure whether it is likely a pandora bug (in which case I'll open an issue there) or an issue caused by the make_prg changes we are discussing here. Stdout/stderr below.

Best, Kevin

[2022-01-05 15:55:11.329743] [0x00007f5e5441cf80] [info]    Constructing pangenome::Graph from read file tmp/omnidopsis-reads/OAKG5486.fastq.gz (this will take a while)
[2022-01-05 15:55:20.429034] [0x00007f5e3e4ee700] [info]    100000 reads processed...
[2022-01-05 15:55:32.447959] [0x00007f5e424f6700] [info]    200000 reads processed...
[2022-01-05 15:55:45.336833] [0x00007f5e3bce9700] [info]    300000 reads processed...
[2022-01-05 15:55:58.952446] [0x00007f5e344da700] [info]    400000 reads processed...
[2022-01-05 15:56:10.717633] [0x00007f5e3bce9700] [info]    500000 reads processed...
[2022-01-05 15:56:24.058770] [0x00007f5e364de700] [info]    600000 reads processed...
[2022-01-05 15:56:36.621030] [0x00007f5e4ed0f700] [info]    700000 reads processed...
[2022-01-05 15:56:48.118101] [0x00007f5e4f510700] [info]    800000 reads processed...
[2022-01-05 15:57:00.310676] [0x00007f5e37ce1700] [info]    900000 reads processed...
[2022-01-05 15:57:11.448873] [0x00007f5e42cf7700] [info]    1000000 reads processed...
[2022-01-05 15:57:22.070759] [0x00007f5e444fa700] [info]    1100000 reads processed...
[2022-01-05 15:57:34.679550] [0x00007f5e4d50c700] [info]    1200000 reads processed...
[2022-01-05 15:57:46.385129] [0x00007f5e49d05700] [info]    1300000 reads processed...
[2022-01-05 15:57:57.336647] [0x00007f5e3dced700] [info]    1400000 reads processed...
[2022-01-05 15:58:08.847885] [0x00007f5e47d01700] [info]    1500000 reads processed...
[2022-01-05 15:58:20.499238] [0x00007f5e464fe700] [info]    1600000 reads processed...
[2022-01-05 15:58:32.559429] [0x00007f5e3ace7700] [info]    1700000 reads processed...
[2022-01-05 15:58:44.161975] [0x00007f5e3c4ea700] [info]    1800000 reads processed...
[2022-01-05 15:58:54.958026] [0x00007f5e46cff700] [info]    1900000 reads processed...
[2022-01-05 15:59:06.496535] [0x00007f5e38ce3700] [info]    2000000 reads processed...
[2022-01-05 15:59:18.088091] [0x00007f5e3c4ea700] [info]    2100000 reads processed...
[2022-01-05 15:59:28.641331] [0x00007f5e35cdd700] [info]    2200000 reads processed...
[2022-01-05 15:59:38.957199] [0x00007f5e3bce9700] [info]    2300000 reads processed...
[2022-01-05 15:59:48.723881] [0x00007f5e35cdd700] [info]    2400000 reads processed...
[2022-01-05 15:59:59.328653] [0x00007f5e364de700] [info]    2500000 reads processed...
[2022-01-05 16:00:12.205278] [0x00007f5e4bd09700] [info]    2600000 reads processed...
[2022-01-05 16:00:22.156032] [0x00007f5e3a4e6700] [info]    2700000 reads processed...
[2022-01-05 16:00:32.168496] [0x00007f5e5441cf80] [info]    2800000 reads processed...
[2022-01-05 18:18:51.624493] [0x00007f5e5441cf80] [info]    Update LocalPRGs with hits
[2022-01-05 18:18:54.051961] [0x00007f5e5441cf80] [info]    Estimate parameters for kmer graph model
[2022-01-05 18:18:54.051996] [0x00007f5e5441cf80] [info]    Collect kmer coverage distribution
[2022-01-05 18:18:54.068297] [0x00007f5e5441cf80] [info]    Writing kmer coverage distribution to "/tmp/global2/kmurray/difflines-km/genes-and-pangenome/tmp/pandora/OAKG6094/kmer_covgs.txt"
[2022-01-05 18:18:54.091593] [0x00007f5e5441cf80] [info]    Collect kmer probability distribution
[2022-01-05 18:18:56.719813] [0x00007f5e5441cf80] [info]    Writing kmer probability distribution to "/tmp/global2/kmurray/difflines-km/genes-and-pangenome/tmp/pandora/OAKG6094/kmer_probs.txt"
[2022-01-05 18:18:56.722234] [0x00007f5e5441cf80] [info]    Estimated threshold for true kmers is -9
[2022-01-05 18:18:56.722322] [0x00007f5e5441cf80] [info]    Find max likelihood PRG paths
[2022-01-05 18:19:01.710211] [0x00007f5e5441cf80] [warning] Node OG0002593.213 has no reads
terminate called after throwing an instance of 'FatalRuntimeError'
  what():  [FATAL ERROR]: Error copying coverages to kmer graphs: reference node does not exist in pangraph
Aborting...

DEBUG info (for developers, provide this if opening an issue):
Stack trace (most recent call last):
#7    Object "pandora", at 0x55ad4a439170, in
#6    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f5e53621bf6, in __libc_start_main
#5    Object "pandora", at 0x55ad4a437830, in
#4    Object "pandora", at 0x55ad4a5054ef, in
#3    Object "pandora", at 0x55ad4a4f8a90, in
#2    Object "pandora", at 0x55ad4a46569b, in
#1    Object "pandora", at 0x55ad4a51d406, in
#0    Object "pandora", at 0x55ad4a4b611a, in

Stack trace (most recent call last):
#13   Object "pandora", at 0x55ad4a439170, in
#12   Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f5e53621bf6, in __libc_start_main
#11   Object "pandora", at 0x55ad4a437830, in
#10   Object "pandora", at 0x55ad4a5054ef, in
#9    Object "pandora", at 0x55ad4a4f8a90, in
#8    Object "pandora", at 0x55ad4a46569b, in
#7    Object "pandora", at 0x55ad4a51d406, in
#6    Object "pandora", at 0x55ad4a4b6615, in
#5    Object "/tmp/global2/kmurray/conda/envs/dl20_genes/bin/../lib/libstdc++.so.6", at 0x7f5e544e47b2, in __cxa_throw
#4    Object "/tmp/global2/kmurray/conda/envs/dl20_genes/bin/../lib/libstdc++.so.6", at 0x7f5e544e45bd, in std::terminate()
#3    Object "/tmp/global2/kmurray/conda/envs/dl20_genes/bin/../lib/libstdc++.so.6", at 0x7f5e544e456b, in
#2    Object "/tmp/global2/kmurray/conda/envs/dl20_genes/bin/../lib/libstdc++.so.6", at 0x7f5e544e5fab, in __gnu_cxx::__verbose_terminate_handler()
#1    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f5e53640920, in abort
#0    Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f5e5363efb7, in gsignal
Aborted (Signal sent by tkill() 127236 7593)
[1]    127236 abort      pandora compare -t 64 --illumina --clean --genotype --outdir tmp/pandora

leoisl commented 2 years ago

Dear Kevin,

Thanks for your patience in using this pre-release version of make_prg. Indeed, the CLI module was the one I invested the least time and effort in, but my colleagues pointed out several issues that needed to be fixed, and they have been! Now we just need their reviews to merge these fixes.

Now, about the error: from what I can see in the log messages, I think it might have been caused by the --clean option. I think this option is a bit unstable: we have evolved several modules of pandora in the last 3 years, but we haven't touched the --clean option, and I haven't used or evaluated pandora with it. I think it was tested/used in the past, but I can't guarantee that it works in the current version. Could you please rerun the pandora compare command without the --clean parameter and with a bit more verbosity (add -vv to the command), and report back whether you still get issues? If it still fails, it might be a real bug we haven't encountered yet; if it works, --clean is a CLI option we have to deprecate until it is stable.

Thanks a lot for your help and testing. Cheers

kdm9 commented 2 years ago

Hi @leoisl

I can confirm that removing --clean fixed the above issue with Pandora. I think we can close this issue now, unless you want it to stay open so other folks can easily find which version of which fork you recommend.

Best, Kevin

leoisl commented 2 years ago

Hello,

Yes, great that it worked out! Will keep this issue open until we merge both forks!

Cheers

leoisl commented 1 year ago

Forks merged, new version available at https://github.com/iqbal-lab-org/make_prg/releases/tag/0.4.0