labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Rarefaction #195

Open eduardags12 opened 5 months ago

eduardags12 commented 5 months ago

I'm having a problem when I run rarefaction: it never finishes and stays at 2679/2680.

axbazin commented 5 months ago

Hi,

Could you share the log of your analysis? And, if possible, the genomes and command line you used to reach this problem? (If that is not possible, it's perfectly understandable.)

Adelme

eduardags12 commented 5 months ago

Thank you very much for your feedback,

I'll send you the code I used to generate the rarefaction, as well as a printout of what my screen looks like when it does not finish.

Code:
find $PWD/gff/ -name '*.gff' > gff_list_pangenome.tsv
cat gff_list_pangenome.tsv | parallel echo -e "{/.}'\t'{}" | sponge gff_list_pangenome.tsv
conda activate ppanggolin
ppanggolin workflow --anno gff_list_pangenome.tsv -o 11_ppanggolin --cpu $(nproc) -f
ppanggolin info -p 11_ppanggolin/pangenome.h5 --content > 11_ppanggolin/content.txt
ppanggolin info -p 11_ppanggolin/pangenome.h5 --parameters > 11_ppanggolin/parameters.txt
ppanggolin rarefaction --cpu $(nproc) -p 11_ppanggolin/pangenome.h5 -o 11_ppanggolin/rarefaction

Regards, Eduarda


-- Eduarda Guimarães Sousa, Master's student in Genetics: Genomics, Bioinformatics, Laboratory of Cellular and Molecular Genetics, Institute of Biological Sciences, Federal University of Minas Gerais, (34) 996795082
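
The list-building step in the commands above relies on GNU parallel and sponge (from moreutils). As a minimal sketch, assuming the same layout (a gff/ directory of *.gff files) and the same two-column output (genome name, tab, file path), the list could also be written in one pass with find and awk. The function name build_gff_list is hypothetical and is not part of PPanGGOLiN.

```shell
#!/bin/sh
# Hypothetical helper: write "<name><TAB><path>" for every .gff file under a directory.
# Assumes file paths contain no tabs or newlines.
build_gff_list() {
    gff_dir=$1
    out_tsv=$2
    find "$gff_dir" -name '*.gff' | sort | awk -F/ '{
        name = $NF                 # last path component, e.g. genomeA.gff
        sub(/\.gff$/, "", name)    # drop the extension, like {/.} in parallel
        print name "\t" $0         # genome name, tab, full path
    }' > "$out_tsv"
}
```

It would be called as: build_gff_list "$PWD/gff" gff_list_pangenome.tsv. Writing the list straight to its final file avoids reading and overwriting the same file in one pipeline, which is what sponge is needed for in the original commands.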

axbazin commented 5 months ago

Hi,

Thank you for the command lines. They are used exactly as expected, so there is nothing to change there, and there is likely a bug for us to find. Would you mind sharing the log that was printed out when you ran those?

Adelme

eduardags12 commented 5 months ago

Hi, thank you for your feedback, but I can't find the log that was printed out.

Eduarda
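
For a rerun, one way to make sure a log exists is to copy everything the command prints into a file. The sketch below is a generic shell wrapper, not a PPanGGOLiN feature; the function name run_logged and the log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command, show its output, and append a copy to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# Note: the return status is that of tee, not of the wrapped command.
run_logged() {
    log_file=$1
    shift
    # Merge stderr into stdout so progress and error messages are both captured.
    "$@" 2>&1 | tee -a "$log_file"
}
```

For example: run_logged rarefaction.log ppanggolin rarefaction --cpu "$(nproc)" -p 11_ppanggolin/pangenome.h5 -o 11_ppanggolin/rarefaction. The resulting rarefaction.log can then be attached to the issue.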


axbazin commented 5 months ago

Hi,

I'm afraid I can't do much without more information, sorry. I don't remember ever seeing a case where the rarefaction was hanging. I assume it could come from NEM. @ggautreau, do you remember whether there is an edge case where this step can hang?

Adelme
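
When a step may hang, as described here, a time limit turns an endless wait into a failure that can be reported. This is a minimal sketch, assuming GNU coreutils timeout is installed; the function name run_with_limit and any chosen limit are hypothetical. It only bounds the run and does not address the cause of the hang.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit (GNU coreutils `timeout`).
# Usage: run_with_limit DURATION COMMAND [ARGS...]   e.g. DURATION=12h
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when the limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "command exceeded $limit and was stopped" >&2
    fi
    return "$status"
}
```

For example: run_with_limit 12h ppanggolin rarefaction -p 11_ppanggolin/pangenome.h5 -o 11_ppanggolin/rarefaction, where 12h is an arbitrary limit chosen for illustration.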

ggautreau commented 5 months ago

Hi both of you,

This should not result in an endless computation. Could you share your dataset @eduardags12 to help replicate this issue?

Thx.

eduardags12 commented 5 months ago

Sure, here it is.

https://drive.google.com/drive/folders/1xpHpLHo2TbAgYGOriyQr9rPMnEijfp48?usp=drive_link

Regards, Eduarda


jpjarnoux commented 2 months ago

Hi @eduardags12, sorry for the delay. What is the status of this issue?

jpjarnoux commented 2 months ago

Hi!

We were able to work on your data. We initially got stuck at the annotation stage because coordinates spread over several lines were not handled. This bug has been fixed in PR #240.

Having done this, I didn't have any problems with partitioning.

You can use the dev version of PPanGGOLiN, which includes this change. Here is how to install it: https://ppanggolin.readthedocs.io/en/latest/user/install.html#development-version

A release will be available soon for installation via conda.