pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

Not completed in overlap_collect step #425

Open. kjyunm opened this issue 1 week ago

kjyunm commented 1 week ago

Hello, I am trying to build a pangenome from pig assemblies of multiple breeds. The input is 4.5 GB compressed, and I am using the Docker image ghcr.io/pangenome/pggb:latest.

I ran the following command:

docker run -it -v ${PWD}:/data -u $(id -u):$(id -g) ghcr.io/pangenome/pggb:latest pggb -i /data/pigs.assembly.fa.gz -o /data/out -t 20

However, the run gets stuck at the following step and has not progressed for over a week:

[seqwish::transclosure] 25082.955 81.34% 12280027975-12290027975 overlap_collect

Could you please advise whether any additional options or preprocessing steps are needed? I would appreciate any suggestions on what causes the hang and how to resolve it.

Thank you!

AndreaGuarracino commented 1 day ago

@kjyunm maybe you hit a seqwish bug? You could try increasing -k/--min-match-len in pggb (23 by default) to make seqwish's life a bit easier.
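
A minimal sketch of such a rerun, reusing the command from the original post with only -k added. The value 47 and the separate output directory name are assumptions chosen for illustration, not recommendations from the pggb authors; any value above the default of 23 follows the same pattern. The script prints the command and only executes it when the (hypothetical) RUN variable is set to 1, so it can be checked before committing to a long run.

```shell
# Assumption: 47 is an arbitrary illustrative value above the default of 23.
MIN_MATCH_LEN="${MIN_MATCH_LEN:-47}"

# Assumption: write to a separate directory so the stalled run's output is kept.
OUT_DIR="out_k${MIN_MATCH_LEN}"

# Same command as in the original post, with -k appended.
set -- docker run -it -v "${PWD}:/data" -u "$(id -u):$(id -g)" \
  ghcr.io/pangenome/pggb:latest \
  pggb -i /data/pigs.assembly.fa.gz -o "/data/${OUT_DIR}" -t 20 \
  -k "${MIN_MATCH_LEN}"

# Show the command that would be run.
echo "$@"

# Execute only when explicitly requested, e.g. RUN=1 sh rerun.sh
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

To try a different threshold without editing the script, set the variable on the command line, for example MIN_MATCH_LEN=79 RUN=1 sh rerun.sh (the file name rerun.sh is also just a placeholder).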