nf-core / pangenome

Renders a collection of sequences into a pangenome graph. https://doi.org/10.1093/bioinformatics/btae609.
https://nf-co.re/pangenome
MIT License

GFA to rGFA problem #206

Status: Open. mictadlo opened this issue 2 months ago

mictadlo commented 2 months ago

Description of the bug

Hi, PanGraphViewer failed to convert your pipeline's GFA file to rGFA.

Do you by any chance know what the problem could be?

Best wishes,

Michal

Command used and terminal output

No response

Relevant files

No response

System information

No response

mictadlo commented 2 months ago

gfa2rGFA.py failed for me too.

subwaystation commented 2 months ago

To convert a GFA to rGFA, you need to specify at least one path as the reference path. The Python script you linked does not work this way.

Maybe vg convert could help here? I am not sure if going from this pipeline's GFA to rGFA is even possible, because one path always has to be selected as the reference. The conversion may therefore not work at all, or you will lose the alignments between paths that are not directly related to the reference.
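To make the reference-path requirement concrete, here is a minimal Python sketch. It assumes a GFA 1.0 file with sequences spelled out in S lines and paths in P lines; the helper names parse_gfa and rank0_tags are hypothetical and are not part of nf-core/pangenome, vg, or gfa2rGFA.py. Segments on the chosen reference path get rank-0 stable coordinates (the rGFA tags SN, SO and SR). Segments visited only by other paths are reported separately: they have no rank-0 coordinate, and this is where information can be lost. For simplicity, the sketch ignores step orientation and keeps only the first offset when the reference path revisits a segment.

```python
"""Sketch: derive rank-0 rGFA tags (SN/SO/SR) from one chosen GFA reference path.

Assumptions (hypothetical helpers, not from any tool discussed in this issue):
- GFA 1.0 input, sequences present in S lines (not "*"), paths in P lines.
- Step orientation is ignored; if the reference path revisits a segment, the
  first offset is kept, because rGFA gives each segment one stable coordinate.
"""


def parse_gfa(text):
    """Return ({segment: length}, {path_name: [(segment, orient), ...]})."""
    seg_len, paths = {}, {}
    for line in text.splitlines():
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            seg_len[fields[1]] = len(fields[2])
        elif fields[0] == "P":
            steps = fields[2].split(",")
            # Each step is a segment name followed by its orientation (+ or -).
            paths[fields[1]] = [(s[:-1], s[-1]) for s in steps]
    return seg_len, paths


def rank0_tags(text, ref_path):
    """Tag segments on ref_path with rank-0 coordinates; report the rest."""
    seg_len, paths = parse_gfa(text)
    tags, offset = {}, 0
    for seg, _orient in paths[ref_path]:
        # Stable offset = position of this segment along the reference path.
        tags.setdefault(seg, f"SN:Z:{ref_path}\tSO:i:{offset}\tSR:i:0")
        offset += seg_len[seg]
    # For every other path: segments lacking a rank-0 coordinate. A real
    # converter would need rank > 0 coordinates for these or drop them.
    off_ref = {
        name: sorted({s for s, _ in steps if s not in tags})
        for name, steps in paths.items()
        if name != ref_path
    }
    return tags, off_ref
```

For example, if the reference path visits segments 1 and 2 and a second path visits 1 and 3, only segments 1 and 2 receive stable coordinates. Segment 3 shows up in the off-reference report for the second path.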

mictadlo commented 2 months ago

Thank you for your reply. I found another tool, VRPG, that might do it, but it has many parameters. Unfortunately, I don't know enough about GFA to choose the correct ones.

```
> conda create -n VRPG Django==3.2.4  pybind11
> conda activate VRPG
> git clone https://github.com/codeatcg/VRPG --recursive
Cloning into 'VRPG'...
remote: Enumerating objects: 523, done.
remote: Counting objects: 100% (186/186), done.
remote: Compressing objects: 100% (147/147), done.
remote: Total 523 (delta 109), reused 38 (delta 28), pack-reused 337 (from 1)
Receiving objects: 100% (523/523), 2.00 MiB | 3.69 MiB/s, done.
Resolving deltas: 100% (252/252), done.
> cd VRPG/module
/VRPG/module> make
g++ -O3 -Wall -std=c++11 -pthread -c gfa2view.cpp -o gfa2view.o -lz
g++ -O3 -Wall -std=c++11 -pthread -c minipg.cpp -o minipg.o -lz
g++ -O3 -Wall -std=c++11 -pthread -c gz.cpp -o gz.o -lz
g++ -O3 -Wall -std=c++11 -pthread gfa2view.o minipg.o gz.o -o gfa2view -lz
g++ -O3 -Wall -std=c++11 -pthread -c anno.cpp -o anno.o -lz
g++ -O3 -Wall -std=c++11 -pthread -c refgene.cpp -o refgene.o -lz
g++ -O3 -Wall -std=c++11 -pthread -c ghAnno.cpp -o ghAnno.o -lz
g++ -O3 -Wall -std=c++11 -pthread anno.o refgene.o ghAnno.o gz.o -o GraphAnno -lz
g++ -D PYMODULE -O3 -Wall -std=c++11 -pthread -shared -fPIC -I/work/waterhouse_team/miniconda2/envs/VRPG/include/python3.12 -I/work/waterhouse_team/miniconda2/envs/VRPG/lib/python3.12/site-packages/pybind11/include minipg.cpp  -o minipg.cpython-312-x86_64-linux-gnu.so
VRPG/module> ./gfa2view --help
Usage: gfa2view --GFA input.gfa --index --ref REF#HAP --outDir output_dir
--sep     <String>   Delimiter between sample and haplotype names, by default: #
--GFA     <File>     Input GFA file
--ref     <String>   Reference name (sample_name + delimiter + haplotype)
--refChr  <File>     When indexing the graph only consider reference chromosomes or contigs contained in this file (one chromosome or contig per line).
--outDir  <Dir>      Output directory
--index              Index the graph for rapid access
--xDep    <Int>      Search depth when creating graph indexes, by default: 10
--range   <Int>      Number of reference nodes in a chunk, which is used for indexing the graph, by default: 2000
--cross              There are crosses between reference chromosomes. It will take more running time.
--thread  <Int>      Number of threads.
```

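According to the usage text, --ref takes a sample name and a haplotype joined by the --sep delimiter (default #). The small Python sketch below lists candidate values from a GFA file. It assumes path names follow the PanSN convention sample#haplotype#contig, which the later example value "NA21309#1" suggests. It reads P lines (GFA 1.0) and W lines (GFA 1.1, where sample and haplotype are the second and third columns). The function name ref_candidates is hypothetical, and it is not certain that gfa2view matches reference names exactly this way.

```python
"""Sketch: list candidate values for gfa2view's --ref option.

Assumption: path names follow PanSN (sample#haplotype#contig), so a --ref
value is the first two fields joined by the --sep delimiter.
"""


def ref_candidates(lines, sep="#"):
    """Return sorted 'sample<sep>haplotype' strings found in P and W lines."""
    found = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P":
            parts = fields[1].split(sep)
            # Paths without at least sample and haplotype fields are skipped.
            if len(parts) >= 2:
                found.add(sep.join(parts[:2]))
        elif fields[0] == "W" and len(fields) > 2:
            # GFA 1.1 walk line: W <sample> <haplotype index> <seq id> ...
            found.add(f"{fields[1]}{sep}{fields[2]}")
    return sorted(found)
```

Passing an open file handle works because iterating over it yields lines. For example, with open("input.gfa") as handle: print(ref_candidates(handle)).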
subwaystation commented 1 month ago

Just follow the usage line. I have never used VRPG, so I guess I can't really help you. As far as I know, you can go from rGFA to GFA, but I don't know about the reverse direction.
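The rGFA to GFA direction is the easy one. rGFA is defined as a subset of GFA 1.0 with extra stable-coordinate tags on S lines, so the main missing piece is an explicit reference path. The Python sketch below rebuilds one P line per stable sequence by sorting rank-0 (SR:i:0) segments by their stable offset (SO). It assumes that rank-0 segments lie on the forward strand of their stable sequence and tile it. The function name rank0_paths is hypothetical.

```python
"""Sketch: rGFA -> GFA by recovering one P line per stable sequence.

Assumption: rank-0 segments are forward-strand pieces of their stable
sequence, so ordering them by SO reproduces the reference path.
"""
from collections import defaultdict


def rank0_paths(rgfa_text):
    """Return P lines built from rank-0 segments grouped by SN and sorted by SO."""
    by_seq = defaultdict(list)
    for line in rgfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        # Optional tags look like KEY:TYPE:VALUE, e.g. SN:Z:chr1 or SO:i:0.
        tags = {}
        for field in fields[3:]:
            key, _type, value = field.split(":", 2)
            tags[key] = value
        if tags.get("SR") == "0":
            by_seq[tags["SN"]].append((int(tags["SO"]), fields[1]))
    p_lines = []
    for seq_name, segs in sorted(by_seq.items()):
        walk = ",".join(f"{seg}+" for _, seg in sorted(segs))
        p_lines.append(f"P\t{seq_name}\t{walk}\t*")
    return p_lines
```

Segments with a rank above 0 are left out of the recovered paths, because their SN/SO tags point into other stable sequences rather than the reference.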

subwaystation commented 1 month ago

```
./gfa2view --GFA ~/git/odgi/test/chr6.C4.gfa --ref "NA21309#1" --outDir ~/Desktop/TEST_VRPG
```

This produced some output. I assume you have to load it directly into VRPG; it won't give you real rGFA output.
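To check whether a given output file is actually rGFA, a rough heuristic is to test whether every S line carries all three stable-coordinate tags, SN, SO and SR. The Python sketch below does only that and is not a full rGFA validator. The function name looks_like_rgfa is hypothetical, and nothing here depends on the specific files gfa2view writes.

```python
"""Sketch: heuristic check for rGFA-style stable-coordinate tags.

Assumption: a file counts as rGFA here if it has at least one S line and
every S line carries SN, SO and SR tags. Other rGFA rules are not checked.
"""


def looks_like_rgfa(lines):
    """Return True if all S lines carry SN/SO/SR tags (and at least one exists)."""
    seen_segment = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue
        seen_segment = True
        # Tag keys are the part before the first colon, e.g. "SN" in SN:Z:chr1.
        keys = {f.split(":", 1)[0] for f in fields[3:]}
        if not {"SN", "SO", "SR"} <= keys:
            return False
    return seen_segment
```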