lh3 / pangene

Constructing a pangenome gene graph

Empty pangraph.gfa output #2

Closed bioteksampath closed 1 year ago

bioteksampath commented 1 year ago

Hi Li, Thanks for this awesome and super fast tool. I ran pangene on my 50 plant genomes (allotetraploid) and was able to create the .paf files. However, the pangraph.gfa output is empty.

Any help will be highly appreciated,

Summary of my run (note the "selected 0 vertices" line):

[M::pg_post_process::30.932*0.91] genome 49: 8490 pseudo, 149337 shadow
[M::pg_post_process::30.996*0.91] genome 50: 7639 pseudo, 111360 shadow
[M::pg_gen_vtx::36.318*0.92] selected 0 vertices out of 7646947 genes
[M::pg_graph_gen::38.641*0.93] round-1 graph: 0 genes and 0 arcs
[M::pg_graph_flt_high_occ::38.641*0.93] 0 high-occurrence segments
[M::pg_graph_flt_high_occ::38.641*0.93] 0 high-degree segments additionally
[M::pg_graph_gen::40.906*0.93] round-2 graph: 0 genes and 0 arcs
[M::pg_mark_branch_flt_arc::40.906*0.93] marked 0 diverged branches
[M::pg_mark_branch_flt_hit::42.786*0.93] marked 0 diverged hits
[M::pg_mark_branch_flt_arc::44.821*0.94] marked 0 diverged branches
[M::pg_mark_branch_flt_hit::46.693*0.94] marked 0 diverged hits
[M::pg_mark_branch_flt_arc::48.723*0.94] marked 0 diverged branches
[M::pg_mark_branch_flt_hit::50.593*0.94] marked 0 diverged hits
[M::pg_graph_cut_low_arc::52.621*0.95] filtered 0 low-occurrence arcs
[M::pg_graph_gen::52.621*0.95] round-3 graph: 0 genes and 0 arcs
[M::main] Version: 0.0-r87-dirty

Below is my command.

`pangene -a2 NAM00.paf NAM01_pan.paf NAM04_pan.paf NAM05_pan.paf NAM08_pan.paf NAM10_pan.paf NAM12_pan.paf NAM13_pan.paf NAM14_pan.paf NAM15_pan.paf NAM17_pan.paf NAM23_pan.paf NAM25_pan.paf NAM26_pan.paf NAM28_pan.paf NAM29_pan.paf NAM30_pan.paf NAM31_pan.paf NAM32_pan.paf NAM33_pan.paf NAM34_pan.paf NAM36_pan.paf NAM37_pan.paf NAM38_pan.paf NAM39_pan.paf NAM40_pan.paf NAM42_pan.paf NAM43_pan.paf NAM45_pan.paf NAM46_pan.paf NAM47_pan.paf NAM51_pan.paf NAM53_pan.paf NAM56_pan.paf NAM57_pan.paf NAM65_pan.paf NAM66_pan.paf NAM68_pan.paf NAM71_pan.paf NAM72_pan.paf NAM73_pan.paf NAM75_pan.paf NAM76_pan.paf NAM78_pan.paf NAM79_pan.paf NAM82_pan.paf NAM83_pan.paf NAM85_pan.paf NAM86_pan.paf NAM87_pan.paf NAM88_pan.paf`

`[M::main] Real time: 58.202 sec; CPU: 55.320 sec; Peak RSS: 4.586 GB`

Thanks, Sam

lh3 commented 1 year ago

Could you send me two alignment files for debugging? Thanks

bioteksampath commented 1 year ago

Hi Li, I found that it seems to work when I give it 20 .paf files or fewer, but when I increase to 30 it produces empty output. I'm attaching only a couple of files because of the size limit (25 MB), but I can share all 50 files if you send me your email address.

(https://github.com/lh3/pangene/files/12017017/2pafs.zip)

Thanks a lot, Sam

lh3 commented 1 year ago

Thanks for the example. I see what is happening. When running miniprot, you need to align the same set of proteins against all genomes. You seem to have one protein set for each genome. Pangene would not work in this case. You may use the gene annotation from the reference genome, or use CD-HIT or mmseqs2 to cluster proteins from all 50 genomes.
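A quick way to spot this situation before running pangene is to compare the protein (query) names across the alignment files. The sketch below is not part of pangene or miniprot; it only assumes that each .paf file follows the standard PAF layout, where column 1 is the query name. The helper names `query_names` and `shared_fraction` are made up for illustration.

```python
def query_names(paf_path):
    """Return the set of query (protein) names found in column 1 of a PAF file."""
    names = set()
    with open(paf_path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            # Skip blank lines and any comment-style lines (assumption: these start with '#').
            if not line or line.startswith("#"):
                continue
            names.add(line.split("\t", 1)[0])
    return names


def shared_fraction(paf_paths):
    """Fraction of all observed query names that appear in every file.

    A value near 0 suggests that each genome was aligned with its own
    protein set rather than one common set.
    """
    sets = [query_names(p) for p in paf_paths]
    if not sets:
        return 0.0
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return len(common) / len(union)
```

For example, `shared_fraction(["NAM00.paf", "NAM01_pan.paf"])` returning a value close to zero would be consistent with the per-genome protein sets described above. A low value is only a hint, since some proteins may legitimately fail to align to some genomes.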

bioteksampath commented 1 year ago

Thanks a lot, it worked. I used a common reference to map against all 50 genomes.