pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

Remove Complex Region of the Graph Pangenome #572

Open zhangyixing3 opened 6 months ago

zhangyixing3 commented 6 months ago

Dear sir, using Anchorwave-cactus, we have successfully constructed a graph-based pangenome from 97 samples. However, the centromeric region is very complex and prevents us from running the 'vg autoindex' command. To work around this, we intend to remove that region from the graph. The following image depicts two chromosomes. [image]

#odgi command
odgi prune -i  97sample.finally.og  -C  97   -t 10 -o joint-clip.P_C97.og
odgi view -i  joint-clip.P_C97.og -g  >  joint-clip.P_C97.gfa

The resulting joint-clip.P_C97.gfa has no paths (P lines) or walks (W lines). Is that normal?
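A quick way to confirm whether a GFA file really contains no paths or walks is to count its P and W records. The following is a minimal sketch; the function name `count_paths_and_walks` is hypothetical and only standard awk is assumed.

```shell
# count_paths_and_walks GFA
# Prints the number of P (path) and W (walk) records in a GFA file.
# The function name is illustrative, not part of odgi or vg.
count_paths_and_walks() {
    local gfa="$1"
    awk -F'\t' '
        $1 == "P" { p++ }   # GFA path record
        $1 == "W" { w++ }   # GFA 1.1 walk record
        END { printf "P\t%d\nW\t%d\n", p, w }
    ' "$gfa"
}

# Usage (file name taken from the commands above):
#   count_paths_and_walks joint-clip.P_C97.gfa
```

If both counts are zero, the file holds only segments and links, which matches the observation above.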

# vg  clip
#ok 
vg clip  97sample.finally.vg  -P sample -n 97 -e 97  > clip.finally_n97_e97.vg  
vg convert -t 10 -f  clip.finally_n97_e97.vg > clip.finally_n97_e97.gfa
odgi build -g clip.finally_n97_e97.gfa  -o joint-clip.P.og -t 20  
odgi sort  -i  joint-clip.P.og -o joint-clip.P.sort.og 

#error
odgi layout -i joint-clip.P.sort.og -o graphs.combined.og.lay -T graphs.combined.og.lay.tsv -t 30 -P  
odgi draw -i joint-clip.P.sort.og -c graphs.combined.og.lay -p graphs.combined.og.lay.draw.png -H 1000

Does odgi support drawing the clipped or pruned graph?

log file
odgi: /home/hickey/dev/cactus/build/bin-tmp/cactus/build-pangenome-tools/odgi-v0.8.3/build/sdsl-lite-prefix/src/sdsl-lite-build/include/sdsl/int_vector.hpp:1360: sdsl::int_vector< >::reference sdsl::int_vector< >::operator[](const size_type&) [with unsigned char t_width = 1; sdsl::int_vector< >::reference = sdsl::int_vector_reference<sdsl::int_vector<1> >; sdsl::int_vector< >::size_type = long unsigned int]: Assertion `idx < this->size()' failed.
/var/spool/pbs/mom_priv/jobs/903065.mn01.SC: line 22: 25972 Aborted (core dumped) odgi layout -i joint-clip.P.sort.og -o graphs.combined.og.lay -T graphs.combined.og.lay.tsv -t 30 -P
/var/spool/pbs/mom_priv/jobs/903065.mn01.SC: line 23: 26685 Segmentation fault (core dumped) odgi draw -i joint-clip.P.sort.og -c graphs.combined.og.lay -p graphs.combined.og.lay.draw.png -H 1000

zhangyixing3 commented 6 months ago

I'll try the odgi extract command

zhangyixing3 commented 6 months ago

The results of odgi extract are also abnormal. For example:

complex region
#path   start   end
001_111#0#Chr4 19375679        19375685
001_111#0#Chr4 19377065        19377092
001_111#0#Chr4 19395332        19395353
001_111#0#Chr4 19396674        19396680
001_111#0#Chr5 19250513        19250523
P       001_111#0#Chr4:19375685-22740591
P       001_111#0#Chr4:0-19377065
P       001_111#0#Chr4:19377092-22740591
P       001_111#0#Chr4:0-19395332
P       001_111#0#Chr4:19395353-22740591
P       001_111#0#Chr4:0-19396674
P       001_111#0#Chr4:19396680-22740591
P       001_111#0#Chr5:0-19250513
P       001_111#0#Chr5:19250523-30923699

The results show that Chr5 of 001_111 is correct, but the results for Chr4 are incorrect. Here is the command I used:

odgi extract -i joint-clip.P.og --threads 10 -P \
     -c 0 --inverse  -d 0 \
     -b pan.high_depth.10_0.bed \
     -R retain \
     -o 97samples.pan.clean.og

$ odgi  version
v0.8.3-26-gbc7742ed

yeeus commented 5 months ago

Hello friend! I ran into this problem as well and opened a new issue with my specific questions (#577); you are welcome to discuss it there. By the way, have you solved it?

zhangyixing3 commented 5 months ago

I wanted to use odgi extract --inverse with the complex regions as input to extract the non-complex subgraphs, but I encountered an error. Instead, I ran odgi extract directly on a BED file listing the non-complex regions, and that succeeded.
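One way to obtain a BED file of non-complex regions is to take the complement of the complex-region BED against the length of each path. The following is a minimal sketch using only sort and awk. The function name `complement_bed` and the two-column lengths file (path name, path length) are assumptions for illustration, not something odgi provides under these names. Intervals are treated as 0-based and half-open, and paths without any complex region are emitted whole.

```shell
# complement_bed LENGTHS BED
# LENGTHS: two whitespace-separated columns: path name, path length (hypothetical file).
# BED:     path, start, end (0-based, half-open); lines starting with '#' are skipped.
# Prints the regions of every path that are NOT covered by the BED intervals.
complement_bed() {
    local lengths="$1" bed="$2"
    # Sort by path name, then numerically by start, so gaps are found in one pass.
    grep -v '^#' "$bed" | LC_ALL=C sort -k1,1 -k2,2n |
    awk -v OFS='\t' '
        # First input (LENGTHS): remember the length of each path.
        NR == FNR { len[$1] = $2 + 0; next }
        # Second input (sorted BED on stdin).
        {
            if ($1 != cur) {
                # Close the previous path with its trailing uncovered region.
                if (cur != "" && pos < len[cur]) print cur, pos, len[cur]
                cur = $1; pos = 0; seen[cur] = 1
            }
            # Emit the gap before this interval, if any.
            if ($2 + 0 > pos) print cur, pos, $2 + 0
            # Advance past this interval (handles overlapping intervals).
            if ($3 + 0 > pos) pos = $3 + 0
        }
        END {
            if (cur != "" && pos < len[cur]) print cur, pos, len[cur]
            # Paths with no intervals in BED are kept in full.
            for (p in len) if (!(p in seen)) print p, 0, len[p]
        }
    ' "$lengths" - | LC_ALL=C sort -k1,1 -k2,2n
}

# Usage (file names are examples; path.lengths.tsv is a hypothetical lengths file):
#   complement_bed path.lengths.tsv pan.high_depth.10_0.bed > pan.non_complex.bed
```

For the Chr4 intervals listed earlier in the thread, this would yield consecutive regions such as 0-19375679 and 19375685-19377065 rather than one pair of regions per input interval. The resulting file could then be passed to odgi extract with -b and without --inverse, as described above.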

yeeus commented 5 months ago

That's awesome! I followed your advice and extracted non_complex regions and it seems good!