pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

odgi untangle multiple reference file - potential bug #581

Open Catriona-Miller opened 4 months ago

Catriona-Miller commented 4 months ago

Hi,

I have been trying to use chm13 instead of grch38 as the reference for odgi untangle, with the same code as your readthedocs tutorial. Since there are two chm13 paths for the areas I'm interested in, I've been using the -R flag with a file that lists the paths (e.g. attached), as below:

(echo query.name query.start query.end ref.name ref.start ref.end score inv self.cov n.th | tr ' ' '\t'; odgi untangle -i VKORC1_gene_sorted.og -R target_chm13.txt --threads 8 -m 256 -P | bedtools sort -i - ) | awk '$8 == "-" { x=$6; $6=$5; $5=x; } { print }' | tr ' ' '\t' > chr16_chm13_VKORC1_untangle1.bed

However, when I read the output BED file into R, only one of the two chm13 paths is ever used as a reference. I've tried adding an extra blank line at the start of target_chm13.txt and swapping the order of the two paths, but it always uses the second path as the reference. I'm unsure whether this is a bug or a misunderstanding of the process on my end. Thanks. target_chm13.txt https://github.com/user-attachments/files/16123261/target_chm13.txt

ekg commented 4 months ago

Can you try running a multiple untangling and see whether the second-best hit is the part that never gets touched? One thing we've seen is that when two sequences in the reference are 100% identical, we will not emit one of them in the untangling, because they have exactly the same match quality to all sequences.
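A quick way to test the 100%-identity case is to compare the sequences of the reference paths directly. The following is only a sketch: it assumes the paths have already been exported as FASTA (for instance with `odgi paths -i VKORC1_gene_sorted.og -f`; treat that invocation as an assumption and check `odgi paths --help`), and `same_seq_groups` is a hypothetical helper name, not part of odgi.

```shell
# Hypothetical helper (not an odgi command): read FASTA on stdin and print,
# one group per line, the names of records whose sequences are identical.
same_seq_groups() {
  awk '
    # New record: flush the previous one as "sequence<TAB>name".
    /^>/ { if (name != "") print seq "\t" name; name = substr($1, 2); seq = ""; next }
    # Join wrapped sequence lines into a single string.
    { seq = seq $0 }
    END { if (name != "") print seq "\t" name }
  ' | LC_ALL=C sort | awk -F '\t' '
    # After sorting, identical sequences are adjacent; collect their names.
    NR > 1 && $1 == prev { names = names " " $2; n++; next }
    { if (n > 1) print names; prev = $1; names = $2; n = 1 }
    END { if (n > 1) print names }
  '
}

# Assumed usage (requires odgi; not run here):
#   odgi paths -i VKORC1_gene_sorted.og -f | same_seq_groups
```

If both chm13 path names appear on the same output line, their sequences are identical in this graph, which would match the tie described above. The comparison is case-sensitive and holds whole sequences in memory, so it is only meant for small regions like this one.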

AndreaGuarracino commented 4 months ago

Try setting -n > 1 in odgi untangle

    -n[N], --n-best=[N]               Report up to the Nth best target
                                      (reference) mapping for each query
                                      segment (default: 1).
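To see which reference paths actually show up at each rank, the untangle output can be tallied by `ref.name` and `n.th`. This is a minimal sketch, assuming tab-separated rows with the column order of the header used in this thread (`ref.name` in column 4, `n.th` in column 10); `count_refs` is a hypothetical helper name, and the column positions should be checked against the real output of your odgi version.

```shell
# Hypothetical helper: count untangle rows per (n.th, ref.name) pair.
# Assumes ref.name is column 4 and n.th is column 10 (per the header in this thread).
# Reads the files given as arguments, or stdin if none are given.
count_refs() {
  awk -F '\t' '
    # Skip the header row and any comment lines.
    $1 == "query.name" || /^#/ { next }
    { count[$10 "\t" $4]++ }
    END { for (k in count) print k "\t" count[k] }
  ' "$@" | LC_ALL=C sort
}

# Assumed usage on the file produced in this thread:
#   count_refs chr16_chm13_VKORC1_untangle1.bed
```

Each output line is `n.th`, `ref.name`, and the number of rows. If only one path name appears across all ranks even with `-n 2`, that confirms the other path is never reported as a target.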

Catriona-Miller commented 4 months ago

Thanks both. I set -n 2, but I still only ever get the second path as the ref path in the output. For example, see the two files produced by the code below. The only difference between them is the order in which I listed the two paths in target_chm13.txt.

(echo query.name query.start query.end ref.name ref.start ref.end score inv self.cov n.th | tr ' ' '\t'; odgi untangle -i VKORC1_gene_sorted.og -R target_chm13.txt --threads 8 -m 256 -P -n 2 | bedtools sort -i - ) | awk '$8 == "-" { x=$6; $6=$5; $5=x; } { print }' | tr ' ' '\t' > chr16_chm13_VKORC1_untangle1.bed

chr16_chm13_VKORC1_untangle_trial.txt chr16_chm13_VKORC1_untangle_trial2.txt