Open Catriona-Miller opened 4 months ago
Can you try to run a multiple untangling and see if the second best hit is the part that never gets touched? One thing that we've seen is that when the identity between two sequences in the reference is 100%, we will not emit one of them in the untangling because they have exactly the same match quality to all sequences.
On Sun, Jul 7, 2024, 23:59 Catriona-Miller wrote:
Hi,
I have been trying to use chm13 instead of grch38 as a reference file for odgi untangle with the same code as your readthedocs tutorial. Since there are two chm13 paths for the areas I'm interested in, I've been using the -R flag with a file that lists the paths (e.g. attached) such as below:
(echo query.name query.start query.end ref.name ref.start ref.end score inv self.cov n.th | tr ' ' '\t'; odgi untangle -i VKORC1_gene_sorted.og -R target_chm13.txt --threads 8 -m 256 -P | bedtools sort -i - ) | awk '$8 == "-" { x=$6; $6=$5; $5=x; } { print }' | tr ' ' '\t' > chr16_chm13_VKORC1_untangle1.bed
However, when I read the output BED file into R, it only ever uses one of the two chm13 paths as a reference. I've tried adding an extra blank line at the start of target_chm13.txt and tried swapping the order of the two paths, but it always uses the second path as a reference. I'm unsure whether this is a bug or a misunderstanding of the process on my end. Thanks
target_chm13.txt https://github.com/user-attachments/files/16123261/target_chm13.txt
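One way to test the 100% identity point raised above is to check whether the two reference paths carry exactly the same sequence. The sketch below assumes the two path sequences have already been exported to a FASTA file by some means (odgi paths may be able to do this with its FASTA output option, but treat that as an assumption and check your build). The record names and sequences here are invented toy data, not taken from the attached files.

```shell
# Toy FASTA with two made-up records standing in for the two reference paths.
# Replace "$fa" with your own FASTA export of the paths listed in target_chm13.txt.
fa=$(mktemp)
printf '>chm13#pathA\nACGTAC\nGTTA\n>chm13#pathB\nACGTACGT\nTA\n' > "$fa"

# Join each record's wrapped lines into one sequence per record,
# then count how many distinct sequences remain (comparison is case-sensitive).
distinct=$(awk '/^>/ { if (seq != "") print seq; seq = ""; next }
                { seq = seq $0 }
                END { if (seq != "") print seq }' "$fa" | LC_ALL=C sort -u | wc -l | tr -d ' ')

if [ "$distinct" -eq 1 ]; then
  verdict="identical"
else
  verdict="different"
fi
echo "reference paths are $verdict"
rm -f "$fa"
```

If the verdict is "identical", both references would score the same against every query, which would match the explanation given for why only one of them shows up.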
Try setting -n > 1 in odgi untangle
-n[N], --n-best=[N]    Report up to the Nth best target (reference) mapping for each query segment (default: 1).
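With -n above 1, a quick tally of which reference appears at which rank can show whether the missing path ever turns up as a second-best hit. The sketch below assumes the column layout from the header line used in this thread (column 4 is ref.name, column 10 is n.th) and a tab-separated file with one header row. The rows are invented toy data with hypothetical path names.

```shell
# Toy untangle-style table (tab-separated); path names and values are made up.
# Point "$bed" at your own output file instead.
bed=$(mktemp)
printf 'query.name\tquery.start\tquery.end\tref.name\tref.start\tref.end\tscore\tinv\tself.cov\tn.th\n' > "$bed"
printf 'sampleA#1\t0\t100\tchm13#pathB\t0\t100\t1.0\t+\t1\t1\n' >> "$bed"
printf 'sampleA#1\t0\t100\tchm13#pathA\t0\t100\t1.0\t+\t1\t2\n' >> "$bed"
printf 'sampleA#1\t100\t200\tchm13#pathB\t100\t200\t0.9\t+\t1\t1\n' >> "$bed"

# Skip the header, then count rows per (ref.name, n.th) pair.
# Output columns: ref.name, n.th, number of rows.
tally=$(awk -F'\t' 'NR > 1 { n[$4 "\t" $10]++ }
                    END { for (k in n) print k "\t" n[k] }' "$bed" | LC_ALL=C sort)
printf '%s\n' "$tally"
rm -f "$bed"
```

A reference that never appears in the tally at any rank was not emitted at all, whereas one that appears only with n.th of 2 was always outranked.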
From: Erik Garrison. Sent: Tuesday, July 9, 2024 15:55. Subject: Re: [pangenome/odgi] odgi untangle multiple reference file - potential bug (Issue #581)
Thanks both. I set -n 2, but I still only ever get the second path as the ref path in the output. For example, see the two files produced by the code below. The only difference between them is the order in which I've listed the two paths in target_chm13.txt.
(echo query.name query.start query.end ref.name ref.start ref.end score inv self.cov n.th | tr ' ' '\t'; odgi untangle -i VKORC1_gene_sorted.og -R target_chm13.txt --threads 8 -m 256 -P -n 2 | bedtools sort -i - ) | awk '$8 == "-" { x=$6; $6=$5; $5=x; } { print }' | tr ' ' '\t' > chr16_chm13_VKORC1_untangle1.bed
chr16_chm13_VKORC1_untangle_trial.txt chr16_chm13_VKORC1_untangle_trial2.txt