Closed userzxyz closed 2 years ago
The consensus is automatically generated in the POA step. It is the "heaviest bundle" of the MSA.
We have previously produced a consensus graph that consists of these segments only and links between them greater than a given length (-C). But this is disabled by default at the moment because the algorithm needs work to produce correct output.
On Fri, Jun 4, 2021, 21:59 userzxyz @.***> wrote:
Hello,
I am trying to understand MAF format from smooth.maf file. I have 18 genomes and created graphs for each of the chromosomes. Here are some example lines from smooth.maf output file for one of the chromosomes:
a blocks=12-13 loops=false merged=true below_thresh=true
s Consensus_12-13 0 7250 + 7250 CCCTCCTACTCATCGGGGCCTGGCACTTGCCCCGACGGCCGGGTGTAGGTCGCGCGCTTAAGCGCCATCCATTTTCGGGGCTAGTTGATTCGGCAGGTGAGTTGTTACACATTCCTTAGCGGA
s Sample2 0 7250 + 5243748 CCCTCCTACTCATCGGGGCCTGGCACTTGCCCCGACGGCCGGGTGTAGGTCGCGCGCTTAAGCGCCATCCATTTTCGGGGCTAGTTGATTCGGCAGGTGAGTTGTTACACATTCCTTAGCGGA
From the documentation for MAF files, I understand that this paragraph represents one multiple-alignment block. But I could not find any documentation for Consensus. Does it mean that for block 12-13, only Sample2 aligns identically to the consensus? Thank you for any help!
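For readers trying to interpret such a block, here is a minimal Python sketch that splits MAF text into blocks and separates the consensus rows from the sample rows. It relies on the standard MAF "s" line layout (name, start, size, strand, source size, aligned text). The "Consensus" name prefix is an assumption taken from the example above, the function names are hypothetical, and the sequences are shortened for illustration.

```python
from typing import Dict, List, Tuple


def parse_maf_blocks(text: str) -> List[Dict]:
    """Split MAF text into blocks, each with its 'a' attributes and 's' rows."""
    blocks: List[Dict] = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "a":
            # An 'a' line opens a new alignment block; key=value pairs follow.
            attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            current = {"attrs": attrs, "rows": []}
            blocks.append(current)
        elif fields[0] == "s" and current is not None and len(fields) >= 7:
            # s <name> <start> <size> <strand> <srcSize> <aligned text>
            name, start, size, strand, src_size, seq = fields[1:7]
            current["rows"].append({
                "name": name, "start": int(start), "size": int(size),
                "strand": strand, "src_size": int(src_size), "text": seq,
            })
    return blocks


def split_consensus(block: Dict, prefix: str = "Consensus") -> Tuple[list, list]:
    """Separate rows whose name starts with `prefix` (assumed naming) from the rest."""
    consensus = [r for r in block["rows"] if r["name"].startswith(prefix)]
    samples = [r for r in block["rows"] if not r["name"].startswith(prefix)]
    return consensus, samples


# Shortened version of the block quoted above (sequences truncated).
example = """a blocks=12-13 loops=false merged=true below_thresh=true
s Consensus_12-13 0 7250 + 7250 CCCTCC
s Sample2 0 7250 + 5243748 CCCTCC
"""
blocks = parse_maf_blocks(example)
consensus, samples = split_consensus(blocks[0])
print([r["name"] for r in consensus], [r["name"] for r in samples])
```

Under these assumptions, every non-consensus row in a block is a sample sequence that takes part in that block's alignment.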
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pangenome/pggb/issues/106, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEOSA5ZQJ4CHTJR2GMDTREWATANCNFSM46DNMCOQ .
Thank you! I was trying to use the MAF output to confirm that the generated graph depicts real insertions/deletions. I wanted to extract the sequence for a sample from a particular genomic region where we know that sample has an insertion or deletion. For example, sample1, which is one of the samples in the graph, has a large deletion on chromosome 2 at 100456-101675. How can I relate this information to the graph to confirm that there actually is a deletion in that region? Thank you!
I suggest using vg deconstruct to get this kind of information from the graph.
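Once a VCF has been produced this way, one possible way to check a known deletion is to scan it for records in the region where the sample carries an allele shorter than REF. The Python sketch below is only an illustration: the function name, the "chr2" contig name, and the toy records are hypothetical, and it assumes a plain-text VCF with one genotype column per sample. Coordinates in such a VCF are relative to the path used as reference, so the region has to be given in that path's coordinates.

```python
def deletions_for_sample(vcf_lines, sample, chrom, start, end):
    """Return (pos, ref_len, alt_len) for records overlapping [start, end]
    where `sample` carries an allele shorter than REF."""
    sample_col = None
    hits = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the genotype column of the requested sample.
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line seen before records")
        if fields[0] != chrom:
            continue
        pos, ref = int(fields[1]), fields[3]
        ref_end = pos + len(ref) - 1
        if ref_end < start or pos > end:
            continue  # record does not overlap the region
        alleles = [ref] + fields[4].split(",")
        genotype = fields[sample_col].split(":")[0].replace("|", "/")
        # Use a set so a homozygous call is reported only once.
        for idx in sorted(set(genotype.split("/"))):
            if idx.isdigit() and int(idx) > 0:
                alt = alleles[int(idx)]
                if len(alt) < len(ref):
                    hits.append((pos, len(ref), len(alt)))
    return hits


# Made-up records for illustration only; not real output.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "sample1", "Sample2"]),
    "\t".join(["chr2", "100455", ".", "ACGTACGT", "A", "60", ".",
               ".", "GT", "1", "0"]),
    "\t".join(["chr2", "200000", ".", "A", "AT", "60", ".",
               ".", "GT", "0", "1"]),
]
print(deletions_for_sample(toy_vcf, "sample1", "chr2", 100456, 101675))
```

A record reported here only says that the sample's allele is shorter than the reference allele at that site; whether it matches the expected deletion still needs to be checked against the allele lengths and positions.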
Thank you @ekg! I noticed that the pggb graph arranges the samples in alphabetical order. I tried rearranging the sample order when making the chromosome-wise files, but the pggb graph was again in alphabetical order. Is there any option to rearrange the sample order?
In the graph, it should be in the input order in the FASTA. Are you sure that order isn't being respected?
In the MAF the order may be alphabetical though.
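One way to check this is to compare the order of sequence names in the input FASTA with the order of path names in the output GFA. The Python sketch below assumes a GFA v1 file in which each path is a tab-separated "P" line with the path name in the second field; the function names and toy data are hypothetical. It only looks at the order of lines in the file, which may differ from the order a visualization tool shows.

```python
def fasta_names(fasta_lines):
    """Sequence names in file order (first word after '>')."""
    return [line[1:].split()[0] for line in fasta_lines if line.startswith(">")]


def gfa_path_names(gfa_lines):
    """Path names in file order, taken from GFA v1 'P' lines."""
    names = []
    for line in gfa_lines:
        if line.startswith("P\t"):
            names.append(line.split("\t", 2)[1])
    return names


# Toy inputs for illustration only.
toy_fasta = [">sampleB chr2", "ACGT", ">sampleA chr2", "ACGA"]
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "S\t3\tA",
    "P\tsampleB\t1+,2+\t*",
    "P\tsampleA\t1+,3+\t*",
]

same_order = fasta_names(toy_fasta) == gfa_path_names(toy_gfa)
print(fasta_names(toy_fasta), gfa_path_names(toy_gfa), same_order)
```

If the two lists match, the graph keeps the input order and any alphabetical ordering comes from a downstream output such as the MAF.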
I am currently running another job for the same and will update how it goes.
I have a question. I used deconstruct as per your suggestion:
vg deconstruct sample.xg -g sample.gbwt > sample_deconstruct.vcf
I made sample.gbwt as:
vg gbwt -G graph.fixed.gfa -p -o sample.gbwt
and sample.xg as:
vg convert -x -g graph.fixed.gfa > sample.xg
I am trying to understand the output VCF format. How can I get rid of the Consensus entries among the sample names in the VCF header? I only want to keep the sample names that I used in graph construction.
If you want to remove the Consensus sample names, you have to remove these from the final smoothed GFA. We do this now by default, when we call vg deconstruct. Please see https://github.com/pangenome/pggb/blob/c1886f8ce3c6bb229530130694ee14b323d57c53/pggb#L484.
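As an illustration of removing those paths by hand, the Python sketch below drops every GFA v1 "P" line whose path name starts with "Consensus_" and keeps all other lines unchanged. This is not necessarily how pggb does it at the linked line. The prefix is an assumption based on the names seen in the MAF output, and the function name and toy data are hypothetical.

```python
def drop_consensus_paths(gfa_lines, prefix="Consensus_"):
    """Return the GFA lines without 'P' lines whose path name starts with `prefix`."""
    kept = []
    for line in gfa_lines:
        if line.startswith("P\t"):
            path_name = line.split("\t", 2)[1]
            if path_name.startswith(prefix):
                continue  # skip consensus paths (assumed naming convention)
        kept.append(line)
    return kept


# Toy GFA for illustration only.
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "L\t1\t+\t2\t+\t0M",
    "P\tConsensus_12-13\t1+,2+\t*",
    "P\tSample2\t1+,2+\t*",
]
filtered = drop_consensus_paths(toy_gfa)
print("\n".join(filtered))
```

The filtered lines can then be written to a new GFA file and used as input for the index and deconstruct steps, so that only the real samples remain as paths.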
@userzxyz Were you able to solve your problem?