Open tiramisutes opened 2 months ago
Hi, for humans it's typically not possible to get a complete rDNA assembly. There is a file morphgraph.gfa in the output folder which shows how the morphs output by ribotin might be placed next to each other, but I haven't seen any case where it would have been possible to get the complete rDNA arrays from this information. You can try something similar to the CHM13 rDNA assembly, where you take the most common morphs, estimate their copy counts and then fill in the array with a model sequence.
For non-humans it might or might not be possible, depending on how large and divergent the rDNA arrays are, but the same graph morphgraph.gfa would be a starting point.
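As a rough illustration of "take the most common morphs and estimate their copy counts", here is a minimal Python sketch. It assumes the segment names in morphgraph.gfa follow the tangleN_morphconsensusM_coverageC pattern seen in this thread, and that dividing a morph's coverage by a single-copy coverage gives a usable copy-count estimate. The single_copy_coverage value of 20 is purely hypothetical and would have to be estimated from your own data; none of this is ribotin's own method.

```python
import re

# Naming pattern taken from the morph names quoted in this thread.
MORPH_NAME = re.compile(r"tangle(\d+)_morphconsensus(\d+)_coverage(\d+)")


def segment_names(gfa_text):
    """Return the names of all S (segment) records in GFA text."""
    names = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == "S":
            names.append(fields[1])
    return names


def rank_morphs(names, single_copy_coverage):
    """Sort morphs by coverage and attach a rough copy-count estimate.

    single_copy_coverage is an assumed, user-supplied value, not
    something ribotin reports.
    """
    ranked = []
    for name in names:
        match = MORPH_NAME.search(name)
        if match is None:
            continue  # skip segments that do not follow the naming pattern
        coverage = int(match.group(3))
        copies = max(1, round(coverage / single_copy_coverage))
        ranked.append((name, coverage, copies))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy GFA content built from the morph names mentioned in this thread.
example_gfa = "\n".join([
    "S\ttangle2_morphconsensus1_coverage79\t*",
    "S\ttangle2_morphconsensus0_coverage104\t*",
    "S\ttangle2_morphconsensus2_coverage59\t*",
    "L\ttangle2_morphconsensus0_coverage104\t+\ttangle2_morphconsensus1_coverage79\t+\t0M",
])

ranked = rank_morphs(segment_names(example_gfa), single_copy_coverage=20)
for name, coverage, copies in ranked:
    print(name, coverage, copies)
```

To run it on real output, read the file contents (for example from ribotin0/morphgraph.gfa) and pass the text to segment_names. The copy counts are only as good as the single-copy coverage you supply.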
I am running ribotin on human data and have the following questions.
I got more than one morphgraph.gfa, under the subdirectories ribotin0, ribotin1, etc. This is one of my results in morphgraph.gfa, and it seems that I got 10 morphs. What is the difference between the morphgraph.gfa files under the different subdirectories?
For the example shown above, I am wondering how you determine whether a morph is common or not. In your paper, a morph is defined as the sequence of one complete repeat unit that appears in the rDNA arrays once or multiple times. Would you please describe in more detail how to interpret the outputs?
Thanks! Looking forward to your reply!
Hi,
ribotin0/nodes.txt, ribotin1/nodes.txt etc which list the nodes in the verkko graph corresponding to the tangle. So the morphs in the different morphgraph files are located in different rDNA tangles. Different tangles very likely mean different rDNA arrays, but the same tangle can contain, and often does contain, multiple rDNA arrays (e.g. as a made-up example, one tangle might have the rDNA arrays of chr13 and chr14 and another tangle those of chr15, chr21, and chr22).
tangle2_morphconsensus0_coverage104, tangle2_morphconsensus1_coverage79 and tangle2_morphconsensus2_coverage59 which very likely appear multiple times in this rDNA array. The other morphs have noticeably lower coverages, which correspond to a low copy count, probably 1-3 copies depending on the morph.
Thanks for your explanation!
A few things that I want to further confirm with you:
(tangle2_morphconsensus0_coverage104, tangle2_morphconsensus1_coverage79, tangle2_morphconsensus2_coverage59). Does it mean these morphs might have originated from rDNA arrays on different chromosomes?
graph.gfa) with the HiFi reads for each node saved in nodes.txt. The allele graph from each tangle is merged for ONT alignment, while I found an ont-alns.gaf for each tangle under the ribotin subdirectory. In your paper, it seems that you merged the allele graphs for each tangle and then aligned the ONT reads.
readpaths-morphgraph.gaf, which seems to be created from ont-alns.gaf. I found that there are 75 ONT reads aligned to the path tangle2_morphconsensus0_coverage104.
variants.vcf, I thought these are the variants among the different morphs, is that correct? The variants are called against a sequence named heavy_path. What is this heavy_path sequence?
Hi,
./tmp/ and then the per-tangle alignments are parsed out of the merged alignment file.
consensus.fa but I would not recommend using the consensus or the variants, instead the morphs should be used.
Hi, which output file from ribotin can be used in the next step of the analysis to obtain a complete rDNA assembly, and how should that be done? Thanks.