jmonlong / manu-vgsv

https://jmonlong.github.io/manu-vgsv/

Complex inversions in the cactus-based graph? #36

Closed jmonlong closed 5 years ago

jmonlong commented 5 years ago

While it's good already to show that vg can genotype simple inversions (simulated or the few from SVPOP), I'm wondering if we could further push the assembly approach by showing that some complex inversions are part of the variants genotyped.

Basically, can we say that some of the variants genotyped in the yeast experiment are actually complex inversions, e.g. inv+del or inv+ins?

Maybe a quick pass comparing the sequences in the input VCF, looking for long stretches of the ALT allele that match the reverse complement of the reference allele. Or somehow through the Cactus graph (I don't know enough, but maybe inversions are encoded in the Cactus graph and we could use that to annotate variants).

glennhickey commented 5 years ago

You can get a BED file of inversions from the Cactus output but there are some limitations. Going through the VCF may end up being more general.

On Fri, Mar 8, 2019 at 2:33 AM Jean Monlong wrote: [...]

eldariont commented 5 years ago

Okay, I wrote a script for scanning a VCF file for long reverse complement matches between the REF and ALT alleles. I got a few matches in the VCFs from the cactus graph but they are all of low complexity (e.g. ATATATATATAT). However, I need to redo this analysis on calls from the newest vg version because Glenn improved inversion calling recently. I will let you know when I have results on this.
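The script itself is not shown in this thread. A minimal sketch of what such a scan could look like, assuming a plain-text VCF with explicit REF/ALT sequences and using the longest exact substring shared between ALT and the reverse complement of REF as the match criterion (the function names and the `min_match` threshold are hypothetical, not taken from the actual script):

```python
from difflib import SequenceMatcher

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (upper-cased first)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_revcomp_match(ref, alt):
    """Length of the longest exact substring shared by ALT and revcomp(REF)."""
    rc = reverse_complement(ref)
    alt = alt.upper()
    matcher = SequenceMatcher(None, rc, alt, autojunk=False)
    return matcher.find_longest_match(0, len(rc), 0, len(alt)).size


def scan_vcf_lines(lines, min_match=10):
    """Yield (chrom, pos, ref, alt, match_len) for records with a long match.

    min_match is an assumed threshold; symbolic and breakend ALTs are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):
            if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
                continue
            size = longest_revcomp_match(ref, alt)
            if size >= min_match:
                yield chrom, int(pos), ref, alt, size
```

It could be run with something like `for hit in scan_vcf_lines(open("calls.vcf")): print(hit)`, where `calls.vcf` is a placeholder file name. Note that a repeat such as ATATATATATAT is its own reverse complement, so a scan of this kind will flag such alleles whether or not an inversion took place.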

eldariont commented 5 years ago

I redid the analysis on calls from the most recent version of toil-vg call but the results were the same. Only very few inversions larger than 10bp were called and all of them had low complexity. Maybe there just weren't many inversion events in the speciation of the different yeast strains?
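How "low complexity" was judged is not stated here. One simple way to screen such hits automatically is to measure k-mer diversity. The sketch below is only an illustration under that assumption; the choice of `k=3` and the 0.5 threshold are arbitrary, and the function names are hypothetical:

```python
def kmer_diversity(seq, k=3):
    """Fraction of distinct k-mers among all k-mers in seq (0.0 if seq is shorter than k)."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return 0.0
    distinct = len({seq[i:i + k] for i in range(total)})
    # Cap the denominator at 4**k so long sequences are not penalised
    # simply for exhausting the possible k-mers.
    return distinct / min(total, 4 ** k)


def is_low_complexity(seq, k=3, threshold=0.5):
    """True if the k-mer diversity falls below the (assumed) threshold."""
    return kmer_diversity(seq, k) < threshold
```

With these settings, ATATATATATAT contains only two distinct 3-mers (ATA and TAT) out of ten and is flagged as low complexity.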

jmonlong commented 5 years ago

Ok, thanks for checking again. Do you remember if there were any inversions in the output of AsmVar? (I think Assemblytics doesn't report them.) If it's easy to do, the last check would be to look at the BED file of inversions from the Cactus output.

eldariont commented 5 years ago

Yes, AsmVar finds 6 inversions across all 11 yeast strains when compared to the reference strain S288C.

A) 5 of those are actually the same inversion detected in each of the 5 S. paradoxus strains. When looking at the cactus graph in the region, I saw that cactus does not represent the inversion as one would expect, with forward and backward traversal of nodes. Rather, the inversion is represented by numerous small substitution nodes.

B) The sixth inversion is more interesting. When making a dotplot of the two assemblies (S288C and SK1) in this region I get this (S288C on x and SK1 on y):

[image: inversion]

vg call quite reasonably detected a variant with a 282 bp REF allele and a 955 bp ALT allele. The ALT allele contains the inverted REF allele in the middle. This captures the event surprisingly well in my opinion.
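To make the "inverted REF in the middle of ALT" observation checkable, the position of the inverted stretch within ALT can be reported along with the flanking lengths. This is a rough sketch that assumes an exact (or near-exact) longest match is informative. A real 282 bp allele would likely carry mismatches and need an alignment instead. The function names and the toy sequences are hypothetical:

```python
from difflib import SequenceMatcher


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (upper-cased first)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def locate_inverted_ref(ref, alt):
    """Locate the longest stretch of revcomp(REF) inside ALT.

    Returns the lengths of the ALT sequence left and right of the match,
    the match length, and the fraction of REF it covers.
    """
    rc = reverse_complement(ref)
    alt = alt.upper()
    m = SequenceMatcher(None, rc, alt, autojunk=False).find_longest_match(
        0, len(rc), 0, len(alt)
    )
    return {
        "left_flank": m.b,
        "inverted": m.size,
        "right_flank": len(alt) - m.b - m.size,
        "ref_fraction": m.size / len(ref) if ref else 0.0,
    }


# Toy example: ALT = 2 inserted bases + inverted REF + 3 inserted bases.
print(locate_inverted_ref("GATTACAGATTC", "TT" + "GAATCTGTAATC" + "AAA"))
```

Non-zero flanks on both sides together with a `ref_fraction` close to 1 would be consistent with an inversion plus insertion, which is the kind of event described for inversion B.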

eldariont commented 5 years ago

The construct graphs do not contain inversions yet. I could re-create the graphs with inversions but I don't know whether it makes much sense with only two different inversions across all strains. Inversion A is a very clear one that I think would be easily picked up by vg call. Inversion B is more complex but would probably be detected as well as it is from the cactus graph.

jmonlong commented 5 years ago

Thanks for checking that. I don't think it's worth redoing the analysis again for now, especially for just 2 inversions. It was just in case we could see something from the results we had already.