openvax / varcode

Library for manipulating genomic variants and predicting their effects
Apache License 2.0

Switch to interbase coordinates #40

Open iskandr opened 9 years ago

iskandr commented 9 years ago

We currently have a lot of special-case logic for insertions (e.g. if len(ref) == 0); we could use unified logic if the start/end coordinates were between bases.

See: http://alternateallele.blogspot.de/2012/03/genome-coordinate-conventions.html
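To make the point about unified logic concrete, here is a minimal sketch, not varcode's actual code. The helper names `to_interbase` and `apply_variant` are hypothetical, and the sketch assumes that in the 1-based inclusive convention an insertion (empty ref) is positioned at the base immediately before the inserted sequence. Under that assumption the `len(ref) == 0` check is confined to the conversion step, and everything downstream treats all variant types the same way.

```python
def to_interbase(pos, ref):
    """Convert a 1-based inclusive position to a half-open interbase (start, end).

    Hypothetical helper. Assumes that for insertions (empty ref) `pos` names
    the base immediately before the inserted sequence.
    """
    if len(ref) == 0:
        # Zero-width interval sitting between base `pos` and base `pos + 1`.
        return pos, pos
    return pos - 1, pos - 1 + len(ref)


def apply_variant(sequence, start, end, alt):
    """Apply a variant given in interbase coordinates; no insertion special case."""
    return sequence[:start] + alt + sequence[end:]


reference = "ACGTAC"

start, end = to_interbase(3, "G")
snv = apply_variant(reference, start, end, "T")

start, end = to_interbase(3, "GT")
deletion = apply_variant(reference, start, end, "")

start, end = to_interbase(3, "")
insertion = apply_variant(reference, start, end, "TT")
```

In interbase terms a substitution, a deletion, and an insertion are all "replace the interval [start, end) with alt"; an insertion is simply the case where the interval has zero width.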

timodonnell commented 9 years ago

This is a pile of annoying work, but I think we should either prioritize it or drop it. Having the read_evidence module use interbase coordinates while the rest of varcode uses 1-based inclusive coordinates is really untenable. I'm hitting endless off-by-one errors, and I don't feel like I can write a blog post on read evidence pie charts when the code to generate them constantly switches between two coordinate systems mandated by the same software package, all of which would change when we transition.

If we do change to interbase, we have to figure out what coordinate system varcode should use when reporting results to users (e.g. str(variant)). I think giving them 0-based coordinates there will cause some confusion (I accidentally sent someone 0-based coordinates for PCR validation and confusion ensued), but on the other hand, having it give different results than the variant properties is also confusing.

If there's a reasonable way to stick with 1-based coordinates, I wouldn't be opposed to that. Then I would just change read_evidence to use 1-based. And we should switch Variant instances to somehow use a Locus class (either have a locus or inherit from it) so we can work with loci and variants in a unified way.
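As one possible shape for the "have a locus" option, here is a rough sketch. The `Locus` and `Variant` classes below are hypothetical illustrations, not the existing varcode or read_evidence classes. The sketch assumes interbase coordinates are stored internally and a 1-based inclusive string is produced only for display, which is one way to approach the str(variant) question raised above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical locus stored in half-open interbase coordinates."""
    contig: str
    start: int
    end: int

    @property
    def length(self):
        return self.end - self.start

    def to_one_based_string(self):
        """Render as 1-based inclusive coordinates for human readers."""
        if self.length == 0:
            # Zero-width locus: show the two flanking 1-based positions.
            return f"{self.contig}:{self.start}^{self.start + 1}"
        if self.length == 1:
            return f"{self.contig}:{self.start + 1}"
        return f"{self.contig}:{self.start + 1}-{self.end}"


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant that has a Locus rather than inheriting from it."""
    locus: Locus
    ref: str
    alt: str

    def __str__(self):
        # '.' stands in for an empty allele (illustrative choice only).
        return f"{self.locus.to_one_based_string()} {self.ref or '.'}>{self.alt or '.'}"


snv = Variant(Locus("chr1", 2, 3), "G", "T")
deletion = Variant(Locus("chr1", 2, 4), "GT", "")
insertion = Variant(Locus("chr1", 3, 3), "", "TT")
```

With this layout, code that works on loci (such as read evidence lookups) can take `variant.locus` directly, while str(variant) keeps showing 1-based positions. The trade-off noted above still applies: the displayed numbers differ from the stored `start` and `end` properties.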

hammer commented 9 years ago

@iskandr @timodonnell have y'all had a chance to discuss this issue in person? Given the SciPy presentation last week and upcoming blog post this week, we should at least have a plan to communicate to potential users.