nmdp-bioinformatics / dash

Data Standards Hackathon for NGS based typing.
GNU Lesser General Public License v3.0

HML Spec: Is it possible to represent a homopolymer of uncertain length in HML? #30

Open bnbowman opened 10 years ago

bnbowman commented 10 years ago

For example, a 3k consensus sequence that could contain either "AGGGGGGGGA" or "AGGGGGGGA" but would otherwise be identical.

The current spec does not seem sufficient, since it appears to allow only 1-2 sequences. The simplest solution, supplying all possible sequences, is not possible if (A) the gene is also diploid or (B) there are more than 2 possible such sequences (e.g. 2 ambiguous positions).
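
To make point (B) concrete, here is a rough Python sketch of how the number of candidate sequences grows when every possibility is listed explicitly. The segment layout, the helper name `expand_homopolymers`, and the second ambiguous site are illustrative assumptions, not part of HML.

```python
from itertools import product


def expand_homopolymers(segments):
    """Return every sequence implied by a list of segments.

    Each segment is either a fixed string or a (base, lengths) tuple
    describing a homopolymer run of uncertain length (hypothetical layout).
    """
    options = []
    for seg in segments:
        if isinstance(seg, str):
            options.append([seg])  # fixed flanking sequence
        else:
            base, lengths = seg
            options.append([base * n for n in lengths])  # one option per length
    return ["".join(parts) for parts in product(*options)]


# One ambiguous run (7 or 8 G's): two candidate sequences.
one_site = expand_homopolymers(["A", ("G", (7, 8)), "A"])

# Two ambiguous runs: the candidates multiply (2 x 2 = 4).
two_sites = expand_homopolymers(["A", ("G", (7, 8)), "AC", ("T", (5, 6)), "C"])

print(len(one_site), len(two_sites))
```

Each additional ambiguous run multiplies the count again, and a diploid sample needs this for both alleles, so explicit enumeration quickly exceeds a 1-2 sequence limit.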

mmaiers-nmdp commented 9 years ago

There can be an arbitrary number of "consensus-sequence" elements (1..*) in an HML message. As part of the process of incorporating the MIRING attributes into HML I think we could also include alternates.

mmaiers-nmdp commented 9 years ago

What we (really) need is the ability to represent an assembly graph. FASTG is one proposed solution to this. Global Alliance is working on another. A goal for HML would be to offer the ability to represent anything these formats do/can.
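
As a rough illustration of the idea (not FASTG syntax and not the GA4GH schema), the homopolymer case from the original question could be held as a small graph in which the two run lengths are alternative branches between shared flanks. The node names and the `enumerate_paths` helper below are assumptions made for this sketch.

```python
def enumerate_paths(nodes, edges, start, end):
    """Yield the sequence spelled by every path from start to end.

    nodes maps a node id to its sequence; edges maps a node id to its
    successors. Assumes the graph is acyclic.
    """
    stack = [(start, nodes[start])]
    while stack:
        node, seq = stack.pop()
        if node == end:
            yield seq
            continue
        for nxt in edges.get(node, ()):
            stack.append((nxt, seq + nodes[nxt]))


# Shared flanks are stored once; only the uncertain run branches.
nodes = {"left": "A", "g7": "G" * 7, "g8": "G" * 8, "right": "A"}
edges = {"left": ["g7", "g8"], "g7": ["right"], "g8": ["right"]}

paths = sorted(enumerate_paths(nodes, edges, "left", "right"))
print(paths)
```

The graph stores each flank once and grows additively with each ambiguous site, while the set of sequences it encodes grows multiplicatively.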

ghost commented 9 years ago

Work-in-progress GA4GH variation reference schema variationReference.avdl

See also http://arxiv.org/abs/1404.5010