nmdp-bioinformatics / gfe-db

Graph database representing IPD-IMGT/HLA sequence data as GFE
https://gfe-db.readthedocs.io
GNU General Public License v3.0

Add the feature sequence as an attribute of Feature node #58

Open pbashyal-nmdp opened 2 years ago

pbashyal-nmdp commented 2 years ago

From smack:

The sequence node has the sequence attribute, which is the full gene sequence. However, there is no way to identify the sequence for an individual feature. Given that there is no defined length (so far) for either of the UTRs, that makes it nigh impossible to extract the sequence for, say, exons 2 and 3 from the graph. Since a given feature node can be associated with multiple alleles/gfes/sequences, it seems like each unique feature should also include its sequence as an attribute. Can a sequence attribute be added to each feature node, identifying the specific sequence for that feature?
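To illustrate the request, here is a minimal in-memory sketch (not gfe-db code) of what a per-feature sequence attribute would allow. The `Feature` class, the `feature_sequences` helper, the term labels, and the toy sequences are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a feature node carrying its own sequence."""
    term: str      # feature type, e.g. "EXON" or "INTRON" (labels assumed)
    rank: int      # position of the feature among features of the same term
    sequence: str  # the attribute this issue proposes to add


def feature_sequences(features, wanted):
    """Return the sequences for the requested (term, rank) pairs, in order."""
    by_key = {(f.term, f.rank): f.sequence for f in features}
    return [by_key[key] for key in wanted]


# Toy data: made-up sequences, not real HLA features.
features = [
    Feature("FIVE_PRIME_UTR", 1, "TTG"),
    Feature("EXON", 1, "ATGC"),
    Feature("INTRON", 1, "GTAG"),
    Feature("EXON", 2, "GCTCC"),
    Feature("INTRON", 2, "GTGA"),
    Feature("EXON", 3, "GGGCC"),
]

# No UTR length or offset arithmetic is needed to get exons 2 and 3.
exons_2_3 = feature_sequences(features, [("EXON", 2), ("EXON", 3)])
```

With only the full gene sequence available, the same lookup would need the start offset and length of every preceding feature, including the UTRs.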

From @mmaiers-nmdp :

I agree that each feature should have its sequence as an attribute. I think it used to. That’s the only way I could have done queries on the identity of feature sequences, so I’ll say “it definitely used to be there”; let’s open an issue and put it back. Having the feature number would be good, but it’s important to keep in mind that the feature service is accessioning a quad (HUGO gene name, term, rank, sequence) and assigning a unique ID relative to the first three. All the more reason that the sequence itself needs to be there to establish identity among features.
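The accessioning scheme described here can be sketched roughly as follows. This is an assumption-laden illustration, not the feature service's actual implementation: the `FeatureAccessioner` class, its method name, and the choice of sequential integers starting at 1 are all hypothetical.

```python
from collections import defaultdict


class FeatureAccessioner:
    """Hypothetical sketch: assign an ID to each distinct sequence,
    numbered relative to its (gene, term, rank) triple."""

    def __init__(self):
        # (gene, term, rank) -> {sequence: accession number}
        self._ids = defaultdict(dict)

    def accession(self, gene, term, rank, sequence):
        known = self._ids[(gene, term, rank)]
        if sequence not in known:
            # First time this sequence is seen for this triple: next number.
            known[sequence] = len(known) + 1
        return known[sequence]


acc = FeatureAccessioner()
first = acc.accession("HLA-A", "EXON", 2, "GCTCC")   # new sequence for this triple
repeat = acc.accession("HLA-A", "EXON", 2, "GCTCC")  # same quad, same ID
other = acc.accession("HLA-A", "EXON", 2, "GCTCA")   # different sequence, new ID
elsewhere = acc.accession("HLA-A", "EXON", 3, "GCTCC")  # numbering restarts per triple
```

Because the number is only unique within a (gene, term, rank) triple, two features can be compared for identity only when the sequence itself, or the full quad, is available on the node.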

Related #40