warelab / gramene-mongodb

Gramene's MongoDB. ETL code
MIT License

add overlapping dna_align_feature labels to genes #28

Open ajo2995 opened 8 years ago

ajo2995 commented 8 years ago

When dumping genes, also pull in the names of features from the dna_align_feature table that overlap the gene. Store these names in a new field called overlapping_features, which is a list of objects of the form {hit_name: "dna_align_feature.hit_name", analysis: "analysis_description.display_label"}.
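A minimal sketch of how the overlapping_features list could be built, not the actual dump code. It assumes each feature row has already been joined to analysis_description so that it carries display_label, and that genes and features use the Ensembl core columns seq_region_id, seq_region_start and seq_region_end. The function names and the sample hit names are hypothetical.

```python
from typing import Dict, List


def overlaps(gene: Dict, feature: Dict) -> bool:
    """True when both lie on the same seq_region and their inclusive ranges intersect."""
    return (
        gene["seq_region_id"] == feature["seq_region_id"]
        and feature["seq_region_start"] <= gene["seq_region_end"]
        and feature["seq_region_end"] >= gene["seq_region_start"]
    )


def overlapping_features(gene: Dict, features: List[Dict]) -> List[Dict]:
    """Build the proposed overlapping_features list for one gene.

    Assumption: each feature dict is a dna_align_feature row already joined
    to analysis_description, so it has hit_name and display_label.
    """
    seen = set()
    result = []
    for feature in features:
        if not overlaps(gene, feature):
            continue
        key = (feature["hit_name"], feature["display_label"])
        if key in seen:
            # The same hit can align in several pieces; report it once.
            continue
        seen.add(key)
        result.append({"hit_name": key[0], "analysis": key[1]})
    return result


# Hypothetical sample data, for illustration only.
gene = {"seq_region_id": 1, "seq_region_start": 100, "seq_region_end": 200}
features = [
    {"seq_region_id": 1, "seq_region_start": 150, "seq_region_end": 250,
     "hit_name": "old_model_A", "display_label": "v3 gene models"},
    {"seq_region_id": 1, "seq_region_start": 90, "seq_region_end": 100,
     "hit_name": "old_model_A", "display_label": "v3 gene models"},
    {"seq_region_id": 1, "seq_region_start": 201, "seq_region_end": 300,
     "hit_name": "old_model_B", "display_label": "v3 gene models"},
    {"seq_region_id": 2, "seq_region_start": 100, "seq_region_end": 200,
     "hit_name": "old_model_C", "display_label": "v3 gene models"},
]
print(overlapping_features(gene, features))
```

Strand is ignored here; whether overlaps should be strand-specific is left open.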

This is required for the maize v4 search because some v3 gene models may not get into the id history but overlap with new gene models.

Alternatively, we may want to dump dna_align_feature entries into a separate collection which can populate suggestions that lead to region queries.
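For the separate-collection alternative, each dna_align_feature entry could become its own document carrying a region string that a suggestion can turn into a region query. This is only a sketch: the document shape, the region field, and the assumption that the row has been joined to seq_region to get seq_region_name are all hypothetical.

```python
from typing import Dict


def region_suggestion(feature: Dict) -> Dict:
    """Turn one dna_align_feature row into a suggestion document.

    Assumption: the row was joined to seq_region (for seq_region_name) and to
    analysis_description (for display_label). The output shape is hypothetical.
    """
    region = "{}:{}-{}".format(
        feature["seq_region_name"],
        feature["seq_region_start"],
        feature["seq_region_end"],
    )
    return {
        "hit_name": feature["hit_name"],
        "analysis": feature["display_label"],
        "region": region,  # e.g. usable as the target of a region query
    }


# Hypothetical sample row, for illustration only.
row = {"seq_region_name": "1", "seq_region_start": 150, "seq_region_end": 250,
       "hit_name": "old_model_A", "display_label": "v3 gene models"}
print(region_suggestion(row))
```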

ajo2995 commented 8 years ago

I don't know if it is possible with the EBI Search API, but if an old model doesn't overlap with a new model, it would be nice if the search response could just provide a link to the region where the old model now maps.

This doesn't seem like it would work. I think the links here are generated by the local Ensembl server: http://ensembl.gramene.org/Multi/Search/Results?species=all;idx=;q=pad4;site=ensemblunit

ajo2995 commented 8 years ago

Normally this kind of thing should be taken care of by the xref pipeline, so this feature can be a one-off for the maize v4 search db.