legumeinfo / website-ui-specs

User interface specification of components built for the Jekyll sites.
Apache License 2.0

Draft spec for pangene search #9

Open StevenCannon-USDA opened 1 year ago

StevenCannon-USDA commented 1 year ago

Please see draft spec for pangenes search query - to find ~paralogous/allelic genes (corresponding by homology and synteny): https://github.com/legumeinfo/website-ui-specs/tree/main/pangenes-search

... and provide feedback. Please respond via this issue. @sammyjava @That-Thing @maxglycine @jd-campbell @alancleary @adf-ncgr @sdash-github

The pangene sets we have in the Data Store currently are for: Arachis, Cicer, Glycine, Medicago, Phaseolus, Vigna. I've tried to make the spec suitable for use at LegumeInfo, SoyBase, and PeanutBase.

This spec may again come before the mine backend is ready ... but it sounds like it is on the way.

sammyjava commented 1 year ago

Yeah, the mine 5.1.0.3 graphql-server is ready, and we can test against the dev MiniMine, which is on 5.1.0.3. So nothing holding us back pangene set-wise. The dev MiniMine is at https://mines.dev.lis.ncgr.org/minimine/begin.do

sammyjava commented 1 year ago

FYI, here's what PanGeneSet looks like in the graphql-server branch, just a bucket o' genes and proteins.

<class name="PanGeneSet" extends="Annotatable" is-interface="true">
        <collection name="dataSets" referenced-type="DataSet"/>
        <collection name="genes" referenced-type="Gene" reverse-reference="panGeneSets"/>
        <collection name="proteins" referenced-type="Protein" reverse-reference="panGeneSets"/>
</class>

type PanGeneSet implements Annotatable {
  ## Annotatable
  id: ID!
  identifier: ID!
  ontologyAnnotations: [OntologyAnnotation!]!
  publications: [Publication!]!
  ## PanGeneSet
  dataSets: [DataSet]
  genes: [Gene]
  proteins: [Protein]
}

adf-ncgr commented 1 year ago

thanks @StevenCannon-USDA I have a couple of minor (maybe) comments/questions on the initial spec:

some of these are probably just stuff to think about for future iterations.

maxglycine commented 1 year ago

May want to add an output option to download query results to the user's computer. A query could return a large number of identifiers and the user may want to save them. Otherwise, the user would have to copy HTML text and paste it somewhere.

sammyjava commented 1 year ago

The "Genes in this pangene set" count would be best implemented by adding a "size" attribute to the PanGeneSet object in the mines and populating it in a post-processor, as we do with GeneFamily. That attribute is not currently present in PanGeneSet in 5.1.0.3, nor are there any other aggregate quantities like the ones we have in GeneFamily 5.1.0.3:

<class name="GeneFamily" extends="Annotatable" is-interface="true" term="">
        <attribute name="description" type="java.lang.String"/>
        <attribute name="version" type="java.lang.String"/>
        <attribute name="size" type="java.lang.Integer"/>
        <reference name="phylotree" referenced-type="Phylotree" reverse-reference="geneFamily"/>
        <collection name="genes" referenced-type="Gene"/>
        <collection name="proteins" referenced-type="Protein"/>
        <collection name="proteinDomains" referenced-type="ProteinDomain" reverse-reference="geneFamilies"/>
        <collection name="dataSets" referenced-type="DataSet"/>
        <collection name="tallies" referenced-type="GeneFamilyTally" reverse-reference="geneFamily"/>
</class>

If this is a Big Deal, stop me from building 5.1.0.3 mines. GlycineMine 5.1.0.3 is almost built, took two weeks.
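
A minimal sketch of what such a post-processing step would compute, assuming "size" means the number of distinct genes in the set. This is plain Python for illustration only, not the mine's actual post-processor; the `PanGeneSet` dataclass, the `populate_sizes` function, and the identifiers are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PanGeneSet:
    """Hypothetical in-memory stand-in for a mine PanGeneSet record."""
    identifier: str
    genes: List[str] = field(default_factory=list)
    proteins: List[str] = field(default_factory=list)
    # Assumed new attribute, mirroring GeneFamily.size; unset until populated.
    size: Optional[int] = None


def populate_sizes(pangene_sets: List[PanGeneSet]) -> List[PanGeneSet]:
    """Set size on each pangene set to its number of distinct genes."""
    for pangene_set in pangene_sets:
        pangene_set.size = len(set(pangene_set.genes))
    return pangene_sets


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    example = [PanGeneSet("pan1", genes=["geneA", "geneB"]), PanGeneSet("pan2")]
    for pangene_set in populate_sizes(example):
        print(pangene_set.identifier, pangene_set.size)
```

Storing the count once at build time, rather than counting genes on every query, is the same trade-off GeneFamily already makes with its "size" attribute.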

sammyjava commented 1 year ago

> May want to add an output option to download query results to the user's computer. A query could return a large number of identifiers and the user may want to save them. Otherwise, the user would have to copy HTML text and paste it somewhere.

This sounds like an across-the-board option that would be implemented for all results output, the way pagination is. Thoughts, @alancleary? After all, we all remember that "Every page should have a download button!" :)
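
A minimal sketch of the content a generic download option could produce, assuming results arrive as a mapping from pangene set identifier to a list of gene identifiers. The function name `results_to_tsv`, the column headers, and the TSV format are assumptions for illustration; the spec does not prescribe a download format.

```python
import csv
import io
from typing import Dict, List


def results_to_tsv(results: Dict[str, List[str]]) -> str:
    """Render {pangene set identifier: [gene identifiers]} as TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Hypothetical column names; one row per (pangene set, gene) pair.
    writer.writerow(["pangene_set", "gene"])
    for set_id, genes in results.items():
        for gene in genes:
            writer.writerow([set_id, gene])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(results_to_tsv({"pan1": ["geneA", "geneB"], "pan2": ["geneC"]}))
```

Because the function only needs rows of identifiers, the same serializer could sit behind a download button on any results listing, not just the pangene search.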

StevenCannon-USDA commented 1 year ago

@sammyjava - "Genes in this pangene set" - I would say "not a big deal" (not a high priority in the first implementation).

sammyjava commented 1 year ago

@StevenCannon-USDA I'm a bit confused about the scope of this search. Are you saying that we'll have a list of pangene sets, each with its corresponding genes listed below it? For example, what happens if the only search element is "Glycine", all else left blank? A gigantic list of all Glycine pangene-sets with their genes? (Which is fine, if that's what you want.)

sammyjava commented 1 year ago

And, if so, are you specifying that pagination be on a pangene-set-to-pangene-set basis? Each page displays a single pangene set? (That's just setting the page size to 1, which is easy. The list of genes within a pangene set would be part of that pangene set record's display.) Just want some detail on pagination expectations when we've got results which are a list of lists.
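
A minimal sketch of the pagination question above, assuming the results are a list of pangene set records that each carry their own gene list. The `paginate` function and the record layout are hypothetical; a page size of 1 gives the "one pangene set per page" behavior, and larger page sizes show several sets per page with their genes still nested inside each record.

```python
from typing import Dict, List, Tuple


def paginate(pangene_sets: List[Dict], page: int, page_size: int = 1) -> Tuple[List[Dict], int]:
    """Return the records for a 1-based page number, plus the total page count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    total_pages = -(-len(pangene_sets) // page_size)  # ceiling division
    start = (page - 1) * page_size
    return pangene_sets[start:start + page_size], total_pages


if __name__ == "__main__":
    # Made-up records, for illustration only.
    results = [
        {"identifier": "pan1", "genes": ["geneA", "geneB"]},
        {"identifier": "pan2", "genes": ["geneC"]},
        {"identifier": "pan3", "genes": []},
    ]
    records, total = paginate(results, page=2)
    print(records, total)
```

Paging on the outer list keeps each pangene set's genes together on one page, at the cost of pages whose length varies with the size of the sets they contain.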