legumeinfo / graphql-server

A GraphQL server that supports the Legume Information System and related biodata websites
Apache License 2.0
2 stars 3 forks source link

Extend searchTraits to include studyType (GWAS or QTLStudy), publication, and author #86

Closed sammyjava closed 11 months ago

sammyjava commented 11 months ago

This is in order to provide search capability for the new trait search web component, in https://github.com/legumeinfo/web-components/pull/212.

@That-Thing note: I'm adding some GWAS and QTLStudy collections to MiniMine 5.1.0.3, as well as the corresponding markers and maps, so that we can test your web component a bit better against multiple genus and species and the same trait ("Pod length") from various studies. Here's what it will have for genetic data:

    <source name="Cerinza_x_G24404.qtl.Blair_Iriarte_2006" type="lis-qtl" dump="true">
    <source name="DOR364_x_BAT477.qtl.Blair_Galeano_2012" type="lis-qtl" dump="true">
    <source name="ZN016_x_Zhijiang282.qtl.Xu_Wu_2017" type="lis-qtl" dump="true">

    <source name="mixed.gwas.Jain_Poromarto_2019" type="lis-gwas" dump="true">
    <source name="mixed.gwas.Myers_Wallace_2019" type="lis-gwas" dump="true">
    <source name="mixed.gwas.Raggi_Caproni_2019" type="lis-gwas" dump="true">
    <source name="ZN016_x_Zhijiang282.gwas.Xu_Wu_2017" type="lis-gwas" dump="true">

    <source name="G19833.gnm1.mrk.BARCBean6K" type="lis-markers" dump="true">
    <source name="G19833.gnm1.mrk.Gaitan-Solis_Duque_2002" type="lis-markers" dump="true">
    <source name="IT97K-499-35.gnm1.mrk.Cowpea1MSelectedSNPs" type="lis-markers" dump="true">

    <source name="Cerinza_x_G24404.map.Blair_Iriarte_2006" type="lis-map" dump="true">
    <source name="DOR364_x_BAT477.map.Blair_Galeano_2012" type="lis-map" dump="true">
    <source name="ZN016_x_Zhijiang282.map.Xu_Wu_2017" type="lis-map" dump="true">

You've probably been coding against the dev GlycineMine 5.1.0.3 which has tons of this stuff, but I like to code against MiniMine, so I add stuff to it when appropriate.

That-Thing commented 11 months ago

I noticed that specifying a genus in the query parameters doesn't remove results from other genuses (genera?).

For example, specifying "Glycine" (or any random string) returns a result for "Days to Flowering" for Phaseolus vulgaris.

Query:

query Traits($pageSize: Int, $page: Int, $name: String, $genus: String, $species: String, $studyType: String, $publicationId: String, $author: String) {
  traits(pageSize: $pageSize, page: $page, name: $name, genus: $genus, species: $species, studyType: $studyType, publicationId: $publicationId, author: $author) {
    results {
      name
      gwas {
        organism {
          genus
          name
          species
        }
      }
      qtlStudy {
        organism {
          genus
          name
          species
        }
      }
    }
  }
}
{
  "genus": "Any string",
  "species": "123",
  "publicationId": "",
  "studyType": "",
  "author": "",
  "name": "",
  "page": 1,
  "pageSize": 10
}
sammyjava commented 11 months ago

Correct. As I said, the genus/species filters do not currently work, unless you specify the study type as well. We'll get that particular situation fixed in graphql-server in the future - just assume for now that it works correctly as you update your web component.

sammyjava commented 11 months ago

So, @alancleary I thought a bit about implementing a UNION of sorts on the traits resolver. I have no idea how to do it in the resolver, being pretty lost when it comes to this stuff. And I don't see how to do it in the API method, either.

Bottom line is this stuff is abstracted far enough from what I'm used to that I can't move this issue forward viz a viz querying traits per genus/species without specifying the study type. (Other than updating the mine model with a Trait.organism reference, in which case this is all easy, but it would be some months before that model reaches production.)

I don't think there's anything wrong with this being on a back burner for a while since the UI spec asks for something that isn't straightforward. And I'd be happy to punch that data model update into 5.1.0.4 and just move on. It does seem likely that folks would like to know "what are the measured traits for a given species?" without any further specification, even in the mines.

alancleary commented 11 months ago

@sammyjava I think it may actually be possible to build the query using an outer join. We would just use the constraint logic to group the constraints by GWAS or study:

<query name="" model="genomic" view="Trait.primaryIdentifier" longDescription="" sortOrder="Trait.primaryIdentifier asc" constraintLogic="((A and B) or (C and D)) and E">
    <join path="Trait.qtlStudy" style="OUTER"/>
    <join path="Trait.gwas" style="OUTER"/>
    <constraint path="Trait.qtlStudy.organism.genus" code="A" op="=" value="Phaseolus"/>
    <constraint path="Trait.qtls.qtlStudy.organism.species" code="B" op="=" value="vulgaris"/>
    <constraint path="Trait.gwas.organism.genus" code="C" op="=" value="Phaseolus"/>
    <constraint path="Trait.gwas.organism.species" code="D" op="=" value="vulgaris"/>
    <constraint path="Trait.publications.doi" code="E" op="=" value="10.1093/nar/gkv1159"/>
</query>
sammyjava commented 11 months ago

Well, now you've become an advanced pathquerista! (Caminoquerista?) But that particular XML fails loading with:

Import of query(ies) failed. Invalid format of query: "Logic expression is not compatible with outer join status: Cannot split node (A and B) or (C and D)"

But I'll ponder this further, I didn't give it more time since I was latched onto wanting a UNION, could be we can pull it off.

But I'm also more inclined to add Trait.organism to the next data model as I'm beginning to think that's an appropriate thing since Trait is now tied to a single organism (unlike earlier models). We do repeat things in the data model for ease of querying and display. (You can't have a table do a fancy join on a report page, it's got to be all attributes, either direct or via reference.)

alancleary commented 11 months ago

Yeah, I vaguely recall reading some Intermine documentation last week that you can't use OR in your constraint logic when using an outer join...

Interestingly, if you omit the constraint logic altogether then you get results. Perhaps the constraints on outer joined (sub)fields aren't considered when the joined field is null?

sammyjava commented 11 months ago

Yeah, I'm not sure why that is the case. But another angle on this is to craft the query to get null for GWAS and QTLStudy attributes in cases where the genus doesn't meet the constraint. Here I add Trait.gwas.primaryIdentifier and Trait.qtlStudy.primaryIdentifier to the regular old outer join query:

<query name="" model="genomic" view="Trait.primaryIdentifier Trait.gwas.primaryIdentifier Trait.qtlStudy.primaryIdentifier" longDescription="" sortOrder="Trait.primaryIdentifier asc" constraintLogic="A and B">
  <join path="Trait.gwas" style="OUTER"/>
  <join path="Trait.qtlStudy" style="OUTER"/>
  <constraint path="Trait.gwas.organism.genus" code="A" op="=" value="Vigna"/>
  <constraint path="Trait.qtlStudy.organism.genus" code="B" op="=" value="Vigna"/>
</query>

In this case, the JSON results contain null for BOTH the GWAS and QTLStudy attributes when genus != "Vigna":

{"modelName":"genomic","columnHeaders":["Trait > Identifier","Trait > GWAS","Trait > QTL Study"],"rootClass":"Trait","start":0,"views":["Trait.primaryIdentifier","Trait.gwas.primaryIdentifier","Trait.qtlStudy.primaryIdentifier"],
"results":[
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Days_to_flowering",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Days_to_maturity",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Plant_height",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Plant_width",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Pods_per_plant",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Seed_weight",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Seeds_per_plant",null,null],
["Cerinza_x_G24404.qtl.Blair_Iriarte_2006:Yield",null,null],
["DOR364_x_BAT477.qtl.Blair_Galeano_2012:Days_to_flowering",null,null],
["DOR364_x_BAT477.qtl.Blair_Galeano_2012:Days_to_maturity",null,null],
["DOR364_x_BAT477.qtl.Blair_Galeano_2012:Seed_weight",null,null],
["DOR364_x_BAT477.qtl.Blair_Galeano_2012:Yield",null,null],
["DOR364_x_BAT477.qtl.Blair_Galeano_2012:Yield_per_day",null,null],
["ZN016_x_Zhijiang282.gwas.Xu_Wu_2017:Pod_length","ZN016_x_Zhijiang282.gwas.Xu_Wu_2017",null],
["ZN016_x_Zhijiang282.qtl.Xu_Wu_2017:Pod_length",null,"ZN016_x_Zhijiang282.qtl.Xu_Wu_2017"],
["mixed.gwas.Jain_Poromarto_2019:Soybean_cyst_nematode",null,null],
["mixed.gwas.Myers_Wallace_2019:Flower_color",null,null],
["mixed.gwas.Myers_Wallace_2019:Immature_pod_color__blue_to_yellow",null,null],
["mixed.gwas.Myers_Wallace_2019:Immature_pod_color__green_to_red",null,null],
["mixed.gwas.Myers_Wallace_2019:Immature_pod_color__lightness",null,null],
["mixed.gwas.Myers_Wallace_2019:Pod_phenol",null,null],
["mixed.gwas.Raggi_Caproni_2019:Days_to_flowering",null,null]
],"executionTime":"2023.10.25 10:42::50","wasSuccessful":true,"error":null,"statusCode":200}

We could, in principle, execute a query like this but only pass on the results that contain a non-null Trait.gwas.primaryIdentifier or Trait.qtlStudy.primaryIdentifier. (Or Trait.gwas.id and Trait.qtlStudy.id, whatever.)

alancleary commented 11 months ago

That appears to have the same effect as just leaving out the constraint logic. Either way, this seems like a potential solution, although we're paginating our results so there's a whole other headache to deal with.

sammyjava commented 11 months ago

Yeah, I was using the web app in which case you cannot leave out the constraint logic, which I think defaults to AND anyway. So we can't do this filtering in the API method before the results hit pagination? Although this business with promises completely mystifies me. In any case, just an idea. I'm not sure it's worth spending time on this when I can just add Trait.organism and claim victory.

alancleary commented 11 months ago

So we can't do this filtering in the API method before the results hit pagination?

We could, but then we're risking getting a boat load or results sent to the server. Not that this would happen with Traits but if we use that solution here it would set a precedent for using it on other, less finite views in the future.

Anyway, just adding organism to the Trait model seems like the way to go here. Hopefully we won't run into a similar query where adding a new field to the model is less justified...

sammyjava commented 11 months ago

Yup, this one's well justified. I'll do so in 5.1.0.4, which is already underway. I'll move these updates to a new 5.1.0.4-model-changes branch. @That-Thing I think you're done with the web component, but it'll be a while before this mine model update reaches production, typically a half year or so. It'll be in MiniMine 5.1.0.4 soon, though, where I'll test the graphql update with Trait.organism in the model. I'll let you know.

alancleary commented 11 months ago

That all sounds good.

And thanks for the reminder about the web component. I'll continue reviewing the PR today.

sammyjava commented 11 months ago

As per today's meeting, I'll re-implement search on studyType, publication, and author in the 5.1.0.3-model-changes branch. The web component can "freeze" genus and species on the user form and not send them to the graphql server (since they won't be supported until 5.1.0.4).

sammyjava commented 11 months ago

This is done in commit fea48fd59e8068bbae36b46345e4f4dcbc93bfa5 to the 5.1.0.3-model-changes branch.