monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Should we capture gene to phenotype associations with environmental conditions? #419

Open kevinschaper opened 1 year ago

kevinschaper commented 1 year ago

While looking at bringing in multi-term phenotypes for #418, I noticed that SGD included chemical condition data in their files, which raised the question of whether we should include g2p associations with chemical qualifiers.

Here is an example:

    {
      "conditionRelations": [
        {
          "conditions": [
            {
              "conditionStatement": "chemical:tunicamycin",
              "conditionClassId": "ZECO:0000111",
              "chemicalOntologyId": "CHEBI:29699",
              "conditionQuantity": "0.6 uM"
            }
          ],
          "conditionRelationType": "has_condition"
        }
      ],
      "evidence": {
        "publicationId": "PMID:21179023"
      },
      "phenotypeTermIdentifiers": [
        {
          "termOrder": 1,
          "termId": "APO:0000003"
        },
        {
          "termOrder": 2,
          "termId": "APO:0000087"
        }
      ],
      "dateAssigned": "2006-05-12T00:05:00-00:00",
      "objectId": "SGD:S000281279",
      "phenotypeStatement": "decreased resistance to chemicals"
    },

(This one is interesting, because it seems like the chemical is confirming the phenotype, not causing it?)
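To make it concrete what a chemical qualifier would carry, here is a rough sketch (not ingest code) that pulls the chemical details out of a record shaped like the one above. The function name `chemical_conditions` is made up for illustration, and the record is a trimmed copy of the SGD example.

```python
# Trimmed copy of the SGD record shown above.
record = {
    "conditionRelations": [
        {
            "conditions": [
                {
                    "conditionStatement": "chemical:tunicamycin",
                    "conditionClassId": "ZECO:0000111",
                    "chemicalOntologyId": "CHEBI:29699",
                    "conditionQuantity": "0.6 uM",
                }
            ],
            "conditionRelationType": "has_condition",
        }
    ],
    "objectId": "SGD:S000281279",
    "phenotypeStatement": "decreased resistance to chemicals",
}


def chemical_conditions(row):
    """Hypothetical helper: list the conditions in a row that name a chemical.

    Returns tuples of (relation type, condition class, chemical id, quantity).
    """
    found = []
    # Treat a missing or null "conditionRelations" as an empty list.
    for relation in row.get("conditionRelations") or []:
        for condition in relation.get("conditions") or []:
            chemical_id = condition.get("chemicalOntologyId")
            if chemical_id:
                found.append(
                    (
                        relation.get("conditionRelationType"),
                        condition.get("conditionClassId"),
                        chemical_id,
                        condition.get("conditionQuantity"),
                    )
                )
    return found


print(chemical_conditions(record))
```

Note that the relation type (`has_condition`) alone doesn't say whether the chemical causes the phenotype or just reveals it, which is the ambiguity raised above.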

I then realized that I should take a look at what we're doing now, and it looks like we currently include chemical conditions in the Alliance g2p ingest:

        if "conditionRelations" in row.keys() and row["conditionRelations"] is not None:
            qualifiers: List[str] = []
            for conditionRelation in row["conditionRelations"]:
                for condition in conditionRelation["conditions"]:
                    if condition["conditionClassId"]:
                        qualifier_term = condition["conditionClassId"]
                        qualifiers.append(qualifier_term)

            association.qualifiers = qualifiers

However, we aren't actually getting any g2p associations with qualifiers from the ingest right now. The example above is masked because it isn't using a phenotype term.
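For checking this against a file, a standalone version of the same qualifier logic could look roughly like the sketch below. It's an approximation of the ingest snippet, not a copy of it: `extract_qualifiers` is a hypothetical name, rows are plain dicts, and the `EXAMPLE:` rows are invented to show the empty cases.

```python
from typing import Any, Dict, List


def extract_qualifiers(row: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: collect conditionClassId values from a row."""
    qualifiers: List[str] = []
    # A missing key and an explicit null are both treated as "no conditions".
    for condition_relation in row.get("conditionRelations") or []:
        for condition in condition_relation.get("conditions") or []:
            class_id = condition.get("conditionClassId")
            if class_id:
                qualifiers.append(class_id)
    return qualifiers


# One row shaped like the SGD example, plus two invented rows without conditions.
rows = [
    {
        "objectId": "SGD:S000281279",
        "conditionRelations": [
            {"conditions": [{"conditionClassId": "ZECO:0000111"}]}
        ],
    },
    {"objectId": "EXAMPLE:1", "conditionRelations": None},
    {"objectId": "EXAMPLE:2"},
]

# Which rows would end up with at least one qualifier?
with_qualifiers = [row["objectId"] for row in rows if extract_qualifiers(row)]
print(with_qualifiers)
```

Running something like this over the rows that survive the ingest's other filters would be one way to confirm that no qualified associations are making it through.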

For ZFIN, we exclude any phenotypes with non-standard environmental conditions by loading from https://zfin.org/downloads/phenoGeneCleanData_fish.txt, which includes genotypes with single affected genes and only standard or generic control environments.

My guess is that the Alliance file has the same intent as the ZFIN file: to not include any chemicals that are driving phenotypic change.

I'm not sure I have any action to propose, so this issue is maybe just stating where we're at and asking the question of where we want to go.

cc: @monicacecilia @diatomsRcool

monicacecilia commented 4 months ago

@diatomsRcool These data are being interpreted differently by two different sources (more details at https://github.com/monarch-initiative/uphenotizer/issues/3). Do you have a recommendation on what we should do here? Thanks for your advice!

diatomsRcool commented 4 months ago

Looking at the first case Kevin discusses above, where it looks like a chemical is being used to confirm a phenotype, it seems that this assertion (or publication, or database record, or whatever you want to call it) should be listed as evidence supporting a g2p connection. The second case seems a bit trickier to me because it's not a real example. It seems like we should be using ZECO (or whatever) for now and eventually ECTO when it gets updated, and include this term in the axiomatic definition, possibly using 'characteristic of' as Nico suggested in the above referenced issue. The specific relationship used should be discussed.