intermine / pombemine


Allele description "NO VALUE" can be inferred for some columns #60

Open ValWood opened 2 years ago

ValWood commented 2 years ago

Allele description "NO VALUE" can be inferred for some columns, because the allele type and description are equivalent.

See

[Screenshot 2022-05-31 at 12 43 12]

Allele type "wild type": description = "wild type"
Allele type "unknown": description = "unknown"
Allele type "deletion": description = "deletion"
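To make the proposed inference concrete, here is a minimal sketch. It assumes alleles are plain Python dicts with hypothetical `type` and `description` keys, that type values use the underscore spelling seen in the query later in this thread (`wild_type`), and that the placeholder is the literal string "NO VALUE". It only illustrates the rule and is not how PombeMine or PomBase actually populate the field.

```python
# Hypothetical mapping from allele type to the description it implies,
# following the three equivalences listed above.
IMPLIED_DESCRIPTION = {
    "wild_type": "wild type",
    "unknown": "unknown",
    "deletion": "deletion",
}

PLACEHOLDER = "NO VALUE"  # assumed placeholder string shown by the mine


def infer_description(allele: dict) -> str:
    """Return the allele's description, inferring it from the type when missing."""
    description = allele.get("description") or ""
    if description and description != PLACEHOLDER:
        return description  # a real, curated description always wins
    # Fall back to the implied description, or blank if the type implies none
    return IMPLIED_DESCRIPTION.get(allele.get("type", ""), "")


# Illustrative records only (type and description values are made up)
alleles = [
    {"type": "wild_type", "description": "NO VALUE"},
    {"type": "deletion", "description": ""},
    {"type": "amino_acid_mutation", "description": "G123A"},
    {"type": "other", "description": "NO VALUE"},
]
print([infer_description(a) for a in alleles])
# ['wild type', 'deletion', 'G123A', '']
```

A curated description always takes precedence, and types with no implied description fall back to a blank rather than a placeholder.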

ValWood commented 2 years ago

Actually, can we export this in the PHAF file, @kimrutherford? (I think it would be useful for other consumers too, like Monarch.) A description of "NO VALUE" is not very useful, and nobody is going to know what to put in that field unless they are told.

kimrutherford commented 2 years ago

"because the allele type and description are equivalent"

I don't understand that bit.

"nobody is going to know what to put in that field unless they are told."

I think that they shouldn't be putting anything in the description field for wild_type and deletion alleles. Describing an allele of type "wild_type" as "wild type" isn't informative, and neither is describing a "deletion" as "deletion".

Users shouldn't be querying wild types or deletions using the description field. They should use the type field as it's more reliable.
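The difference between the two approaches shows up as soon as descriptions are filled in inconsistently. This sketch uses made-up rows (placeholder gene symbols, not PombeMine data) to compare filtering on the type field with filtering on the description field.

```python
# Made-up rows in the shape of a mine result; symbols are placeholders, not real genes.
ROWS = [
    {"symbol": "geneA+", "type": "wild_type", "description": "NO VALUE"},
    {"symbol": "geneBdelta", "type": "deletion", "description": "deletion"},
    {"symbol": "geneC-1", "type": "amino_acid_mutation", "description": "G123A"},
    {"symbol": "geneDdelta", "type": "deletion", "description": ""},
]


def select_by_type(rows, allele_type):
    """Filter on the allele type field."""
    return [row["symbol"] for row in rows if row["type"] == allele_type]


def select_by_description(rows, text):
    """Filter on the free-text description field (depends on what curators entered)."""
    return [row["symbol"] for row in rows if row["description"] == text]


print(select_by_type(ROWS, "deletion"))         # ['geneBdelta', 'geneDdelta']
print(select_by_description(ROWS, "deletion"))  # ['geneBdelta']
```

The description filter misses `geneDdelta` only because its description was left blank, while the type filter finds both deletions.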

(I'm not keen on "NO VALUE" in PombeMine either; I think a blank would be clearer.)

ValWood commented 2 years ago

I just wonder whether a biologist would expect this to be the description. "Wild type" and "deletion" seem to be good proxies for "no sequence change" and for somehow representing the deleted bases, respectively (representing the bases would mostly be impossible, but I guess it could be required in the future).

@manulera any thoughts on this?

I agree that a blank would be much clearer than "NO VALUE". It is probably "NO VALUE" that bothers me.

manulera commented 2 years ago

Opening an issue about exactly this was on my to-do list. I think the problem is the ambiguous mixing of expression levels and alleles; we discussed this briefly before. I think the allele description should only describe the DNA sequence, and expression level should be a separate field, since it is conditional.

We could make promoter_changed a type of allele, and indicate which promoter was used when it is known. Also, a promoter change alone may not lead to a change in expression (it may need to be induced or repressed), but if a phenotype has been described, it is typically safe to assume that the experiment was conducted under conditions where the expression levels change.
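One way to read this proposal is as a data model in which the allele records only the sequence change (including the proposed `promoter_changed` type and, when known, the promoter), while the expression level is recorded per experiment, because it depends on conditions. This is a minimal sketch with hypothetical class and field names; it is not the Canto or PomBase schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Allele:
    """Describes only the DNA sequence change (hypothetical model)."""
    symbol: str
    type: str                       # e.g. "deletion", or the proposed "promoter_changed"
    description: str = ""           # sequence change; blank when the type says it all
    promoter: Optional[str] = None  # which promoter was used, when known


@dataclass(frozen=True)
class ExperimentAllele:
    """Pairs an allele with the expression level seen in one experiment."""
    allele: Allele
    expression: Optional[str] = None  # e.g. "overexpression" or "knockdown"
    conditions: Tuple[str, ...] = ()  # conditions under which expression changed


# One hypothetical allele used in two experiments; "nmt1" is only an example
# of a regulatable promoter, and the condition labels are placeholders.
allele = Allele(symbol="geneX-promoter", type="promoter_changed", promoter="nmt1")
induced = ExperimentAllele(allele, expression="overexpression", conditions=("inducing condition",))
repressed = ExperimentAllele(allele, expression="knockdown", conditions=("repressing condition",))
```

Because expression sits on `ExperimentAllele`, the same allele can carry different expression levels in different experiments without its description changing.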

You can see what I mean with the following query: all alleles of type wild_type come from changes in expression level (the only expression values that come up are knockdown and overexpression):

<query model="genomic" view="Gene.symbol Gene.primaryIdentifier Gene.alleles.symbol Gene.alleles.type Gene.alleles.description Gene.alleles.expression" constraintLogic="(nil and A)" sortOrder="Gene.primaryIdentifier ASC">
   <constraint path="Gene.alleles.type" value="wild_type" op="=" code="A"/>
</query>

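The same query can also be generated programmatically, which makes it easy to rerun the view for other allele types such as `deletion`. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name is hypothetical, and it uses `constraintLogic="A"` on the assumption that with a single constraint only that constraint's code is needed (the query above has `(nil and A)`).

```python
import xml.etree.ElementTree as ET

# Columns copied from the query above.
VIEW = [
    "Gene.symbol",
    "Gene.primaryIdentifier",
    "Gene.alleles.symbol",
    "Gene.alleles.type",
    "Gene.alleles.description",
    "Gene.alleles.expression",
]


def allele_type_query(allele_type: str) -> str:
    """Build PathQuery-style XML listing genes whose alleles have the given type."""
    query = ET.Element(
        "query",
        {
            "model": "genomic",
            "view": " ".join(VIEW),
            # Assumption: with one constraint, the logic is just its code.
            "constraintLogic": "A",
            "sortOrder": "Gene.primaryIdentifier ASC",
        },
    )
    ET.SubElement(
        query,
        "constraint",
        {"path": "Gene.alleles.type", "value": allele_type, "op": "=", "code": "A"},
    )
    return ET.tostring(query, encoding="unicode")


print(allele_type_query("wild_type"))
```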
manulera commented 2 years ago

And for deletion, I think it would be ok to say 'deletion' in the description.

ValWood commented 2 years ago

I'll continue the discussion here: https://github.com/pombase/canto/issues/2544. This will likely be a long-term change, but it isn't causing major problems for querying.

manulera commented 2 years ago

We could still change the description field from wild_type to promoter_change.