Open ValWood opened 5 years ago
Ideally we would automatically create the IGI when the multi-allele phenotype is created in some instances.....
We should be able to do that if you can pin down when it should happen.
It sounds like this needs a bit of thought and discussion. Does this overlap with the genotype-genotype interaction changes?
Val, I'm going to assign this to both of us. And add "discuss".
This is the query we would need.
~All multi-gene phenotypes (only double mutants, haploid), where none of the contributing genotypes are overexpression, annotated to "inviable vegetative cell population" (FYPO:0002061)~
All multi-gene phenotypes (only haploid), annotated to "inviable vegetative cell population" (FYPO:0002061) where none of the contributing genotypes are overexpression.
subtract any of these where the single locus is "inviable vegetative cell population" (FYPO:0002061) (these are not interacting genetically). (ignore this part for now, it shouldn't happen- we can check for this later)
This list should be fewer than 300, I think. Then, for these pairs of genotypes, check if the genes have synthetic lethal genetic interaction and let me know which ones do not (hopefully it won't be too many)
The ones which don't we can easily add to the multi allele phenotype.
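For illustration only, the comparison described above could be sketched in Python roughly as follows. This is a minimal sketch, not the query that was actually run: the record layout (`genes`, `term`, `overexpression`), the gene names, and the function name are all hypothetical, and the real data would come from the Chado database.

```python
# Hypothetical in-memory records; field names and gene names are made up.
INVIABLE_TERM = "FYPO:0002061"  # "inviable vegetative cell population"


def pairs_missing_interaction(double_mutants, synthetic_lethal_pairs):
    """Return gene pairs with an inviable double mutant but no synthetic lethal GI."""
    # Gene order should not matter, so compare unordered pairs.
    known = {frozenset(pair) for pair in synthetic_lethal_pairs}
    missing = set()
    for genotype in double_mutants:
        genes = frozenset(genotype["genes"])
        if len(genes) != 2:
            continue  # only genotypes involving exactly two genes
        if genotype["term"] != INVIABLE_TERM:
            continue  # only the inviable population phenotype
        if genotype["overexpression"]:
            continue  # skip genotypes with an overexpressed allele
        if genes not in known:
            missing.add(genes)
    return sorted(tuple(sorted(pair)) for pair in missing)


double_mutants = [
    {"genes": ("geneA", "geneB"), "term": INVIABLE_TERM, "overexpression": False},
    {"genes": ("geneC", "geneA"), "term": INVIABLE_TERM, "overexpression": False},
    {"genes": ("geneA", "geneD"), "term": INVIABLE_TERM, "overexpression": True},
    {"genes": ("geneB", "geneD"), "term": "OTHER:term", "overexpression": False},
]
synthetic_lethal_pairs = [("geneB", "geneA")]

missing_pairs = pairs_missing_interaction(double_mutants, synthetic_lethal_pairs)
```

Using unordered pairs means a synthetic lethal interaction recorded as B/A still matches a double mutant recorded as A/B.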
where none of the contributing genotypes are overexpression, annotated to "inviable vegetative cell population" (FYPO:0002061) subtract any of these where the single locus is "inviable vegetative cell population" (FYPO:0002061)
You repeated "inviable vegetative cell population" here. Did you mean to?
I will put at medium priority, because I won't have time to do the fixes immediately. But if "quick" you can do sooner.
I'm having a go now.
This list should be fewer than 300, I think.
Is there a constraint missing from your description? I can't guarantee that it's correct, but my query gives ~3000 matching genotypes.
This session on its own has 48 examples:
https://curation.pombase.org/pombe/curs/02289de0acddb1f0/genotype_manage/ro
Do all the genotypes for the single alleles need to have a phenotype annotated?
Current query:
WITH single_allele_genotypes AS
(SELECT genotype.feature_id
FROM feature genotype
WHERE
(SELECT count(*)
FROM feature_relationship rel
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE rel.object_id = genotype.feature_id
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele'
AND rel.feature_relationship_id IN
(SELECT feature_relationship_id
FROM feature_relationshipprop p
JOIN cvterm pt ON p.type_id = pt.cvterm_id
WHERE pt.name = 'expression' ) ) = 1 ),
viable_single_allele_genotypes AS
(SELECT genotype.feature_id
FROM feature genotype
JOIN cvterm genotype_type ON genotype.type_id = genotype_type.cvterm_id
JOIN feature_cvterm fc ON fc.feature_id = genotype.feature_id
JOIN feature_relationship rel ON rel.object_id = genotype.feature_id
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE genotype_type.name = 'genotype'
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele'
AND rel.feature_relationship_id NOT IN
(SELECT feature_relationship_id
FROM feature_relationshipprop p
JOIN cvterm pt ON p.type_id = pt.cvterm_id
WHERE pt.name = 'expression' )
AND fc.cvterm_id NOT IN
(SELECT cvterm_id
FROM cvterm
WHERE name = 'inviable vegetative cell population')
AND fc.cvterm_id NOT IN
(SELECT subject_id
FROM cvtermpath
WHERE object_id in
(SELECT cvterm_id
FROM cvterm
WHERE name = 'inviable vegetative cell population'))
AND genotype.feature_id in
(SELECT *
FROM single_allele_genotypes) ),
overexpression_genotype_ids AS
(SELECT object_id AS feature_id
FROM feature_relationship g_rel
JOIN cvterm g_rel_type ON g_rel_type.cvterm_id = g_rel.type_id
JOIN feature allele ON allele.feature_id = g_rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE g_rel_type.name = 'part_of'
AND allele_type.name = 'allele'
AND g_rel.feature_relationship_id IN
(SELECT feature_relationship_id
FROM feature_relationshipprop p
JOIN cvterm pt ON p.type_id = pt.cvterm_id
WHERE pt.name = 'expression' )
AND allele.feature_id in
(SELECT subject_id
FROM feature_relationship
WHERE object_id in
(SELECT feature_id
FROM single_allele_genotypes) ) )
SELECT uniquename
FROM feature genotype
JOIN cvterm genotype_type ON genotype.type_id = genotype_type.cvterm_id
WHERE genotype_type.name = 'genotype'
AND
(SELECT count(*)
FROM feature_relationship rel
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE rel.object_id = genotype.feature_id
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele') = 2
AND genotype.feature_id not in
(SELECT feature_id
FROM overexpression_genotype_ids)
ORDER by uniquename;
Is this only "inviable vegetative cell population" (FYPO:0002061)? Also, in the first instance let's do only double deletions to make it cleaner. We can extend to other types once we nail this one.
Maybe it is 3000 annotations, which is OK. The question is how many of these do not also have the genetic interaction "synthetic lethal" recorded.
Actually, let's do all allele types except where one (or both, if that exists) is overexpression.
It probably is 3000, because the query gave me >500 genes, and many will have multiple annotations of this type
Actually we should just do ONLY deletions for now.
For other allele types we can maybe be cleverer and map them better to the new-style GIs....
So the immediate question is how many double deletes with FYPO "inviable vegetative cell population " don't have the corresponding BioGRID GI genetic interaction synthetic lethal?
is this only "inviable vegetative cell population" (FYPO:0002061)
Maybe I misunderstand this?:
All multi-gene phenotypes (only double mutants, haploid), where none of the contributing genotypes are overexpression, annotated to "inviable vegetative cell population"
I read it as meaning none of the contributing genotypes are overexpression and none are annotated to "inviable vegetative cell population".
how many double deletes with FYPO "inviable vegetative cell population " don't have the corresponding BioGRID GI genetic interaction synthetic lethal?
Is it the single allele annotation or the double mutant that should be annotated to "inviable vegetative cell population"?
I edited the first comment above. To begin, let's look at
All multi-gene phenotypes (only haploid, only deletion), annotated to "inviable vegetative cell population" (FYPO:0002061) where none of the contributing genotypes are overexpression.
which do not have a synthetic lethal annotation for the same pair of genes.
Once we have this correct we can extend to do more stuff.
i.e. ignore the part about the single allele for now. It will be interesting to look, but in these cases I don't think we should ever see a case where the single allele is also inviable. I might be wrong...
i.e. ignore the part about the single allele for now,
So ignore this too: "where none of the contributing genotypes are overexpression."? I'm still a bit confused.
At the moment we are only doing double deletes, so none of these will be overexpression.
double deletes with FYPO "inviable vegetative cell population "
After updating the query, there are 176 of those.
WITH deletion_alleles AS
(SELECT allele.feature_id
FROM feature allele
JOIN cvterm t ON allele.type_id = t.cvterm_id
JOIN featureprop p ON p.feature_id = allele.feature_id
JOIN cvterm pt ON pt.cvterm_id = p.type_id
WHERE t.name = 'allele'
AND pt.name = 'allele_type'
AND p.value = 'deletion' )
SELECT *
FROM feature genotype
JOIN cvterm genotype_type ON genotype.type_id = genotype_type.cvterm_id
JOIN pombase_feature_cvterm_ext_resolved_terms fc ON fc.feature_id = genotype.feature_id
WHERE genotype_type.name = 'genotype'
AND fc.cvterm_id IN
(SELECT cvterm_id
FROM cvterm
WHERE name = 'inviable vegetative cell population')
AND
(SELECT count(*)
FROM feature_relationship rel
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE rel.object_id = genotype.feature_id
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele' ) = 2
AND
(SELECT count(*)
FROM feature_relationship rel
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE rel.object_id = genotype.feature_id
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele'
AND allele.feature_id in
(SELECT feature_id
FROM deletion_alleles) ) = 2
ORDER BY uniquename;
After updating the query, there are 176 of those.
Of those 176, only 10 have an interaction of any sort.
These are the 166 that don't:
WITH deletion_alleles AS
(SELECT allele.feature_id
FROM feature allele
JOIN cvterm t ON allele.type_id = t.cvterm_id
JOIN featureprop p ON p.feature_id = allele.feature_id
JOIN cvterm pt ON pt.cvterm_id = p.type_id
WHERE t.name = 'allele'
AND pt.name = 'allele_type'
AND p.value = 'deletion' ),
double_mutant_with_interaction AS
(SELECT subject_id
FROM feature interaction
JOIN feature_relationship rel ON rel.object_id = interaction.feature_id
JOIN cvterm rel_type ON rel_type.cvterm_id = rel.type_id
JOIN cvterm int_type ON int_type.cvterm_id = interaction.type_id
WHERE int_type.name = 'genotype_interaction'
AND rel_type.name = 'interaction_double_mutant_genotype' )
SELECT genotype.uniquename, string_agg(distinct allele.name, ', ')
FROM feature genotype
JOIN feature_relationship rel ON rel.object_id = genotype.feature_id
JOIN feature allele ON rel.subject_id = allele.feature_id
JOIN cvterm genotype_type ON genotype.type_id = genotype_type.cvterm_id
JOIN pombase_feature_cvterm_ext_resolved_terms fc ON fc.feature_id = genotype.feature_id
WHERE genotype_type.name = 'genotype'
AND genotype.feature_id NOT in (select * from double_mutant_with_interaction)
AND fc.cvterm_id IN
(SELECT cvterm_id
FROM cvterm
WHERE name = 'inviable vegetative cell population')
AND
(SELECT count(*)
FROM feature_relationship rel
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE rel.object_id = genotype.feature_id
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele' ) = 2
AND
(SELECT count(*)
FROM feature_relationship rel
JOIN cvterm rel_type ON rel.type_id = rel_type.cvterm_id
JOIN feature allele ON allele.feature_id = rel.subject_id
JOIN cvterm allele_type ON allele_type.cvterm_id = allele.type_id
WHERE rel.object_id = genotype.feature_id
AND rel_type.name = 'part_of'
AND allele_type.name = 'allele'
AND allele.feature_id in
(SELECT feature_id
FROM deletion_alleles) ) = 2
GROUP BY genotype.uniquename
ORDER BY uniquename;
Re the question "Do all the genotypes for the single alleles need to have a phenotype annotated?"
All of the deletion alleles should have a "population viability" phenotype. Were there any without one?
We can talk about the next step when we chat
Action, find a session with a few examples and fix 1 to the new form as an example
Action, find a session with a few examples and fix 1 to the new form as an example
These two look good for that:
https://curation.pombase.org/pombe/curs/613296d2e91f5cb0/ro/ https://curation.pombase.org/pombe/curs/ee41f2de8b73ee9b/ro/
All of the deletion alleles should have a "population viability" phenotype.
Is that "cell population viability" FYPO:0002057?
There are 179 "inviable vegetative cell population" double deletions; 10 have interactions already.
All of the deletion alleles should have a "population viability" phenotype. Were there any without one?
There seem to be a lot. Of the 169 double deletions without an existing interaction, only 25 have a "population viability" phenotype on both of the single alleles.
This session has examples of single alleles that don't have a "population viability" phenotype: https://curation.pombase.org/pombe/curs/006084139334e409/ro/
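As a rough illustration of the check behind that count, here is a minimal Python sketch. It assumes hypothetical in-memory mappings (`double_deletions`, `single_allele_terms`) and allele names, and it assumes the caller supplies the set of term IDs that count as a "population viability" phenotype; FYPO:0002057 is used as a stand-in, and any descendant terms would need to be added to that set.

```python
# Assumption: annotations are matched against this set of term IDs.
# Descendants of the viability term are not expanded here.
VIABILITY_TERMS = {"FYPO:0002057"}


def both_singles_annotated(double_deletions, single_allele_terms, viability_terms):
    """Return the double deletions whose two single alleles both have a viability phenotype."""
    result = []
    for genotype_name, alleles in double_deletions.items():
        # An allele with no annotations at all counts as "not annotated".
        if all(single_allele_terms.get(allele, set()) & viability_terms
               for allele in alleles):
            result.append(genotype_name)
    return sorted(result)


# Hypothetical example data.
double_deletions = {
    "dd1": ("alleleA", "alleleB"),
    "dd2": ("alleleA", "alleleC"),
    "dd3": ("alleleD", "alleleE"),
}
single_allele_terms = {
    "alleleA": {"FYPO:0002057"},
    "alleleB": {"FYPO:0002057", "OTHER:1"},
    "alleleC": {"OTHER:1"},
}

annotated = both_singles_annotated(double_deletions, single_allele_terms, VIABILITY_TERMS)
```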
Hi @ValWood
I have a script ready to go to add the inferred genetic interactions for this case.
I think it will be quicker to implement the solution for future cases as I had a few false starts.
When you're back from holiday we can run it on the test Canto to confirm that I understand what's needed and that the script is doing the right thing.
Yes, sorry. I was thinking that all should have a population viability phenotype, but not from these papers (i.e. from the deletion project). But we don't require the single mutant phenotype to be present in order to make the double mutant inviable (synthetic lethal) interaction.
As discussed on Zoom, the plan is:
I plan to do 1 and 2 this weekend and then 4 when it's convenient.
- copy the main Canto database over to the test Canto
- need 5 minutes downtime
- run the script to create the inferred interactions on the test Canto
I've done that now but I immediately noticed a bug: sometimes duplicate interactions are created. For example in: https://curation.pombase.org/test/curs/60100461d789b19f/ro I'll fix that bug and try again on Monday.
sometimes duplicate interactions are created.
This case happens because there are two annotations that differ only in the conditions. Should be an easy fix.
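A minimal Python sketch of one way to avoid this, assuming hypothetical annotation records with `genotype`, `term`, `publication` and `conditions` fields (this is not the actual script): the key used to detect duplicates leaves out the conditions, so two annotations that differ only in conditions yield a single inferred interaction.

```python
def dedupe_inferred_interactions(annotations):
    """Create at most one inferred interaction per (genotype, term, publication)."""
    seen = set()
    interactions = []
    for annotation in annotations:
        # Conditions are deliberately left out of the key.
        key = (annotation["genotype"], annotation["term"], annotation["publication"])
        if key in seen:
            continue
        seen.add(key)
        interactions.append({
            "genotype": annotation["genotype"],
            "term": annotation["term"],
            "publication": annotation["publication"],
        })
    return interactions


# Hypothetical example: the first two annotations differ only in conditions.
annotations = [
    {"genotype": "g1", "term": "FYPO:0002061", "publication": "paper1",
     "conditions": ("condition_x",)},
    {"genotype": "g1", "term": "FYPO:0002061", "publication": "paper1",
     "conditions": ("condition_y",)},
    {"genotype": "g2", "term": "FYPO:0002061", "publication": "paper1",
     "conditions": ("condition_x",)},
]

inferred = dedupe_inferred_interactions(annotations)
```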
sometimes duplicate interactions are created.
I've fixed that and I've run the script on the test Canto server.
only 25 have a "population viability" phenotype on both the single alleles.
Down to 18 after removing duplicates. Might have been quicker to do manually. :-)
Here are the sessions in the test Canto containing the inferred interactions:
Val to check
I'm a bit confused so we should chat about this on the next call.
I think this is what we need to do:
From the links I looked at I'm not sure, but it's difficult to know which pairs I am looking at. Can we report the gene pair concerned along with the link? By looking through the list of genes with the advanced query result "inviable population" for a multi-gene phenotype, and viable vegetative population, I found one example where we have a double mutant phenotype "inviable vegetative population" but didn't have a GI: efc25-delta tea1-delta, so we can use that in the testing (it should be in the output of 1+2). I will see if I can find more.
Note that this ticket is relatively low priority compared to lots of others, but we will keep chipping away at it on the calls.
Background:
Q from Jacky: Are all genetic interactions between 2 genes in the Genetic Interaction section of a gene page, or just high-throughput data? Tiffany asked me if she could find out whether a gene of interest was synthetically lethal with any other gene. I thought it would be in Genetic Interactions if it had been curated but wasn't sure.
A. It should be, but I don't know if they are (I'm sure that I haven't done a ~IGI~ GI for every synthetic lethal double mutant phenotype.....I would like this to be 'automatically generated') Tiffany should also look in the multi-allele phenotypes section to see if there are any "inviable vegetative cell population" to be sure....
We could run checks on this... Ideally we would automatically create the ~I~ GI when the multi-allele phenotype is created in some instances.....