Closed: kimrutherford closed this issue 2 weeks ago
This (fixed) query returns three GO-CAM IDs provided_by PomBase. Progress!
PREFIX gocam: <http://model.geneontology.org/>
PREFIX provided_by: <http://purl.org/pav/providedBy>
SELECT distinct ?gocam WHERE {
  GRAPH ?gocam {
    ?gocam provided_by: "http://www.pombase.org"^^<http://www.w3.org/2001/XMLSchema#string> .
  }
}
ORDER BY ?gocam
The query results are:
gocam:66187e4700001573
gocam:66187e4700001781
gocam:66187e4700002284
For comparison, we have these GO-CAMs manually configured:
66187e4700001573
66187e4700001781
66187e4700002284
66187e4700003150
662af8fa00000408
662af8fa00000499
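The gap between the two lists can be checked with a quick set difference. This is only a minimal Python sketch using the IDs quoted above; the variable names are illustrative and not part of any existing script.

```python
# IDs returned by the SPARQL query (copied from the results above).
query_ids = {"66187e4700001573", "66187e4700001781", "66187e4700002284"}

# IDs currently configured by hand (copied from the list above).
manual_ids = {
    "66187e4700001573", "66187e4700001781", "66187e4700002284",
    "66187e4700003150", "662af8fa00000408", "662af8fa00000499",
}

# Models we track manually that the query does not return yet.
missing_from_query = sorted(manual_ids - query_ids)
print(missing_from_query)
# ['66187e4700003150', '662af8fa00000408', '662af8fa00000499']
```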
We can query model IDs and gene IDs in GO-CAM models provided by PomBase with:
PREFIX gocam: <http://model.geneontology.org/>
PREFIX provided_by: <http://purl.org/pav/providedBy>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pombasegeneid: <http://identifiers.org/pombase/>
SELECT distinct ?gocam ?geneid WHERE {
  GRAPH ?gocam {
    ?gocam provided_by: "http://www.pombase.org"^^<http://www.w3.org/2001/XMLSchema#string> .
    ?modelgeneid rdf:type ?geneid
  }
  FILTER(strstarts(str(?geneid), str(pombasegeneid:)))
}
ORDER BY ?gocam ?geneid
Results look something like:
gocam:66187e4700001573 pombasegeneid:SPAC644.04
gocam:66187e4700001573 pombasegeneid:SPBC2F12.08c
gocam:66187e4700001573 pombasegeneid:SPCC330.10
gocam:66187e4700001781 pombasegeneid:SPAC1B3.17
...
I'm hoping we can wrap this in a script that gets run nightly or maybe weekly.
> We can query model IDs and gene IDs in GO-CAM models provided by PomBase with:
I meant to say that this query returns every gene ID used anywhere in the model. I think that's probably what we want but the query could be made more precise if needed later (once I understand the GO-CAM model better).
Here's a slightly more precise query after re-reading the GO SPARQL docs:
PREFIX gocam: <http://model.geneontology.org/>
PREFIX provided_by: <http://purl.org/pav/providedBy>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pombasegeneid: <http://identifiers.org/pombase/>
PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
SELECT distinct ?gocam ?geneid WHERE {
  GRAPH ?gocam {
    ?gocam provided_by: "http://www.pombase.org"^^<http://www.w3.org/2001/XMLSchema#string> .
    ?s enabled_by: ?gpnode .
    ?gpnode rdf:type ?geneid .
  }
  FILTER(strstarts(str(?geneid), str(pombasegeneid:)))
}
ORDER BY ?gocam ?geneid
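For wrapping a query like this in a script, something along these lines might work. It is a rough Python sketch, not the script mentioned in this thread: the endpoint URL is a placeholder (the real SPARQL endpoint is not given here), the function names are hypothetical, and it assumes the server returns the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder URL; replace with the real SPARQL endpoint.
SPARQL_ENDPOINT = "https://example.org/sparql"

GOCAM_PREFIX = "http://model.geneontology.org/"
GENE_PREFIX = "http://identifiers.org/pombase/"


def run_query(query, endpoint=SPARQL_ENDPOINT):
    """POST a SPARQL query (form-encoded) and return the decoded JSON."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def model_gene_pairs(results):
    """Turn SPARQL JSON bindings into sorted, unique (model ID, gene ID) pairs."""
    pairs = set()
    for binding in results["results"]["bindings"]:
        # Strip the URI prefixes so only the bare IDs remain (Python 3.9+).
        model = binding["gocam"]["value"].removeprefix(GOCAM_PREFIX)
        gene = binding["geneid"]["value"].removeprefix(GENE_PREFIX)
        pairs.add((model, gene))
    return sorted(pairs)


# A hand-written response in the SPARQL JSON results shape, for illustration.
sample = {
    "results": {
        "bindings": [
            {
                "gocam": {"type": "uri", "value": GOCAM_PREFIX + "66187e4700001573"},
                "geneid": {"type": "uri", "value": GENE_PREFIX + "SPAC644.04"},
            }
        ]
    }
}
print(model_gene_pairs(sample))
```

Keeping the parsing separate from the HTTP call means the parsing can be checked without network access.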
I've added a script that makes a SPARQL query to get the GO-CAM IDs and corresponding gene IDs. The output is in the same format as pombe-embl/supporting_files/production_gocam_id_mapping.tsv.
As with the file generated for process terms (pombase/website#2173), there are only three models in the output of the script:
I'm assuming that more will be available after the next GO update.
I hope that eventually we'll be able to automatically update production_gocam_id_mapping.tsv nightly or weekly using the new script.
SPARQL is being deprecated by GO in favour of the GO API.
It looks like we can use this API end-point to return gene products given a list of GO-CAM IDs: /api/models/gp
And this end-point should return a list of pombe models: /api/taxon/{taxon}/models
There's an issue at the moment though:
If necessary, we can get the pombe GO-CAM IDs and genes from: http://snapshot.geneontology.org/products/upstream_and_raw_data/noctua_pombase.gpad.gz
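Pulling model and gene IDs out of that GPAD file could look roughly like the sketch below. It rests on assumptions not confirmed in this thread: that the gene ID is in column 2 and that column 12 (annotation properties) carries `|`-separated `key=value` pairs including `noctua-model-id=gomodel:<ID>`. The function names and the sample line are made up for illustration.

```python
import gzip


def parse_gpad(lines):
    """Yield (model ID, gene ID) pairs from an iterable of GPAD lines."""
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 12:
            continue
        gene_id = columns[1]
        # Assumption: properties look like "key=value|key=value".
        for prop in columns[11].split("|"):
            key, _, value = prop.partition("=")
            if key == "noctua-model-id":
                yield value.removeprefix("gomodel:"), gene_id


def read_gpad_gz(path):
    """Read a gzipped GPAD file and return sorted, unique pairs."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sorted(set(parse_gpad(handle)))


# Synthetic example line: only the gene ID and model ID come from this thread,
# every other field is a placeholder.
sample_line = "\t".join([
    "PomBase", "SPAC644.04", "enables", "GO:0000000", "PMID:0",
    "ECO:0000000", "", "", "20240101", "PomBase", "",
    "noctua-model-id=gomodel:66187e4700001573|model-state=production",
])
print(list(parse_gpad(["!gpa-version: placeholder", sample_line])))
```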
I've written a script that uses the Noctua PomBase GPAD file and the GO model API (https://api.geneontology.org/api/go-cam/{ID}) to update the data files for Chado:
- pombe-embl/supporting_files/production_gocam_id_mapping.tsv: gene ID to GO-CAM ID mapping
- pombe-embl/supporting_files/production_gocam_term_id_mapping.tsv: process term ID to GO-CAM ID mapping

The script also retrieves the GO-CAM model titles from the API for adding to Chado.
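As an illustration of the API step only, here is a short Python sketch. It is not the script itself. The URL template is the one quoted above; whether the API expects a bare ID or a `gomodel:` prefix, and the name of the JSON field holding the title, are assumptions, so `model_title` takes the key as a parameter.

```python
import json
import urllib.parse
import urllib.request

# URL template as quoted in this thread.
API_TEMPLATE = "https://api.geneontology.org/api/go-cam/{ID}"


def model_url(model_id):
    """Fill the model ID into the URL template, percent-encoding it."""
    return API_TEMPLATE.replace("{ID}", urllib.parse.quote(model_id, safe=""))


def fetch_model(model_id):
    """Fetch one model from the API and decode the JSON response."""
    with urllib.request.urlopen(model_url(model_id)) as response:
        return json.load(response)


def model_title(model, key="title"):
    """Return the model title, or None if the key is absent.

    Assumption: the field name "title" is a guess, not taken from the API docs.
    """
    return model.get(key)


print(model_url("66187e4700001573"))
# https://api.geneontology.org/api/go-cam/66187e4700001573
```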
I've run the script and committed the updated files to SVN. So on Tuesday morning we should have a bunch of extra models and the models will have titles.
For now I plan to run the script manually until I'm convinced that it's reliable.
I'm going to close this issue since we can now get what we need from GO.
From email, but I wanted to attach it to this issue:
I remembered the discussion with FlyBase yesterday about GO-CAMs. We talked about how the Alliance seems to have its own internal API for GO data, which is used by the GO-CAM widget. I had thought that the widget needed a special Alliance API to work, but it occurred to me today that the data from that API might be in the correct format for the GO-CAM update script we use. After trying a few things I was able to get that to work, and we'll have much more up-to-date GO-CAM information for tonight's load. There will be 290 genes from this query tomorrow: https://www.pombase.org/results/from/id/f99d8133-3206-4941-b44e-9314e7cae3d2
> There will be 290 genes from this query tomorrow: https://www.pombase.org/results/from/id/f99d8133-3206-4941-b44e-9314e7cae3d2
These are the 290 genes: https://www.pombase.org/results/from/id/690ab3eb-0db5-4698-82e4-ea006399c40a
It's odd that these aren't in the list:
Systematic ID | Gene | Product |
---|---|---|
SPBC3D6.07 | gpi3 | pig-A, phosphatidylinositol N-acetylglucosaminyltransferase subunit Gpi3 |
SPCC16A11.06c | gpi10 | pig-B |
SPAC13G6.03 | gpi7 | Pig-G, CP2 mannose-ethanolamine phosphotransferase GPI anchor biosynthesis protein Gpi7 |
SPBC27B12.06 | gpi13 | pig-O |
SPAC4G8.12c | smp3 | pig-Z, alpha-1,2-mannosyltransferase Smp3 |
they have been in the production model for quite a while...
Which model? I can look it up if you let me know the ID.
similarly
Systematic ID | Gene | Product |
---|---|---|
SPAC3A11.08 | cul4 | CLRC complex subunit, cullin 4 |
SPCC970.07c | raf2 | CLRC ubiquitin ligase complex subunit Raf2 |
SPCC613.12c | raf1 | CLRC ubiquitin ligase complex WD repeat subunit Raf1/Dos1 |
SPCC11E10.08 | rik1 | CLRC ubiquitin ligase complex WD repeat subunit Rik1 |
are in:
http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3A66187e4700001781
http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3A665912ed00001983
http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3A665912ed00000652
OK, thanks.
I'm now testing an even more dodgy hack to get all the details of all the production models. We can talk about what I've done next time we have a chat. It's quite a fragile solution that queries the Noctua server (which Seth recommended / requested that we don't do) and the Alliance GO-CAM API (which seems like a temporary hack on their part). So I don't know how long it will work for us. But it works for now and it seems quite up to date.
With the new hack we get 354 genes in models and all the missing genes from your comment are present. Here's the list on my desktop: https://desktop.kmr.nz/results/from/id/89524fcb-3b06-4943-ae78-6f748e141108
(Lots of data is missing from that version as it was a quick load to test the GO-CAMs)
> With the new hack we get 354 genes in models
I've committed those changes into SVN so they'll be on pombase.org tomorrow.
Excellent, let's use that for the time being. It's really taking shape: we have done about 10% of what's likely possible in 3 months!
> they'll be on pombase.org tomorrow
That worked: https://www.pombase.org/results/from/id/f99d8133-3206-4941-b44e-9314e7cae3d2
Let me know if you notice any missing genes. (Or genes that shouldn't be there)
Perfect, the progress is pretty amazing because we have quite a lot in development and things to add to existing pathways.
We want to be able to query for pombe GO-CAMs rather than having to manually curate files in SVN (like pombe-embl/supporting_files/production_gocam_id_mapping.tsv).
See also:
SPARQL query for use here: https://geneontology.org/sparql:
(Currently it returns no results)