Open stain opened 9 years ago
RDF syntax issue in 20151009.
Tested with riot --validate from Apache Jena 3.0.0.
MESH/ISSUES_MESH20151009.ttl.gz
ERROR riot :: [line: 161, col: 47] Bad character in IRI (space): http://purl.bioontology.org/ontology/MSH/...[space]...
<http://purl.bioontology.org/ontology/MSH/... (((4-(1,4,5,6R-trans-tetrahydro-2- pyrimidinyl)phenyl)acetyl)amino)-5-thia-> cheminf:CHEMINF_000560 "Contains completely undefined stereo:
enantiomers"@en .
MESH/LINKSET_EXACT_MESH20151009.ttl.gz
ERROR riot :: [line: 125, col: 95] Bad character in IRI (space): http://purl.bioontology.org/ontology/MSH/...[space]...
Line 125:
<http://ops.rsc.org/OPS1965918> skos:exactMatch <http://purl.bioontology.org/ontology/MSH/... th 3-(aminocarbonyl)-1-beta-D-ribofuranosylpyridinium hydroxide inner saltN-> .
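A quick pre-check for this class of error is possible without a full Turtle parser. The following is only a minimal sketch, not how riot validates: it assumes one statement per line and treats every <...> token as an IRI, so a "<" inside a string literal would give a false positive. The function name find_iris_with_spaces and the example.org IRIs are hypothetical.

```python
import re

# Anything between angle brackets is assumed to be an IRI reference.
IRI_TOKEN = re.compile(r"<([^<>]*)>")

def find_iris_with_spaces(lines):
    """Return (line_number, column, iri) for each IRI containing whitespace."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for match in IRI_TOKEN.finditer(line):
            iri = match.group(1)
            if re.search(r"\s", iri):
                # Column is 1-based and points at the opening '<'.
                hits.append((line_no, match.start() + 1, iri))
    return hits

# Hypothetical sample data, not taken from the MESH files.
sample = [
    "<http://example.org/a> <http://example.org/p> <http://example.org/ok> .",
    "<http://example.org/a> <http://example.org/p> <http://example.org/bad name> .",
]
for hit in find_iris_with_spaces(sample):
    print(hit)
```

For real validation riot remains the reference; a scan like this is only useful for locating candidate lines quickly in a large dump.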
URI mismatch in void:inDataset statements - date changed during data generation?
stain@biggie:~/Downloads/rsc/20151009/HUMAN_METABOLOME_DATABASE$ riot * |grep inData | cut -d ">" -f 3
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_parent_child_charge_insensitive_parent_closeMatch
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_parent_child_isotope_insensitive_parent_closeMatch
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_parent_child_stereo_insensitive_parent_closeMatch
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_parent_child_super_insensitive_parent_closeMatch
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_parent_child_tautomer_insensitive_parent_closeMatch
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_exactMatch
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_ops_chemspider_exactMatch
<http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_parent_child_fragment_relatedMatch
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#openphacts-human_metabolome_database
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#openphacts-human_metabolome_database
while the VoID consistently says 20151009 or 2015-10-09:
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_exactMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#exactMatch> .
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_ops_chemspider_exactMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#exactMatch> .
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_parent_child_charge_insensitive_parent_closeMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#closeMatch> .
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_parent_child_fragment_relatedMatch> <http://rdfs.org/ns/void#linkPredicate>
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_parent_child_isotope_insensitive_parent_closeMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#closeMatch> .
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_parent_child_stereo_insensitive_parent_closeMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#closeMatch> .
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_parent_child_super_insensitive_parent_closeMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#closeMatch> .
<http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_parent_child_tautomer_insensitive_parent_closeMatch> <http://rdfs.org/ns/void#linkPredicate> <http://www.w3.org/2004/02/skos/core#closeMatch> .
(Note that void_2015-10-09.ttl here is correct as it is not a .ttl.gz)
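The date mismatch can be spotted mechanically by comparing the date directory in each void:inDataset IRI against the release being checked. This is a rough sketch under the assumption that every IRI carries the release date as an eight-digit directory after /download/; the function names release_dates and mismatched are made up for illustration, and the two sample IRIs are copied from the listing above.

```python
import re
from collections import Counter

# Date directory in IRIs such as http://ops.rsc.org/download/20151009/...
DATE_DIR = re.compile(r"/download/(\d{8})/")

def release_dates(iris):
    """Count how often each release date directory appears in the IRIs."""
    counts = Counter()
    for iri in iris:
        match = DATE_DIR.search(iri)
        if match:
            counts[match.group(1)] += 1
    return counts

def mismatched(iris, expected):
    """Return the IRIs whose date directory differs from the expected one."""
    out = []
    for iri in iris:
        match = DATE_DIR.search(iri)
        if match and match.group(1) != expected:
            out.append(iri)
    return out

in_dataset = [
    "http://ops.rsc.org/download/20151010/void_2015-10-10.ttl#human_metabolome_database_ops_chemspider_exactMatch",
    "http://ops.rsc.org/download/20151009/void_2015-10-09.ttl#human_metabolome_database_exactMatch",
]
print(release_dates(in_dataset))
print(mismatched(in_dataset, expected="20151009"))
```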
The VoID has the wrong void:dataDump directory for HMDB and MESH, as the URLs are missing the subfolder names.
:openphacts-human_metabolome_database dcterms:description "The subset of OpenPhacts that contains Human Metabolome Database data."@en;
dcterms:title "OpenPhacts Human Metabolome Database Subset"@en;
void:dataDump <http://ops.rsc.org/download/20151009/ISSUES_HUMAN_METABOLOME_DATABASE20151009.ttl.gz>,
<http://ops.rsc.org/download/20151009/PROPERTIES_HUMAN_METABOLOME_DATABASE20151009.ttl.gz>,
<http://ops.rsc.org/download/20151009/SYNONYMS_HUMAN_METABOLOME_DATABASE20151009.ttl.gz>;
:openphacts-mesh dcterms:description "The subset of OpenPhacts that contains MeSH data."@en;
dcterms:title "OpenPhacts MeSH Subset"@en;
void:dataDump <http://ops.rsc.org/download/20151009/ISSUES_MESH20151009.ttl.gz>,
<http://ops.rsc.org/download/20151009/PROPERTIES_MESH20151009.ttl.gz>,
<http://ops.rsc.org/download/20151009/SYNONYMS_MESH20151009.ttl.gz>;
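One possible way to repair these URLs is to derive the subfolder from the file name. This sketch assumes the server layout mirrors the unpacked layout seen locally (for example MESH/ISSUES_MESH20151009.ttl.gz), which I have not confirmed; the helper with_subfolder is hypothetical, and the list of file name prefixes only covers the ones that appear in this report.

```python
import re

# File names look like <KIND>_<DATASET><YYYYMMDD>.ttl.gz, for example
# ISSUES_MESH20151009.ttl.gz. Only the KIND values seen in this report
# are listed; other linkset kinds would need to be added.
FILE_NAME = re.compile(
    r"^(ISSUES|PROPERTIES|SYNONYMS|LINKSET_EXACT)_([A-Z_]+?)(\d{8})\.ttl\.gz$"
)

def with_subfolder(url):
    """Insert the dataset subfolder before the file name (assumed layout)."""
    base, _, name = url.rpartition("/")
    match = FILE_NAME.match(name)
    if not match:
        return url  # Unknown naming scheme: leave the URL untouched.
    dataset = match.group(2)
    return f"{base}/{dataset}/{name}"

print(with_subfolder(
    "http://ops.rsc.org/download/20151009/ISSUES_MESH20151009.ttl.gz"
))
```

URLs that do not match the naming pattern are returned unchanged, so the function is safe to run over every void:dataDump value in the VoID.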
The pav:previousVersion statements in the VoID misleadingly point to the same version:
:chebi_exactMatch pav:previousVersion :chebi_exactMatch .
:drugbank_exactMatch pav:previousVersion :drugbank_exactMatch .
Should these go to anchors within the previous VoID release under ftp://ops@ftp.rsc-us.org/OPS/ somewhere?
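Self-referencing statements like these are easy to flag automatically. The sketch below assumes the simple one-triple-per-line form shown above, with prefixed names and no literals; a real check would parse the VoID with an RDF library instead. The function name self_referencing is hypothetical.

```python
def self_referencing(statements):
    """Return subjects whose pav:previousVersion object equals the subject."""
    found = []
    for statement in statements:
        parts = statement.split()
        # Expect exactly: subject, predicate, object, terminating dot.
        if len(parts) == 4 and parts[1] == "pav:previousVersion" and parts[3] == ".":
            if parts[0] == parts[2]:
                found.append(parts[0])
    return found

void_lines = [
    ":chebi_exactMatch pav:previousVersion :chebi_exactMatch .",
    ":drugbank_exactMatch pav:previousVersion :drugbank_exactMatch .",
]
print(self_referencing(void_lines))
```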
Update for RDF-2015.11.04.zip from http://ops.rsc.org/download/RDF-2015.11.04.zip (2.2 GiB, 20 GB unzipped):
I made a Maven job to archive and patch (still building; the download speed from http://ops.rsc.org/ is not ideal, it seems to be about 5 MBit/s?). Once archived I can use http://repository.mygrid.org.uk/artifactory/ops/org/openphacts/data/ops-rsc-dataset/20151104-SNAPSHOT/ instead, so not a big issue.
MESH errors remain - but the rest of the linksets are all valid Turtle. I added patches to remove the offending lines - this means those URIs won't have matching links to MESH identifiers.
The void:dataDump links are now updated, but all of them now return 404.
Simply unpacking the zip file in its current download directory should fix this, which would make http://ops.rsc.org/download/20151104/ work.
I see the files are now .ttl instead of .ttl.gz, which increases the disk space required for unzipping tenfold, but I can repackage them in the archival job.
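The repackaging step could look roughly like the following. This is a standalone sketch using only the Python standard library, not the actual Maven archival job; gzip_file and gzip_tree are hypothetical names, and the originals are deliberately left in place so that removing them stays a separate decision.

```python
import gzip
import shutil
from pathlib import Path

def gzip_file(path):
    """Write path + '.gz' next to the input and return the new path."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        # Stream in chunks so large Turtle files are not loaded into memory.
        shutil.copyfileobj(src, dst)
    return target

def gzip_tree(root):
    """Compress every .ttl file below root; originals are left in place."""
    return [gzip_file(p) for p in sorted(Path(root).rglob("*.ttl"))]
```

Calling gzip_tree on the unzipped release directory would produce a .ttl.gz next to each .ttl file, including those in subfolders.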
Available at http://ops.rsc.org/download/RDF-2015.10.09.zip or as separate resources from http://ops.rsc.org/download/20151009/void_2015-10-09.ttl by following void:dataDump.
TODO: Modify this build job https://github.com/openphacts/ops-rsc-wikipathways-dataset/ -- not sure if this should be one big ops-rsc-dataset, or probably better, one per linkset.
Now easier to download from http://ops.rsc.org/download/ without authentication needed.