oxinabox / DataDepsGenerators.jl

Utility for developers to help define DataDeps registration blocks, for reusing existing Data with DataDeps.jl

Add DOI common API #42

Closed: SebastinSanty closed this 6 years ago

SebastinSanty commented 6 years ago

This is a rudimentary version of what I think can fit in?

oxinabox commented 6 years ago

We don't need to dispatch down to other generators. The information we need is in the RDF+XML, barring the download link. But we'll accept that that isn't always possible for a DOI-based solution.

SebastinSanty commented 6 years ago

Right, but 1) the RDF+XML doesn't contain all/most of the information we want, like download URLs, etc., and 2) all three XML structures are different. Wouldn't it be a pain to write common code for them?

oxinabox commented 6 years ago

> All three XML structures are different. Wouldn't it be a pain to write common code for them?

They should not be different. The whole point of RDF is that it should be a common format. Either we need a normalizing RDF parser (idk, maybe RDF has the ability to alias namespaces?), or it is a failure of the upstream providers.

> The RDF+XML doesn't contain all/most of the information we want, like download URLs, etc.

It contains everything we want except download URLs. And in general such download URLs may not even exist, since DOIs can be issued to things that do not exist online. But for that reason this is a lower priority than, say, JSON-LD, which (normally) does include download URLs.

SebastinSanty commented 6 years ago

I think JSON-LD is only available for DataCite; it doesn't work for the others. https://crosscite.org/docs.html

SebastinSanty commented 6 years ago

Also, here are the sample cases:

mEDRA:

<rdf:RDF
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:prism="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <bibo:Article rdf:about="https://doi.org/10.3290/j.ohpd.a8435">
        <dc:isPartOf>
            <bibo:Journal rdf:about="http://id.medra.org/issn/1602-1622">
                <dc:identifier>1602-1622</dc:identifier>
                <owl:sameAs>urn:issn:1602-1622</owl:sameAs>
                <bibo:issn>1602-1622</bibo:issn>
                <prism:issn>1602-1622</prism:issn>
                <dc:title>Oral Health &amp; Preventive Dentistry</dc:title>
                <dc:hasPart rdf:resource="https://doi.org/10.3290/j.ohpd.a8435"/>
            </bibo:Journal>
        </dc:isPartOf>
        <prism:doi>10.3290/j.ohpd.a8435</prism:doi>
        <dc:identifier>10.3290/j.ohpd.a8435</dc:identifier>
        <prism:startingPage>141</prism:startingPage>
        <dc:isPartOf>
            <bibo:Journal rdf:about="http://id.medra.org/issn/1757-9996">
                <dc:identifier>1757-9996</dc:identifier>
                <owl:sameAs>urn:issn:1757-9996</owl:sameAs>
                <prism:eIssn>1757-9996</prism:eIssn>
                <bibo:eissn>1757-9996</bibo:eissn>
                <dc:title>Oral Health &amp; Preventive Dentistry</dc:title>
                <dc:hasPart rdf:resource="https://doi.org/10.3290/j.ohpd.a8435"/>
            </bibo:Journal>
        </dc:isPartOf>
        <dc:publisher>Quintessence Publishing Co. Ltd.</dc:publisher>
        <owl:sameAs>info:doi/10.3290/j.ohpd.a8435</owl:sameAs>
        <bibo:volume>1</bibo:volume>
        <bibo:doi>10.3290/j.ohpd.a8435</bibo:doi>
        <dc:date>2003-07-02</dc:date>
        <bibo:pageEnd>148</bibo:pageEnd>
        <prism:endingPage>148</prism:endingPage>
        <dc:title>High-fluoride Drinking Water. A Health Problem in the Ethiopian Rift Valley1. Assessment of Lateritic Soils as Defluoridating Agents</dc:title>
        <bibo:pageStart>141</bibo:pageStart>
        <owl:sameAs>doi:10.3290/j.ohpd.a8435</owl:sameAs>
        <prism:volume>1</prism:volume>
    </bibo:Article>
</rdf:RDF>

DataCite:

<?xml version='1.0' encoding='utf-8' ?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:schema='http://schema.org/'>
    <schema:ScholarlyArticle rdf:about='https://doi.org/10.5281/zenodo.1147572'>
        <schema:alternateName>https://zenodo.org/record/1147572</schema:alternateName>
        <schema:author>
            <schema:Person rdf:nodeID='b0'>
                <schema:familyName>Arora</schema:familyName>
                <schema:givenName>Rama</schema:givenName>
                <schema:name>Rama Arora</schema:name>
            </schema:Person>
        </schema:author>
        <schema:datePublished rdf:datatype='http://schema.org/Date'>2018-01-15</schema:datePublished>
        <schema:description>The objective of this work is to study the surface roughness of LM13alloy composites with rutile particles .The fabrication route adopted for preparing the samples containing variable ratio of rutile reinforcement is simple vortex technique. The wear tests were carried out under different loading conditions from 9.8N to 49N. The pin specimen travelled a distance of 3000m at constant sliding speed on the hard steel disc. The addition of fine size rutile particles results in higher hardness and strength. The stress concentration at the voids due to weak interfaces leads to crack intitation, arising from the particle fracture .This can be avoided by providing more strength to the matrix which is achieved by introducing hard ceramic rutile particulates. As the soft matrix aluminium alloy is prone to scratches and indentation during the contact sliding conditions, study of surface roughness of composite after wear studies need significant attention.</schema:description>
        <schema:identifier rdf:resource='https://doi.org/10.5281/zenodo.1147572' />
        <schema:isPartOf>
            <schema:Periodical rdf:nodeID='b1'></schema:Periodical>
        </schema:isPartOf>
        <schema:keywords>Wear, Reinforcement, Rutile , SEM</schema:keywords>
        <schema:license rdf:resource='https://creativecommons.org/licenses/by/4.0' />
        <schema:name>Study Of Surface Roughness With The Variation In Applied Load Of Rutile Ceramic Reinforced Aluminium Composite</schema:name>
        <schema:publisher>
            <schema:Organization rdf:nodeID='b2'>
                <schema:name>Zenodo</schema:name>
            </schema:Organization>
        </schema:publisher>
        <schema:schemaVersion rdf:resource='http://datacite.org/schema/kernel-4' />
    </schema:ScholarlyArticle>
</rdf:RDF>

Crossref:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
        <j.1:startingPage>635</j.1:startingPage>
        <owl:sameAs rdf:resource="doi:10.1126/science.169.3946.635"/>
        <owl:sameAs rdf:resource="info:doi/10.1126/science.169.3946.635"/>
        <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
        <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
        <j.0:creator>
            <j.3:Person rdf:about="http://id.crossref.org/contributor/h-s-frank-3new7r2ulpnaj">
                <j.3:name>H. S. Frank</j.3:name>
                <j.3:familyName>Frank</j.3:familyName>
                <j.3:givenName>H. S.</j.3:givenName>
            </j.3:Person>
        </j.0:creator>
        <j.1:doi>10.1126/science.169.3946.635</j.1:doi>
        <j.2:pageEnd>641</j.2:pageEnd>
        <j.2:doi>10.1126/science.169.3946.635</j.2:doi>
        <j.2:volume>169</j.2:volume>
        <j.0:isPartOf>
            <j.2:Journal rdf:about="http://id.crossref.org/issn/0036-8075">
                <j.1:issn>1095-9203</j.1:issn>
                <j.2:issn>1095-9203</j.2:issn>
                <owl:sameAs>urn:issn:1095-9203</owl:sameAs>
                <owl:sameAs>urn:issn:0036-8075</owl:sameAs>
                <j.0:title>Science</j.0:title>
                <j.1:issn>0036-8075</j.1:issn>
                <j.2:issn>0036-8075</j.2:issn>
            </j.2:Journal>
        </j.0:isPartOf>
        <j.0:title>The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance</j.0:title>
        <j.1:endingPage>641</j.1:endingPage>
        <j.0:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1970-08-14</j.0:date>
        <j.1:volume>169</j.1:volume>
        <j.2:pageStart>635</j.2:pageStart>
    </rdf:Description>
</rdf:RDF>

oxinabox commented 6 years ago

OK, what is going on here is that mEDRA and CrossRef are using the same namespaces, but with different prefixes. E.g. mEDRA's dc: is CrossRef's j.0:; both are just names for http://purl.org/dc/terms/. Similarly, mEDRA's foaf: is CrossRef's j.3:, i.e. http://xmlns.com/foaf/0.1/. So to deal with that we just need to actually normalize the namespaces. Or, pragmatically, we can probably just discard the namespace parts of the names.
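
A minimal sketch of why the prefix mismatch is harmless once the XML is actually parsed, written in Python with the standard-library xml.etree.ElementTree purely for illustration (DataDepsGenerators.jl itself is Julia, and the snippets are trimmed copies of the samples above). ElementTree expands every prefix into {namespace-uri}localname, so dc:date and j.0:date end up as the same key. The hypothetical local_name helper shows the more pragmatic "just discard the namespace" option.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the mEDRA and CrossRef samples: the prefixes differ
# (dc: vs j.0:), but both are bound to http://purl.org/dc/terms/.
MEDRA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/">
  <bibo:Article rdf:about="https://doi.org/10.3290/j.ohpd.a8435">
    <dc:publisher>Quintessence Publishing Co. Ltd.</dc:publisher>
    <dc:date>2003-07-02</dc:date>
  </bibo:Article>
</rdf:RDF>"""

CROSSREF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.0="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
    <j.0:date>1970-08-14</j.0:date>
  </rdf:Description>
</rdf:RDF>"""


def expanded_fields(xml_text):
    """Map each property of the top-level resource to its text.

    ElementTree replaces the prefix with the full namespace URI, so keys
    look like '{http://purl.org/dc/terms/}date' whatever prefix was used.
    """
    resource = ET.fromstring(xml_text)[0]  # bibo:Article / rdf:Description
    return {child.tag: child.text for child in resource}


def local_name(tag):
    """Hypothetical helper: drop the '{namespace-uri}' part of a tag."""
    return tag.rsplit("}", 1)[-1]


medra, crossref = expanded_fields(MEDRA), expanded_fields(CROSSREF)
print(medra.keys() == crossref.keys())           # True
print(sorted(local_name(t) for t in crossref))   # ['date', 'publisher']
```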

Except that mEDRA is missing a bunch of fields we care about.

DataCite, on the other hand, is using an actually different namespace (Schema.org), which is just really disappointing. Pragmatically, we could again just strip the namespace part and normalize case, and probably get a lot of the way. But we have other ways of doing DataCite anyway.
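
The strip-and-lowercase idea applied to DataCite can be sketched like this, again in Python for illustration only, with trimmed copies of the DataCite and CrossRef samples; normalized_keys is a hypothetical name. It also shows where the trick runs out: identifier lines up, but schema.org's datePublished does not turn into Dublin Core's date, so presumably some per-vocabulary mapping would still be needed.

```python
import xml.etree.ElementTree as ET

# Trimmed DataCite sample (schema.org vocabulary).
DATACITE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:schema='http://schema.org/'>
  <schema:ScholarlyArticle rdf:about='https://doi.org/10.5281/zenodo.1147572'>
    <schema:datePublished>2018-01-15</schema:datePublished>
    <schema:identifier rdf:resource='https://doi.org/10.5281/zenodo.1147572' />
    <schema:keywords>Wear, Reinforcement, Rutile , SEM</schema:keywords>
  </schema:ScholarlyArticle>
</rdf:RDF>"""

# Trimmed CrossRef sample (Dublin Core terms under the j.0: prefix).
CROSSREF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:j.0="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
    <j.0:date>1970-08-14</j.0:date>
  </rdf:Description>
</rdf:RDF>"""


def normalized_keys(xml_text):
    """Hypothetical helper: property names with namespace stripped and case folded."""
    resource = ET.fromstring(xml_text)[0]  # the single top-level resource
    return {child.tag.rsplit("}", 1)[-1].lower() for child in resource}


datacite, crossref = normalized_keys(DATACITE), normalized_keys(CROSSREF)
print(sorted(datacite))             # ['datepublished', 'identifier', 'keywords']
print(sorted(datacite & crossref))  # ['identifier']
```

Note that DataCite also puts the identifier in an rdf:resource attribute rather than in element text, so matching the key names is only part of the normalization.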

This is pretty disappointing. Let's back-burner this one and move on to JSON-LD, which, while it can theoretically have the same problem of people using different namespaces with entirely different semantics, in practice doesn't seem to.

oxinabox commented 6 years ago

> I think JSON-LD is only for Data items, doesn't work on others. Isn't it? https://crosscite.org/docs.html

That is if you content-negotiate for it. But what #30 is talking about is looking for it in the HTML of the webpage. See more over there.
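
For the content-negotiation route, here is a minimal Python sketch of the request side (illustration only; the project is Julia). The Accept value application/vnd.schemaorg.ld+json is the schema.org JSON-LD media type listed in the crosscite.org docs linked above, but treat it, and which registration agencies honor it, as an assumption to check there. jsonld_request and fetch_jsonld are hypothetical names.

```python
import json
import urllib.request

# Assumed media type for schema.org JSON-LD; verify against https://crosscite.org/docs.html
JSON_LD = "application/vnd.schemaorg.ld+json"


def jsonld_request(doi):
    """Build (but do not send) a content-negotiated request for a DOI."""
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": JSON_LD})


def fetch_jsonld(doi):
    """Send the request and parse the JSON-LD body (needs network access)."""
    with urllib.request.urlopen(jsonld_request(doi)) as resp:
        return json.load(resp)


req = jsonld_request("10.5281/zenodo.1147572")
print(req.full_url)              # https://doi.org/10.5281/zenodo.1147572
print(req.get_header("Accept"))  # application/vnd.schemaorg.ld+json
```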

oxinabox commented 6 years ago

Can we close this since we are getting the main providers via JSON-LD content negotiation?

SebastinSanty commented 6 years ago

Sure, makes sense.