RDF download - Githubissues

sfarnel commented 4 years ago

Allow users to download:

- RDF and/or JSON of individual items
- RDF and/or JSON of the entire dataset

johnhuck commented 4 years ago

This would be great. Or a set of results? The average user will want a citation export, I imagine, but that's another kettle of fish and this is a Linked Data project, so RDF export is appropriate. JSON-LD, perhaps?

jchartrand commented 3 years ago

@danydvd

Hi Danoosh, what do you recommend for retrieving the RDF/json?

danydvd commented 3 years ago

@jchartrand in the previous version I used the item_id (the id in Solr) to build a CONSTRUCT query against the triplestore, and then saved the result to a file in the selected format (e.g. JSON-LD, XML). Here is the function if it helps.
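
For readers following along, the CONSTRUCT approach described above could look roughly like the sketch below. This is not the linked function, which is not reproduced in this thread. It assumes the item's full IRI is already known (how an item_id maps to an IRI is not stated here), and `build_construct_query` is a hypothetical name.

```python
def build_construct_query(item_iri: str) -> str:
    """Return a SPARQL CONSTRUCT query for every triple whose subject is item_iri.

    Hypothetical helper for illustration; the IRI is assumed to be supplied in full.
    """
    # Refuse characters that are not allowed inside a SPARQL IRI reference,
    # so a malformed id cannot break out of the <...> delimiters.
    forbidden = '<>"{}|^`\\'
    if any(ch in forbidden or ch.isspace() for ch in item_iri):
        raise ValueError(f"not a safe IRI: {item_iri!r}")
    return f"CONSTRUCT {{ <{item_iri}> ?p ?o }} WHERE {{ <{item_iri}> ?p ?o }}"
```

The resulting string would be sent to the repository's SPARQL endpoint with an Accept header for the chosen serialization.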

This might work for downloading the entire dataset too, but it might be impractical (take too long). If you agree, let's formalize the item-level download first and then come back to the entire dataset.

jchartrand commented 3 years ago

Thanks @danydvd - is there a RESTful endpoint for the SPARQL queries?

In other words, how would I issue a SPARQL query from the can-link front end (from the web browser)? Or would we do it some other way?

danydvd commented 3 years ago

@jchartrand GraphDB has an RDF4J API. You can access it via the workbench here.

Here is a sample request for the statements about one subject (http://canlink.library.ualberta.ca/subject/bcb3603403b3c89cefd49d1e3c75e4c6):

curl -X GET --header 'Accept: application/rdf+xml' 'http://206.167.181.124:7200/repositories/cldi-test-9/statements?subj=%3Chttp%3A%2F%2Fcanlink.library.ualberta.ca%2Fsubject%2Fbcb3603403b3c89cefd49d1e3c75e4c6%3E'

Does this work?
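
The `subj` value in that request is the subject IRI wrapped in angle brackets and then percent-encoded. A minimal sketch of building the same URL in code follows; `statements_url` is a hypothetical helper name, and the host and repository are simply the ones quoted in this thread.

```python
from urllib.parse import quote


def statements_url(base: str, repository: str, subject_iri: str) -> str:
    """Build a statements URL filtered by subject (hypothetical helper)."""
    # The subject is passed as an IRI in angle brackets; safe="" makes
    # quote() encode ':' and '/' as well as '<' and '>'.
    encoded = quote(f"<{subject_iri}>", safe="")
    return f"{base}/repositories/{repository}/statements?subj={encoded}"


url = statements_url(
    "http://206.167.181.124:7200",
    "cldi-test-9",
    "http://canlink.library.ualberta.ca/subject/bcb3603403b3c89cefd49d1e3c75e4c6",
)
```

With these inputs, `url` should reproduce the address used in the curl command above.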

jchartrand commented 3 years ago

@danydvd I get an access denied error for that curl command:

curl -X GET --header 'Accept: application/rdf+xml' 'http://206.167.181.124:7200/repositories/cldi-test-9/statements?subj=%3Chttp%3A%2F%2Fcanlink.library.ualberta.ca%2Fsubject%2Fbcb3603403b3c89cefd49d1e3c75e4c6%3E'
Error - http status (403) - Access is denied

Otherwise, though, that interface is likely fine.

danydvd commented 3 years ago

@jchartrand I have made the repository open. It should work. I will have to update the curl headers with credentials for future use. But for now this should work for development purposes.
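
If credentials are added later, one possible shape for the request is sketched below. It assumes the server would accept HTTP Basic authentication, which this thread does not confirm; `authed_request` and the username and password are placeholders.

```python
import base64
from urllib.request import Request


def authed_request(url: str, username: str, password: str,
                   accept: str = "application/rdf+xml") -> Request:
    """Build a GET request carrying a Basic Authorization header.

    Assumption: the server accepts HTTP Basic auth. Hypothetical helper.
    """
    # Basic auth is the base64 encoding of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Accept": accept, "Authorization": f"Basic {token}"})
```

Credentials of this kind should stay on the server side; embedding them in browser code would expose them to every visitor.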

jchartrand commented 3 years ago

@danydvd Indeed - works well:

curl -X GET --header 'Accept: application/rdf+xml' 'http://206.167.181.124:7200/repositories/cldi-test-9/statements?subj=%3Chttp%3A%2F%2Fcanlink.library.ualberta.ca%2Fsubject%2Fbcb3603403b3c89cefd49d1e3c75e4c6%3E'

returns:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:cldi="http://canlink.library.ualberta.ca/ontologies/canlink#"
    xmlns:schema="http://schema.org/"
    xmlns:wgs="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:gn="http://www.geonames.org/ontology#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:cwrc="http://sparql.cwrc.ca/ontologies/genre#"
    xmlns:ns2="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:doap="http://usefulinc.com/ns/doap#"
    xmlns:rel="http://id.loc.gov/vocabulary/relators/"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:vivo="http://vivoweb.org/ontology/core#"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/terms/">

<rdf:Description rdf:about="http://canlink.library.ualberta.ca/subject/bcb3603403b3c89cefd49d1e3c75e4c6">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <owl:sameAs rdf:resource="http://id.loc.gov/authorities/subjects/sh85106287"/>
    <rdfs:label>veaux clonés</rdfs:label>
    <void:inDataset rdf:resource="http://canlink.library.ualberta.ca/void/canlinkmaindataset"/>
    <prov:wasGeneratedBy rdf:resource="http://canlink.library.ualberta.ca/runtime/7d47e1f0dc12cf0777779ff905fcb4e8"/>
</rdf:Description>

</rdf:RDF>

Thank you.

jchartrand commented 3 years ago

@danydvd @sfarnel

The download works fine, but while working on this it occurred to me that it might be nice to also offer a view of the RDF in a popup. That way people who just want a quick look don't have to download the file, open it, and later clean up old downloads. We would still allow the download.

Desirable?

If so, we'd have to enable CORS on the calls to the triple store:

curl -X GET --header 'Accept: application/rdf+xml' 'http://206.167.181.124:7200/repositories/cldi-test-9/statements?subj=%3Chttp%3A%2F%2Fcanlink.library.ualberta.ca%2Fsubject%2Fbcb3603403b3c89cefd49d1e3c75e4c6%3E'

@danydvd - possible to enable CORS?

jchartrand commented 3 years ago

@danydvd

Actually, I realize that we need to enable CORS for #56 as well. Possible to enable CORS?

sfarnel commented 3 years ago

Thanks @jchartrand, I agree this would be nice. Hoping that @danydvd can enable CORS without too much trouble.

danydvd commented 3 years ago

@jchartrand I added the CORS parameters to the in.sh and restarted GraphDB. Can you please check it?
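
One way to check the change without a browser is to look at the response headers: a cross-origin read is only permitted when the response carries an `Access-Control-Allow-Origin` header that is `*` or matches the requesting origin. A minimal sketch of that check, with `allows_origin` as a hypothetical helper and credentialed requests left out of scope:

```python
def allows_origin(response_headers: dict, origin: str) -> bool:
    """Return True if the headers permit a simple cross-origin read from origin.

    Hypothetical helper; ignores the stricter rules for credentialed requests.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in response_headers.items()}
    allowed = normalised.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin
```

The headers themselves can be taken from any HTTP client response for the statements URL sent with an `Origin` request header.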

jchartrand commented 3 years ago

Yes! It is working. Thanks @danydvd

@sfarnel

jchartrand commented 3 years ago

I've uploaded a new version of the site with the RDF preview and download. This download is for individual records, and so you can see it from a record page like:

http://206.167.181.124/record/286e6db36084a5c497a30d7cbf6247d1

You'll see that I've given it no description, and even the title is minimal ('RDF'). Let me know if you'd like a different title or any additional description (e.g., 'this is the RDF for this thesis...').

For a larger download of the whole record set, two questions:

- Where should I put the link to the full record set download?
- @danydvd what SPARQL query would return the full record set (or at least the part we want)? And for that matter, what is the part we want, @sfarnel?

sfarnel commented 3 years ago

Thanks @jchartrand this is super exciting!

Let's add a button on the home page (either to the right of or just underneath the About and Contact, and styled the same) called 'Download Full Dataset'.

@danydvd In terms of the dataset, I'm thinking we would want everything we provide for a given thesis, but for the entire set. Does that seem reasonable?

danydvd commented 3 years ago

@jchartrand @sfarnel this will download the entire dataset (for some reason it does not work from the workbench!):

curl -X GET --header 'Accept: application/rdf+xml' 'http://206.167.181.124:7200/repositories/cldi-test-9/statements'
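
A full export can be large, so a script that fetches it would probably want to stream the response to disk rather than hold it in memory. A minimal sketch using only the standard library follows; `copy_stream` and `download_dataset` are hypothetical names, and the URL would be the statements endpoint above.

```python
import shutil
from urllib.request import Request, urlopen


def copy_stream(source, destination_path: str, chunk_size: int = 1024 * 1024) -> None:
    """Copy a readable binary stream to a file, chunk_size bytes at a time."""
    with open(destination_path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)


def download_dataset(url: str, destination_path: str,
                     accept: str = "application/rdf+xml") -> None:
    """Fetch url with the given Accept header and stream the body to disk."""
    request = Request(url, headers={"Accept": accept})
    with urlopen(request) as response:
        copy_stream(response, destination_path)
```

Calling `download_dataset` with the statements URL and a local file name would write the export in whatever serialization the Accept header requests.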

jchartrand commented 3 years ago

Thanks @danydvd

That's a big download - just shy of 1 GB. Should we include a popup (triggered by clicking the 'Download Full Dataset' button) that warns users they are about to download a very big file, which may take a while and use a fair bit of disk space? @sfarnel

sfarnel commented 3 years ago

Thanks @jchartrand I think adding a popup would be good. Would it be easier to offer a download in alternate serializations?

jchartrand commented 3 years ago

@sfarnel Alternate serializations are a very good idea. I know we can provide Turtle at the very least. So, maybe in the popup allow the user to pick from the possible serializations?

It would be cool to provide a zipped version too, but whether we could somehow do that on the server is a question for @danydvd.

sfarnel commented 3 years ago

Thanks @jchartrand I think a zipped version would be great, especially since it would help with the size issue. @danydvd is it easy enough to provide options re: serialization and also a zip version?

danydvd commented 3 years ago

@jchartrand in terms of serialization we can also provide JSON and N-Triples.
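
Since the serialization is selected through the Accept header, a picker could map each choice to its registered media type. The media types below are the standard ones for each format; which of them this repository actually serves is an assumption, and `ACCEPT_TYPES` and `accept_header` are hypothetical names.

```python
# Standard media types for common RDF serializations.
ACCEPT_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "ntriples": "application/n-triples",
    "jsonld": "application/ld+json",
    "rdfjson": "application/rdf+json",
}


def accept_header(serialization: str) -> str:
    """Return the Accept value for a serialization name such as 'JSON-LD'."""
    # Normalise so 'N-Triples', 'n triples' and 'ntriples' all match.
    key = serialization.lower()
    for separator in ("-", "/", " "):
        key = key.replace(separator, "")
    try:
        return ACCEPT_TYPES[key]
    except KeyError:
        raise ValueError(f"unsupported serialization: {serialization!r}") from None
```

For example, `accept_header("Turtle")` gives the value to use in place of `application/rdf+xml` in the requests earlier in the thread.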

For providing a zipped version, we can do a daily dump (or at any other frequency) and have a script create a zipped file which can be downloaded. (I have not seen a way to do this directly from the triplestore.)
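
The compression step of such a script could be as small as the sketch below. It assumes the dump has already been written to disk by an earlier step; `zip_dump` is a hypothetical name, and scheduling (cron or similar) is left out.

```python
import zipfile
from pathlib import Path


def zip_dump(dump_path, archive_path=None) -> Path:
    """Compress one dump file into a .zip archive and return the archive path."""
    dump = Path(dump_path)
    # Default to "<dump file name>.zip" next to the dump itself.
    archive = Path(archive_path) if archive_path else dump.with_name(dump.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name, not the full server path.
        zf.write(dump, arcname=dump.name)
    return archive
```

Writing the archive under a temporary name and renaming it once complete would keep users from downloading a half-written file.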

sfarnel commented 3 years ago

Thanks @danydvd this is great! Why don't we offer each serialization (lots of choice is nice!) and see if we can craft a script to generate that zipped version. We could run it quarterly to start.

jchartrand commented 3 years ago

@danydvd @sfarnel

I've uploaded a new version that now shows RDF in three places:

- a new full download button up by the About and Contact buttons, which opens a dialog from which to download
- the RDF preview/download dialog on the thesis record page
- a new 'Subject' page (that is linked from subjects on the thesis record page)

In all three places you can choose which serialization you'd like to either view or download.

(NOTE: I am still fixing a problem with the download from the RDF dialog - it isn't downloading the selected serialization)

Let me know if any labels, font size, etc. should be changed anywhere.

More about the new Subject page in #56

sfarnel commented 3 years ago

Thanks @jchartrand this is really fantastic!

jchartrand commented 3 years ago

@sfarnel @danydvd The problem with the RDF dialog is now fixed, so all three places from which you can download RDF (full download, subject download, thesis download) now behave the same and let you choose the serialization.

As for the script suggested above to periodically generate a zipped version, I think that is something Danoosh would have to do on the server.

sfarnel commented 3 years ago

Thanks @jchartrand. @danydvd and I can work together on getting the script in place. Closing issue.

ualbertalib / can-link

RDF download #57