neontribe / Linked_Development

R4D Data import #14

Closed harryharrold closed 11 years ago

harryharrold commented 11 years ago

An RDF dump of the Research for Development (R4D) data will be updated weekly at http://www.dfid.gov.uk/r4d/rdf/R4DOutputsData.zip as a number of individual files (approximately 10k records per file). A scheduled process is needed which will:

- Download these files;
- Load them into the triple store in a named graph;
- Log any issues and errors;
- Report any errors to a defined set of e-mail addresses.
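The steps above could be wired together roughly as follows. This is only a sketch: `run_import` and its three callables (`download`, `load`, `notify`) are hypothetical names, and the real downloader, triple-store loader and e-mail sender would have to be supplied by whoever implements the job.

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger("r4d_import")


def run_import(
    download: Callable[[], Iterable[str]],
    load: Callable[[str], None],
    notify: Callable[[List[str]], None],
) -> List[str]:
    """Run one import cycle and return the error messages collected."""
    errors: List[str] = []

    # Step 1: fetch the dump; `download` is assumed to return local file paths.
    try:
        paths = list(download())
    except Exception as exc:
        log.exception("download failed")
        errors.append(f"download failed: {exc}")
        paths = []

    # Step 2: load each file, logging failures without stopping the run.
    for path in paths:
        try:
            load(path)
            log.info("loaded %s", path)
        except Exception as exc:
            log.exception("load failed for %s", path)
            errors.append(f"load failed for {path}: {exc}")

    # Step 3: report errors, e.g. by e-mail, only when there are any.
    if errors:
        notify(errors)
    return errors
```

Passing the three steps in as callables keeps the scheduling logic testable without a network connection or a running triple store.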

The .zip file will contain the RDF data files for the R4D data. The file name format is:

R4DOutputsData-[SequenceNumber]-[yyyyMMddhhmmss].rdf

For example, R4DOutputsData-1-20130226052425.rdf.  
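A file name in this format can be split into its sequence number and timestamp with a regular expression. The sketch below assumes the hour field is on a 24-hour clock, which the description does not state; `parse_dump_filename` is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import Tuple

# R4DOutputsData-[SequenceNumber]-[yyyyMMddhhmmss].rdf
_NAME_RE = re.compile(r"R4DOutputsData-(\d+)-(\d{14})\.rdf")


def parse_dump_filename(name: str) -> Tuple[int, datetime]:
    """Return (sequence number, timestamp) or raise ValueError."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    sequence = int(match.group(1))
    # Assumption: the hour is 24-hour ("%H"); the source only says "hh".
    stamp = datetime.strptime(match.group(2), "%Y%m%d%H%M%S")
    return sequence, stamp


print(parse_dump_filename("R4DOutputsData-1-20130226052425.rdf"))
# (1, datetime.datetime(2013, 2, 26, 5, 24, 25))
```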

Each file will contain a triple indicating the date and time that the data dump as a whole was run and the modifications serialised.  This will appear in the form:

<rdf:Description rdf:about="http://www.dfid.gov.uk/R4D/RDF/R4DOutputsData.zip" dcterms:modified="2013-02-26T05:24:24:004">
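One way to read that triple back out of a file is sketched below with the standard-library XML parser. It assumes the value is serialised as a `dcterms:modified` attribute exactly as in the example, and that the final colon-separated field (`004`) is milliseconds; neither is confirmed beyond the single example above. `read_dump_modified` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"
DUMP_URI = "http://www.dfid.gov.uk/R4D/RDF/R4DOutputsData.zip"


def read_dump_modified(rdf_xml: str) -> Optional[datetime]:
    """Return the dump-wide modification time, or None if it is absent."""
    root = ET.fromstring(rdf_xml)
    for node in root.iter(f"{{{RDF_NS}}}Description"):
        if node.get(f"{{{RDF_NS}}}about") != DUMP_URI:
            continue
        value = node.get(f"{{{DCTERMS_NS}}}modified")
        if value is not None:
            # Assumption: the last field is a fraction of a second;
            # note the non-standard ":" separator before it.
            return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S:%f")
    return None
```

Comparing this value with the one recorded at the previous run would let the job skip a dump that has not changed, though the thread does not say whether that was implemented.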

neil-dabson commented 11 years ago

done

harryharrold commented 11 years ago

Works if end-point works and gives a .zip file with good data.

Fails if a successfully downloaded .zip file fails to uncompress - we retain the data in our triple store.

Fails if we do not find a file at http://www.dfid.gov.uk/r4d/rdf/R4DOutputsData.zip - we retain the data in our triple store.

Fails spectacularly if a successfully downloaded .zip file uncompresses but does not contain the type of content we expect - at this point, we've deleted the data in our triple store.
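The third failure mode could be avoided by checking the archive contents before anything is deleted from the triple store. The following is a sketch of such a check, not a description of the deployed importer: `validate_dump` is a hypothetical helper, and it assumes a flat archive whose entries follow the file name format above and are RDF/XML documents with an `rdf:RDF` root element.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
NAME_RE = re.compile(r"R4DOutputsData-\d+-\d{14}\.rdf")


def validate_dump(zip_bytes: bytes) -> Dict[str, bytes]:
    """Return {file name: content} for a usable dump, or raise ValueError."""
    files: Dict[str, bytes] = {}
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            for name in archive.namelist():
                if NAME_RE.fullmatch(name) is None:
                    raise ValueError(f"unexpected entry in archive: {name!r}")
                content = archive.read(name)
                try:
                    root = ET.fromstring(content)
                except ET.ParseError as exc:
                    raise ValueError(f"{name} is not well-formed XML: {exc}") from exc
                if root.tag != RDF_ROOT:
                    raise ValueError(f"{name} has no rdf:RDF root element")
                files[name] = content
    except zipfile.BadZipFile as exc:
        raise ValueError(f"not a readable zip archive: {exc}") from exc
    # An empty archive is rejected too, so existing data is never
    # replaced with nothing.
    if not files:
        raise ValueError("archive contains no RDF files")
    return files
```

Only once `validate_dump` returns without raising would the importer clear and reload the named graph; any `ValueError` leaves the existing data in place.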

@practicalparticipation - could you confirm this is OK, and that you've tested this successfully? (And then close this.)

harryharrold commented 11 years ago

Q: Does this mean it'll go wrong if we upload an empty .zip file or one which contains different formats?

"yes." "fair enough"

Advised that @practicalparticipation will do some backup work....