Closed akariv closed 7 years ago
@akariv do you mean that in the curl command we should pass the URL of a datapackage.jsonld instead of the datapackage.jsonld itself, and that the value of each 'path' should be relative to the URL of the datapackage.jsonld? If so, the datapackage.jsonld and the CSV files have to be located in the same place.
@HimmelStein yes, that is correct.
CSV files and datapackage.jsonld are usually located at the same location as they are part of the same data package.
The current state was agreed on before. Everything can be changed, but I think that LinkedPipes supports only file input in POST, not GET parameters. That is, you could POST the descriptor.json file URL in a simple plain-text or RDF file, and that could be done immediately. I will have to check with Jakub whether parameters can be included directly in the GET/POST request.
I think you can POST the datapackage.json's URL, convert it to an RDF triple using t-filesToStatements, and use SPARQL CONSTRUCT to build the RDF configuration for e-httpGetFiles out of that triple. It's a bit convoluted, but should be workable now.
This can be made simpler if the OpenSpending hook can POST data in RDF instead of a mere URL. For example, in JSON-LD:
{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "url": "http://some.where/datapackage.json"
}
JSON-LD can be directly ingested as RDF, so resorting to the trickery above is not needed.
What @jindrichmynarz wrote is what I had in mind. I can implement it like that. @HimmelStein and @akariv: do you agree?
Yes, if it makes things easier then it's absolutely no problem to send RDF in the POST body. What should be the Content Type for such a request?
@marek-dudas @akariv @jindrichmynarz I am trying to understand Jindrich's idea more clearly. The JSON-LD file has only two keys, "@context" and "url"; all other information is reached through the value of "url". The CSV files in the datapackage.json use relative paths (relative to "http://some.where/"). Any corrections?
I think that is correct.
The JSON-LD example in my comment represents only 1 triple:
_:b0 <http://schema.org/url> "http://some.where/datapackage.json" .
You can see how JSON-LD expands to RDF in the JSON-LD Playground (see the N-Quads tab).
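To make the expansion concrete, here is a minimal Python sketch that reproduces the single triple above for this exact document shape only (one "@vocab" and one plain string property). It is a hand-rolled illustration, not a JSON-LD processor; the function name `jsonld_to_ntriple` is hypothetical, and the blank node label `_:b0` is simply the one used in the comment above.

```python
import json

# The JSON-LD example from the comment above, as a string.
EXAMPLE = """
{
  "@context": {"@vocab": "http://schema.org/"},
  "url": "http://some.where/datapackage.json"
}
"""

def jsonld_to_ntriple(document: str) -> str:
    """Expand a one-property JSON-LD document into one N-Triples line.

    Assumes the context defines only "@vocab" and the document has exactly
    one non-keyword key with a plain string value.
    """
    data = json.loads(document)
    vocab = data["@context"]["@vocab"]
    # Keys starting with "@" are JSON-LD keywords, not properties.
    properties = [(k, v) for k, v in data.items() if not k.startswith("@")]
    term, value = properties[0]
    # "@vocab" is prepended to the term to form the predicate IRI.
    return f'_:b0 <{vocab}{term}> "{value}" .'

triple = jsonld_to_ntriple(EXAMPLE)
print(triple)
```

For anything beyond this toy shape, a real JSON-LD library or the JSON-LD Playground is the safer choice.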
As a side note, should we want to have the datapackage.json URL not as a literal and instead treat it as an RDF resource, we can use the following JSON-LD:
{
  "@context": {
    "@vocab": "http://schema.org/",
    "url": {"@type": "@id"}
  },
  "url": "http://some.where/datapackage.json"
}
Regarding the resolution of the relative URLs in the datapackage.json, @marek-dudas can either parse the URL of the datapackage.json to obtain the base URL, or the base URL can be explicitly provided in the JSON-LD input to the FDP2RDF pipeline using the @base attribute:
{
  "@context": {
    "@base": "http://some.where/",
    "@vocab": "http://schema.org/",
    "url": {"@type": "@id"}
  },
  "url": "datapackage.json"
}
@akariv: Regarding the content type header for the JSON-LD payload, the standard is application/ld+json (see the spec). In the case of the FDP2RDF pipeline, the content type of the POST body will be ignored and instead manually hard-coded in the pipeline, so it is not strictly necessary to provide it.
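As a rough sketch of what the hook could send, the Python snippet below builds (but does not send) a POST request carrying the JSON-LD payload with the application/ld+json content type. The endpoint `PIPELINE_URL` and the helper `build_request` are hypothetical, and how the pipeline actually expects the upload (for example as an uploaded file rather than a raw body) is not settled here.

```python
import json
import urllib.request

# Hypothetical endpoint; the real pipeline URL is not given in this thread.
PIPELINE_URL = "http://pipeline.example/fdp2rdf"

def build_request(descriptor_url: str, endpoint: str = PIPELINE_URL) -> urllib.request.Request:
    """Build a POST request whose body is the JSON-LD pointer to the descriptor."""
    payload = {
        "@context": {"@vocab": "http://schema.org/"},
        "url": descriptor_url,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )

request = build_request("http://some.where/datapackage.json")
# urllib.request.urlopen(request) would send it; left out so the sketch
# runs without network access.
```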
Thinking over this again, I realized that the @base attribute is of no help in establishing a base URI, because it is transparent when processed as RDF (@base is only a syntactical artefact of JSON-LD). @marek-dudas would probably need to implement something like urllib.parse.urljoin in SPARQL.
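For reference, this is the behaviour of Python's urllib.parse.urljoin that a SPARQL implementation would have to mimic: relative paths are resolved against the descriptor URL, while fully qualified URLs pass through unchanged. The resource paths and the helper name `resolve_paths` are made up for illustration.

```python
from urllib.parse import urljoin

# URL of the descriptor, as in the examples in this thread.
DESCRIPTOR_URL = "http://some.where/datapackage.json"

def resolve_paths(descriptor_url: str, paths: list[str]) -> list[str]:
    """Resolve each resource path against the URL of the descriptor."""
    return [urljoin(descriptor_url, path) for path in paths]

resolved = resolve_paths(
    DESCRIPTOR_URL,
    [
        "budget.csv",                  # sibling of the descriptor
        "data/2016.csv",               # file in a subdirectory
        "http://other.host/full.csv",  # already fully qualified
    ],
)
for url in resolved:
    print(url)
# → http://some.where/budget.csv
# → http://some.where/data/2016.csv
# → http://other.host/full.csv
```

Note that the base passed to urljoin is the descriptor URL itself; the last path segment (datapackage.json) is dropped during resolution, so no separate base URL needs to be supplied.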
The pipeline should now support a datapackage descriptor URL sent in a jsonld file, as discussed above. See the readme for more details.
@marek-dudas: Can you explain why the name datapackage.jsonld is required?
The pipeline has a simple file filter at the beginning that switches between the "datapackage descriptor posted directly" and "just the URL of the descriptor posted" inputs based on the filename. It is less user friendly, but also less error-prone in my opinion. I think it would take some time to enable arbitrary filenames, since the pipeline would first have to look into the file and determine from its content whether it is a datapackage descriptor or just its URL. And since LinkedPipes AFAIK does not support if/then/else nodes, it might get quite complicated.
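The filename-based switch described above can be pictured roughly like this in Python. This is only an illustration of the idea, not the LinkedPipes configuration: the function `classify_upload`, the branch labels, and the assumption that every other filename is treated as a directly posted descriptor are all hypothetical.

```python
import posixpath

# Filename that, per the discussion above, marks the "URL only" input.
URL_POINTER_NAME = "datapackage.jsonld"

def classify_upload(filename: str) -> str:
    """Pick the pipeline branch from the uploaded file's name alone.

    Assumption: anything not named datapackage.jsonld is handled as a
    descriptor posted directly.
    """
    name = posixpath.basename(filename)
    if name == URL_POINTER_NAME:
        return "descriptor-url"
    return "descriptor"

print(classify_upload("datapackage.jsonld"))
print(classify_upload("uploads/datapackage.json"))
```

Deciding by name keeps the filter to a single string comparison, whereas deciding by content would require parsing the file before the pipeline can branch.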
I see. I thought the support of posting datapackage.json directly was dropped in favour of posting the download instructions.
Just a reminder: the proposal was implemented and documented some time ago, so feel free to test it on the Fraunhofer server and eventually close the issue.
The DataPackage specification (on which the Fiscal Data Package is based) specifies that data packages are to be referred to by URLs. The reason is that a datapackage consists of the resources as well as the descriptor file (datapackage.json). When you point to the descriptor or the root of the package, you also provide information regarding the rest of the contents of the package. This means that most references from the datapackage.json to its resources are done with relative URLs, not with fully qualified ones. The current implementation, which requires the json file to be uploaded as POST data, doesn't convey the origin of the file. That goes against the principles of datapackage and makes it impossible to locate the resources when the paths are relative.