wilkinsonlab / SADI-Specification

Specification of the design patterns for SADI (Semantic Automated Discovery and Integration) Semantic Web services

How to process big data inputs #5

Open mikel-egana-aranguren opened 8 years ago

mikel-egana-aranguren commented 8 years ago

Perhaps this is completely out of scope but I think Mark mentioned it to me once?

The SADI spec assumes that the input data is in RDF. This means you have to be able to convert your data to RDF, but sometimes that is not possible or desirable. For example, suppose I have a huge BIOM file, and there is a SADI service that, given a BIOM file, returns the abundance of a taxon. How would that work? I can't POST the file itself. However, I could upload the file somewhere, and the SADI service could download the tar.gz-compressed file, process it asynchronously, and deliver the result to my client later. The input instance would not carry any data, though; it would only point to the tar.gz file.

Is this conceivable/desirable?
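To make the "process it asynchronously and give the result later" part concrete, here is a minimal in-process sketch in Python. It is not how SADI specifies asynchronous services; `AsyncJobStore`, its method names, and the status dictionaries are all hypothetical, and the `worker` callable stands in for whatever would download and analyse the file.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class AsyncJobStore:
    """Hypothetical in-memory job registry (an illustration, not part of SADI)."""

    def __init__(self, worker: Callable[[str], str], max_workers: int = 2) -> None:
        # worker receives the URL of the uploaded file and returns a result string.
        self._worker = worker
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, file_url: str) -> str:
        """Queue the file reference for background processing; return a job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._worker, file_url)
        return job_id

    def poll(self, job_id: str) -> dict:
        """Report the job state without blocking the caller."""
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        error = future.exception()
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "done", "result": future.result()}

    def close(self) -> None:
        """Wait for all queued jobs to finish, then release the worker threads."""
        self._pool.shutdown(wait=True)
```

The client only ever sends the file URL and later asks for the state of its job id, so the large file never travels in the request body.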

markwilkinson commented 8 years ago

It is definitely doable, but we have not really specified how (and we probably should). IMO the way to manage this is to make the URI of the input node the URI of your data file. If it is rdf:typed as a particular class, where everyone agrees that things of that class are gzipped BIOM files, then the service can dereference the URI (i.e. fetch it) and deal with the data.

That's how I think it should work (or something like that)... but it would be good to specify that behavior.
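A rough Python sketch of that idea, under several assumptions: the class IRI `http://example.org/sadi-demo#GzippedBiomFile` is made up for illustration, the input is a one-triple N-Triples document, the file is plain gzip (not tar.gz), and the regular expression only understands triples whose three terms are all IRIs. A real service would use a proper RDF parser; the function names here are hypothetical.

```python
import gzip
import re
import shutil
import urllib.request
from typing import Iterator

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical class IRI; no such class is defined by the SADI spec.
GZIPPED_BIOM_CLASS = "http://example.org/sadi-demo#GzippedBiomFile"

# Simplified N-Triples line: <subject> <predicate> <object> .
_IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def describe_input(file_url: str, class_iri: str = GZIPPED_BIOM_CLASS) -> str:
    """Client side: the input node's URI *is* the URL of the data file."""
    return f"<{file_url}> <{RDF_TYPE}> <{class_iri}> .\n"


def typed_inputs(ntriples: str, class_iri: str = GZIPPED_BIOM_CLASS) -> Iterator[str]:
    """Service side: yield every subject URI that is rdf:typed as class_iri."""
    for line in ntriples.splitlines():
        match = _IRI_TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE and match.group(3) == class_iri:
            yield match.group(1)


def download_and_gunzip(file_url: str, dest_path: str, timeout: float = 60.0) -> str:
    """Dereference the input URI and stream the decompressed bytes to disk."""
    with urllib.request.urlopen(file_url, timeout=timeout) as response:
        with gzip.GzipFile(fileobj=response) as unzipped:
            with open(dest_path, "wb") as out:
                # Copy in chunks so a huge file is never held in memory at once.
                shutil.copyfileobj(unzipped, out)
    return dest_path
```

The rdf:type triple is the only contract between client and service here: the service decides how to fetch and decode the resource purely from the class, which is why agreeing on what the class means matters.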

M

(Sent in reply to Mikel Egaña Aranguren's message of 01/26/2016, 01:14 PM.)

Mark Wilkinson Madrid, Spain