How to retrieve file outputs

alpha-beta-soup commented 4 years ago

https://github.com/geopython/pygeoapi/issues/111#issuecomment-665391968

My reading of the draft spec is that a client making a GET request to /processes/{processId}/jobs/{jobId}/results, for a job that has finished executing without meeting an error condition, will get a 200 HTTP status and JSON output like:

[
    {
        "id": "output_dataset",
        "value": "/tmp/local-outlier-factor/d65b882e-d140-11ea-9898-0242ac150018/some-dataset.csv"
    }
]

For multiple outputs, I believe it would be:

[
    {
        "id": "output_dataset",
        "value": "/tmp/local-outlier-factor/d65b882e-d140-11ea-9898-0242ac150018/some-dataset.csv"
    },
    {
        "id": "output_dataset",
        "value": "/tmp/local-outlier-factor/d65b882e-d140-11ea-9898-0242ac150018/some-other-dataset.csv"
    }
]

(This is how I'm currently organising a demonstration implementation, for a process that writes output to disk. All outputs are being kept in /tmp/{processId}/{jobId}/. But I actually just want to return the file for download to the client's machine, not the path on the server.)

But the spec does not specify how a client can actually get file/complex outputs. The process definition allows one to specify the media type of various inputs and outputs, e.g. text/csv. So should it respect the client's Accept header?

But what if the output was a JSON file? Then the Accept header would be ambiguous between the result-as-json-metadata and the file-output-result-as-json-data.

What if there are multiple files? The output would probably need to be an archive of them all, and they could be of mixed types; again, this leads to issues when relying on Accept headers.

Should they be (HTTP) links? Depending on the implementation, output datasets could be uploaded to a third party (e.g. S3) or to another place for download (including a newly created /collection), but the /results response object would only record URIs against particular output IDs, and would expect the client to know what to do beyond that (and possibly even to know that they are HTTP URIs).

bpross-52n commented 4 years ago

This sounds like you want to request outputs by reference.

In this case, a href element will be used, like in the following example:

{
    "outputs": [
        {
            "id": "result",
            "value": {
                "href": "http://xy.z/result"
            }
        }
    ]
}
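A minimal sketch of how a server could turn the local paths from the original question into by-reference outputs like the one above. The `DOWNLOAD_BASE` URL and both function names are hypothetical, and the `/tmp/{processId}/{jobId}/` layout is taken from the question, not from the spec:

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Assumption: outputs are written under /tmp/{processId}/{jobId}/.
OUTPUT_ROOT = PurePosixPath("/tmp")
# Hypothetical public base URL from which the server exposes job files.
DOWNLOAD_BASE = "https://example.org/downloads"


def path_to_href(local_path: str) -> str:
    """Map a server-side output path to a download URL without exposing /tmp."""
    # relative_to raises ValueError if the path is not under OUTPUT_ROOT.
    relative = PurePosixPath(local_path).relative_to(OUTPUT_ROOT)
    return DOWNLOAD_BASE + "/" + quote(str(relative))


def to_reference_output(output: dict) -> dict:
    """Rewrite {"id", "value": <path>} into {"id", "value": {"href": <url>}}."""
    return {"id": output["id"], "value": {"href": path_to_href(output["value"])}}


by_value = {
    "id": "output_dataset",
    "value": "/tmp/local-outlier-factor/d65b882e-d140-11ea-9898-0242ac150018/some-dataset.csv",
}
print(to_reference_output(by_value))
```

The server behind `DOWNLOAD_BASE` would still have to serve the file itself with a suitable Content-Type.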

Relevant schemas:
https://github.com/opengeospatial/wps-rest-binding/blob/master/core/openapi/schemas/result.yaml
https://github.com/opengeospatial/wps-rest-binding/blob/master/core/openapi/schemas/outputInfo.yaml
https://github.com/opengeospatial/wps-rest-binding/blob/master/core/openapi/schemas/valueType.yaml
https://github.com/opengeospatial/wps-rest-binding/blob/master/core/openapi/schemas/referenceValue.yaml

There is an open issue, #43 (which should have been fixed already), requesting that the format be (re-)added to the outputInfo so that clients know what to expect. Handling headers and the like is the job of the server where the results are actually stored.

You could of course also use something like "href": "http://xy.z/result.csv"

I hope this clarifies it a bit.
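On the client side, following such references could look roughly like the sketch below. It assumes the `{"outputs": [...]}` shape shown above; the function names are illustrative only, and `download_outputs` performs real HTTP requests, so it is defined but not called here:

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def collect_hrefs(results: dict) -> list[tuple[str, str]]:
    """Return (output id, href) pairs for every by-reference output."""
    pairs = []
    for output in results.get("outputs", []):
        value = output.get("value")
        # Inline values are skipped; only {"href": ...} values are references.
        if isinstance(value, dict) and "href" in value:
            pairs.append((output["id"], value["href"]))
    return pairs


def download_outputs(results: dict, target_dir: str) -> list[Path]:
    """Download each referenced output into target_dir (performs HTTP requests)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, (output_id, href) in enumerate(collect_hrefs(results)):
        # Fall back to the output id when the URL path has no file name.
        name = Path(urlparse(href).path).name or output_id
        # Prefix with the index so repeated names do not overwrite each other.
        destination = target / f"{index}-{name}"
        urllib.request.urlretrieve(href, destination)
        saved.append(destination)
    return saved


example = json.loads(
    '{"outputs": [{"id": "result", "value": {"href": "http://xy.z/result"}}]}'
)
print(collect_hrefs(example))  # [('result', 'http://xy.z/result')]
```

The media type of each download then comes from the Content-Type header of the server holding the file, not from the /results document.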

alpha-beta-soup commented 4 years ago

I wonder if there are any guidelines for multiple outputs, or perhaps zip (or other) archives consisting of multiple files?

E.g. for a process that accepts 1...n input files and produces a corresponding number of output files, the output could be this:

{
    "outputs": [
        {
            "id": "result",
            "value": {
                "href": "http://xy.z/result-a.csv"
            },
            "format": {
                "mimeType": "text/csv"
            }
        },
        {
            "id": "result",
            "value": {
                "href": "http://xy.z/result-b.csv"
            },
            "format": {
                "mimeType": "text/csv"
            }
        }
    ]
}

or could instead be this:

{
    "id": "result",
    "value": {
        "href": "http://xy.z/result.zip"
    },
    "format": {
        "mimeType": "application/zip"
    }
}

but it is not clear what format the contents of the archive are...
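Absent anything in the spec, one workaround a client could try is to open the archive and guess each member's media type from its file name. This is only a sketch of that idea: `describe_archive` is a hypothetical helper, and guessing from extensions is an assumption, not something the draft defines:

```python
import io
import mimetypes
import zipfile


def describe_archive(archive_bytes: bytes) -> list[dict]:
    """List each archive member with a media type guessed from its file name."""
    described = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            guessed, _encoding = mimetypes.guess_type(info.filename)
            described.append({
                "name": info.filename,
                # Fall back to a generic type when the extension is unknown.
                "mimeType": guessed or "application/octet-stream",
            })
    return described


# Build a small in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("result-a.csv", "id,score\n1,0.5\n")
    archive.writestr("result-b.json", "{}")

for member in describe_archive(buffer.getvalue()):
    print(member["name"], member["mimeType"])
```

The guessed types depend on the platform's MIME database, so a server-provided manifest inside the archive would be more reliable if the spec ever defined one.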

bpross-52n commented 4 years ago

Partly also discussed in #37. This is something we should continue to discuss for the next draft version (after the public review).

bpross-52n commented 4 years ago

I will close this issue and add a back link in #37

opengeospatial / ogcapi-processes

How to retrieve file outputs #90