unioslo / tsd-file-api

REST API for upload and download of files and JSON data
BSD 3-Clause "New" or "Revised" License
0 stars 8 forks source link

PATCH accepts all valid json bodies #186

Open drmowinckels opened 10 months ago

drmowinckels commented 10 months ago

the patch submissions endpoint accepts all json bodies that are in the format

{
    "answers": {
        "col1": true,
        "col2": 1,
        "col3": "hello"
    }
}

The problem arises when the provided json keys do not exist in the form they are being submitted to, this results with data for this one submission that no longer adheres to the rest of the forms data. I.e. this submission is incompatible with the rest of the data.

In the nettskjema loader (web ui) inside TSD, this means the data for this submission looks empty. It means also accessing the data from the API, this one submission has data that cannot be appended to the rest, as its incompatible.

example, given the true data are as above, if someone submits

{
    "answers": {
        "col4": true,
        "col5": 1,
        "col7": "hello"
    }
}

status return is 201, data is successfully updated. And this is the data I can receive from the API. But I cannot add it to the remaining form data, as these are column keys that dont exist in the rest.

This might be particularly important to think about because of the nested structure for checkboxes and matrices.

if the true data are

{
    "answers": {
         "mc359": {
            "col1": "value",
            "col2": 1,
            "col3": true,
            "col4": "value"
        }
    }
}

and you submit

{
    "answers": {
            "mc359.col1": "value",
            "mc359.col2": 1,
            "mc359.col3": true,
            "mc359.col4": "value"
    }
}

status is 201, data is sent. But no longer compatible with the remaining data. And I can see this happening for the nested data easily.

It would be ideal if there was some sort of API validation towards the forms metadata that stops incorrectly formatted patch data to be accepted.

leondutoit commented 10 months ago

Right now, the API accepts anything, by design. Data validation should be performed by the clients. The only way I can see supporting server-side schema validation is if it can be implemented such that it is:

And implementing that would take quite a lot of effort - so much that it is unlikely there will be any change of this in 2023. In the meantime I suggest you implement schema validation in your API client.

drmowinckels commented 10 months ago

I think, given that you have made an endpoint to patch data, you will likely be setting yourselves up for getting lots of users having done this incorrectly and messing up their data. Once the data for a submission no longer follows the forms metadata, it also breaks the Nettskjema webui loader (not in all cases, but in certain formats).

leondutoit commented 10 months ago

Maybe so, that does not change the timeline for any API-side implementation though, so I still suggest adding schema validation to your tool.

leondutoit commented 2 months ago

Adding validation to pysquril: https://github.com/unioslo/pysquril/issues/60 will enable this.

f-krull commented 2 months ago

Note: To force nettskjema submissions to follow the forms metadata, it would require a custom schema for each form (plus updates whenever the metadata gets updated)

leondutoit commented 2 months ago

So we have two generic JSON schemas atm: one for submissions, and another for form metadata. I was thinking that all submissions and all instances of form metadata would be compliant with the aforementioned schemas. Is that not correct?

Having submissions follow form metadata is something else than JSON schema validation, correct?

f-krull commented 2 months ago

So we have two generic JSON schemas atm: one for submissions, and another for form metadata. I was thinking that all submissions and all instances of form metadata would be compliant with the aforementioned schemas.

Yes, correct. Also the submission schema is composed of mini schemas - one for each type of answer (attachment, checkbox, ...).

Currently the nettskjema portal validates against the submission schema whenever edits to a submission are made. Here we know which answer is being edited. Therefore it's using the corresponding mini schema.

Having submissions follow form metadata is something else than JSON schema validation, correct?

Hmm, not necessarily. It would be possible to generate a more specific submission schema from the metadata. The current generic schema allows all possible types for any key. But with info from the metadata, we could generate a schema that would, for example, say "key answer1 needs to fulfill the attachment schema". This would address the issue Mo mentioned in her last post

f-krull commented 2 months ago

Ok TL;Dr, my point was just that schema validation might not solve the issue Mo mentioned unless we do additional stuff

leondutoit commented 2 months ago

I see, thanks for the info.

In terms of the scope of https://github.com/unioslo/pysquril/issues/60 it sounds like pysquril should implement running an arbitrary function on input data before insert and updates - something that raises a generic exception like a ValidationError.

It would then be up to the library user to implement the required logic. In this case, the file/survey API could then do the domain specific work of preparing the relevant schema/mini schema, and pass that along with a validator function to pysquril.

Or maybe https://github.com/unioslo/pysquril/issues/6 is not needed at all? Don't have the full picture yet...

leondutoit commented 2 months ago

@uio-torjus and I have drawn up a plan: