openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
GNU Affero General Public License v3.0

(Proposal) Improving the Product JSON API #1056

Open aleene opened 6 years ago

aleene commented 6 years ago

Swift 4 now supports decoding and encoding JSON out of the box. I started rewriting my OFF decoding code (which used SwiftyJSON) to be less dependent on third-party code. If all goes well, it will simplify my code.

While implementing this new code, I noticed several issues with the OFF product JSON. Other users of the API might run into the same issues.

Desired approach

Ideally I would be able to use the standard Swift libraries without adding extra code. It should suffice to describe the structure of the JSON as Swift structs. I should be able to copy the JSON keys directly as property names, and some values could be modeled as Swift enums.
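A minimal sketch of that ideal case: the keys are copied verbatim as property names and a taxonomy-like value becomes an enum, so the synthesized `Decodable` conformance does all the work. `Product` and `NutrientLevel` are hypothetical names, and the payload is made up for illustration (it assumes `nutrient_levels` maps nutrient names to `low`/`moderate`/`high`).

```swift
import Foundation

// Hypothetical enum for a taxonomy-like value.
enum NutrientLevel: String, Decodable {
    case low, moderate, high
}

// Hypothetical model: JSON keys copied verbatim as property names,
// so no CodingKeys and no custom init(from:) are needed.
struct Product: Decodable {
    let code: String
    let product_name: String
    let nutrient_levels: [String: NutrientLevel]
}

// Made-up sample payload (not real OFF data).
let json = Data("""
{
  "code": "0000000000000",
  "product_name": "Example",
  "nutrient_levels": { "fat": "moderate", "sugars": "low" }
}
""".utf8)

let product = try! JSONDecoder().decode(Product.self, from: json)
print(product.product_name)             // Example
print(product.nutrient_levels["fat"]!)  // moderate
```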

Unsupported characters

Swift does not allow a colon ":" or a dash "-" in identifiers, nor identifiers that consist only of digits. Any key containing these must therefore be remapped with CodingKeys, for example saturated-fat_unit. The same problem affects values that reflect a taxonomy, like nutrient_level. Removing the unsupported characters would make it easier to map such values to Swift enums.
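A short sketch of the remapping this forces. `NutrimentUnits` is a hypothetical model covering just two keys of the nutriments object; the dashed key cannot become a property name, so it is renamed through a `CodingKeys` enum.

```swift
import Foundation

// Hypothetical model of two keys from the nutriments object.
// "saturated-fat_unit" contains a dash, so it cannot be a Swift identifier.
struct NutrimentUnits: Decodable {
    let saturatedFatUnit: String
    let sugarsUnit: String

    // Once CodingKeys is declared, every decoded property needs a case here.
    enum CodingKeys: String, CodingKey {
        case saturatedFatUnit = "saturated-fat_unit"
        case sugarsUnit = "sugars_unit"
    }
}

let unitsJSON = Data(#"{"saturated-fat_unit": "g", "sugars_unit": "g"}"#.utf8)
let units = try! JSONDecoder().decode(NutrimentUnits.self, from: unitsJSON)
print(units.saturatedFatUnit) // g
```

`JSONDecoder`'s `.convertFromSnakeCase` strategy only splits on underscores, so it does not remove the need for the explicit mapping when a key contains a dash.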

Non-existing values

Many keys can have empty values, e.g. null, an empty array, etc. What do these mean, and what should client code do with them? And what if a key is not present at all?

For now I interpret all of these as Swift nil. In the decoding code this means I have to declare these properties as Swift optionals.
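A sketch of that interpretation, assuming that a missing key, an explicit null, an empty string and an empty array should all mean "no value". `ProductFragment` is a hypothetical name; `generic_name` and `categories_tags` are used only as examples of keys that can be empty.

```swift
import Foundation

// Hypothetical model: collapse the different "no value" encodings into nil.
struct ProductFragment: Decodable {
    let genericName: String?
    let categoriesTags: [String]?

    enum CodingKeys: String, CodingKey {
        case genericName = "generic_name"
        case categoriesTags = "categories_tags"
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        // decodeIfPresent returns nil both for a missing key and for null.
        let name = try c.decodeIfPresent(String.self, forKey: .genericName)
        genericName = (name?.isEmpty ?? true) ? nil : name       // "" -> nil
        let tags = try c.decodeIfPresent([String].self, forKey: .categoriesTags)
        categoriesTags = (tags?.isEmpty ?? true) ? nil : tags     // [] -> nil
    }
}

let samples = [
    #"{"generic_name": "", "categories_tags": []}"#,
    #"{"generic_name": null}"#,
    #"{"generic_name": "Chocolate", "categories_tags": ["en:snacks"]}"#,
]
let decoded = try! samples.map {
    try JSONDecoder().decode(ProductFragment.self, from: Data($0.utf8))
}
```

Every such property still has to be optional and needs this per-field normalization, which is the extra code the thread is pointing at.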

Product language dependent keys

Some keys depend on the main language of the product, for instance generic_name_fr. To decode such a field, one has to append the suffix _fr to the key generic_name. This is only possible once the main language of the product has already been read from the JSON. Building keys from information inside the JSON itself is not a good idea: it requires parsing the JSON twice.
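A sketch of the kind of workaround the current format requires: a custom `init(from:)` with a key built at runtime, which first reads `lang` and only then looks up `generic_name_<lang>`. `DynamicKey` and `LocalizedProduct` are hypothetical names, and the payload is made up.

```swift
import Foundation

// Hypothetical helper: a CodingKey that can be created from any runtime string.
struct DynamicKey: CodingKey {
    let stringValue: String
    var intValue: Int? { return nil }

    init(_ string: String) { stringValue = string }
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Hypothetical model: generic_name_<lang> can only be looked up once "lang" is known.
struct LocalizedProduct: Decodable {
    let lang: String
    let genericName: String?

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: DynamicKey.self)
        // Step 1: read the product's main language.
        let language = try c.decode(String.self, forKey: DynamicKey("lang"))
        lang = language
        // Step 2: build the language-specific key from that value.
        genericName = try c.decodeIfPresent(String.self,
                                            forKey: DynamicKey("generic_name_\(language)"))
    }
}

let localizedJSON = Data(#"{"lang": "fr", "generic_name_fr": "Biscuits", "generic_name_en": "Cookies"}"#.utf8)
let localized = try! JSONDecoder().decode(LocalizedProduct.self, from: localizedJSON)
print(localized.genericName ?? "-") // Biscuits
```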

A better solution would be to encode language-specific fields as an object keyed by language code, like:

"product_names": {
    "<language_code>": "<name>"
}
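A sketch of how the proposed shape would decode with no custom code: the language-keyed object maps directly onto a Swift dictionary. `ProposedProduct` is a hypothetical model of the proposal, not an existing OFF format, and the values are made up.

```swift
import Foundation

// Hypothetical model of the proposed layout: one object per
// language-dependent field, keyed by language code.
struct ProposedProduct: Decodable {
    let lang: String
    let product_names: [String: String]
}

let proposedData = Data("""
{ "lang": "fr",
  "product_names": { "fr": "Biscuits au chocolat", "en": "Chocolate biscuits" } }
""".utf8)

let p = try! JSONDecoder().decode(ProposedProduct.self, from: proposedData)
// No runtime-built keys: the main-language name is a plain dictionary lookup.
let mainName = p.product_names[p.lang]
print(mainName ?? "-") // Biscuits au chocolat
```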

Nutriments inconsistent

This is a very difficult field, because it lacks a basic structure. For every possible nutriment, four different keys must be declared (e.g. salt, salt_100g, salt_serving and salt_unit). This breaks the experiment of decoding with plain structs.

A fixed structure does not seem possible either, because fields sometimes hold a String and sometimes a Float, for example "fiber":"0" versus "salt":1.59766. The type can also differ between products for the same field. This breaks the experiment as well.
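One common workaround is a wrapper type that accepts either a JSON number or a numeric string. This is a sketch: `LenientDouble` is a hypothetical name, and the sample only contains numeric fields (the real nutriments object also holds unit strings such as "g", which this wrapper would reject).

```swift
import Foundation

// Hypothetical wrapper: accepts 1.59766 as well as "0".
struct LenientDouble: Decodable {
    let value: Double

    init(from decoder: Decoder) throws {
        let c = try decoder.singleValueContainer()
        if let d = try? c.decode(Double.self) {
            value = d                       // plain JSON number
        } else if let s = try? c.decode(String.self), let d = Double(s) {
            value = d                       // number encoded as a string
        } else {
            throw DecodingError.dataCorruptedError(
                in: c, debugDescription: "Expected a number or a numeric string")
        }
    }
}

let raw = Data(#"{"fiber": "0", "salt": 1.59766}"#.utf8)
let nutriments = try! JSONDecoder().decode([String: LenientDouble].self, from: raw)
```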

A solution would be to encode it like this:

"nutriments": {
    "nutriment": {
        "name": Nutriment_Taxonomy,
        "base": Float,
        "per100g": String,
        "serving": String,
        "unit": Unit_Taxonomy
    }
}
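A sketch of Swift types mirroring the proposed layout field by field, so that the synthesized decoder handles it without CodingKeys or a custom initializer. All type names are hypothetical; `NutrimentUnit` stands in for the unit taxonomy with an illustrative subset of cases, and `name` is kept as a String instead of a taxonomy enum. The payload values are made up.

```swift
import Foundation

// Hypothetical stand-in for the unit taxonomy (illustrative subset).
enum NutrimentUnit: String, Decodable {
    case g, mg, kJ, kcal
}

// Mirrors the proposed "nutriment" object, one property per key.
struct ProposedNutriment: Decodable {
    let name: String
    let base: Float
    let per100g: String
    let serving: String
    let unit: NutrimentUnit
}

struct ProposedNutriments: Decodable {
    let nutriment: ProposedNutriment
}

struct ProposedPayload: Decodable {
    let nutriments: ProposedNutriments
}

// Made-up values, shaped like the proposal.
let proposedJSON = Data("""
{ "nutriments": { "nutriment": {
    "name": "salt", "base": 1.5, "per100g": "1.5", "serving": "0.3", "unit": "g" } } }
""".utf8)

let payload = try! JSONDecoder().decode(ProposedPayload.self, from: proposedJSON)
print(payload.nutriments.nutriment.name, payload.nutriments.nutriment.unit) // salt g
```

Representing several nutriments at once would need an array of such objects; the sketch keeps the single `nutriment` key exactly as proposed.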

Conclusion

There are too many complexities in the OFF JSON to allow simple decoding. A lot of extra code is required to handle these inconsistencies. It is advisable to remove these complexities to make the APIs easier to adopt.

hangy commented 6 years ago

I agree. The problem is that right now, the product API is basically the MongoDB document with the image URLs added (IIRC). The MongoDB collection is actually populated directly from the Perl structure. All of that has grown organically over a few years, which leads to the problems that you described.

As far as I can tell, there is no good/correct way to fix the v1 API. It would be better to design how the JSON document should look for v2 from an API POV (obviously taking into account what data is available), and implement the Perl backend to return the product in the designed format (instead of forcing the internal Perl structure on consumers).

aleene commented 6 years ago

Just see this as input for the next API version: what should be in it, and how should it be structured?

Dwarfex commented 5 years ago

Currently the API JSON response differs on every request, because the values come back in a random order.

Proposal: order the result set alphabetically before generating the JSON response from it.

This helps readability and should give developers a quick overview of the result. It also helps to discover which fields are available when documentation is missing.
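Until the server sorts its output, a client can get a stable key order itself by re-serializing the response. This is a client-side sketch, not the proposed server change; `sortedJSON` is a hypothetical helper, and `JSONSerialization.WritingOptions.sortedKeys` requires macOS 10.13 / iOS 11 or later on Apple platforms. When encoding your own `Encodable` types, `JSONEncoder` offers the same behavior through `outputFormatting = .sortedKeys`.

```swift
import Foundation

// Hypothetical helper: re-serialize a JSON response with its keys sorted,
// so two fetches of the same product print in a stable, comparable order.
func sortedJSON(_ data: Data) throws -> String {
    let object = try JSONSerialization.jsonObject(with: data)
    let sorted = try JSONSerialization.data(withJSONObject: object, options: [.sortedKeys])
    return String(decoding: sorted, as: UTF8.self)
}

let response = Data(#"{"product_name":"Example","code":"123","brands":"Acme"}"#.utf8)
let canonical = try! sortedJSON(response)
print(canonical)
```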

VaiTon commented 5 years ago

I created a new issue to keep track of this idea.