Describe the bug
When converting the molecule parquet to JSON, the crossReferences struct appears correct but differs from the JSON historically produced by Spark. The API has a transformer that handles the Spark JSON, and it crashes when parsing the new JSON.
So the new JSON complies with the schema but causes the API to crash. A small fix to the API could resolve this and keep the schema consistent between the parquet and the JSON. Note that this fix will break backwards compatibility with the old data.
Observed behaviour
original:
"crossReferences":{"PubChem":["144207106","144207960"],"Wikipedia":["Fructose"],"chEBI":["28645"]}
new:
"crossReferences":[{"key":"PubChem","value":["144207106","144207960"]},{"key":"Wikipedia","value":["Fructose"]},{"key":"chEBI","value":["28645"]}]
Expected behaviour
The API can read the JSON that has the same schema as the parquet.
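For illustration only, here is a rough sketch of how a reader could accept both shapes of crossReferences shown above. It is written in Python purely to show the idea; the API's actual transformer and language are not shown in this issue, and the function name `normalize_cross_references` is hypothetical.

```python
import json
from typing import Any, Dict, List


def normalize_cross_references(raw: Any) -> Dict[str, List[str]]:
    """Return crossReferences as a plain {source: [ids]} mapping.

    Hypothetical helper: accepts either the legacy object shape or the
    new list-of-key/value-entries shape from the examples in this issue.
    """
    if raw is None:
        return {}
    if isinstance(raw, dict):
        # Legacy Spark output: a JSON object keyed by source name.
        return {str(key): list(ids or []) for key, ids in raw.items()}
    if isinstance(raw, list):
        # New output: a list of {"key": ..., "value": [...]} entries.
        return {str(entry["key"]): list(entry.get("value") or []) for entry in raw}
    raise TypeError(f"Unexpected crossReferences type: {type(raw).__name__}")


# The two payloads below are copied from the observed behaviour above.
legacy = json.loads(
    '{"PubChem":["144207106","144207960"],"Wikipedia":["Fructose"],"chEBI":["28645"]}'
)
new = json.loads(
    '[{"key":"PubChem","value":["144207106","144207960"]},'
    '{"key":"Wikipedia","value":["Fructose"]},'
    '{"key":"chEBI","value":["28645"]}]'
)

# Both shapes normalise to the same mapping.
assert normalize_cross_references(legacy) == normalize_cross_references(new)
```

Accepting both shapes would avoid the crash without dropping support for the old data; reading only the list shape is the smaller change, at the cost of the backwards compatibility noted above.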