Closed javfg closed 1 month ago
While doing this, could we please make the output consistent with the rest of the ETL, i.e. Parquet output in the ETL output path? Currently POS loads so.json directly from the input folder, which couples POS to PIS, when ideally it should be handling ETL outputs only.
The JSON file needs to be mapped, because its structure is different from the OWL file. I'll be using the extractor to extract 3 different inputs and create a new step in the ETL to process the inputs and produce the SO in the correct format.
The extractor changes are done and the mappings are being worked on.
After discussing with @d0choa, the mapping has been simplified and the output now only contains id and label, which are the only fields being exposed by the API.
Description
The so step downloads the sequence ontology.
Transformations PIS was doing
PIS was downloading the OWL version and using Apache Jena to convert it to JSONL. There is a JSON version, so we can use that one to avoid Jena and transform it into JSONL directly.
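A minimal sketch of that direct transformation, assuming so.json follows the OBO Graphs JSON layout (a top-level graphs list whose nodes carry id, lbl and type). The function names, the IRI prefix and the SO:0000001 id style are assumptions for illustration, not the actual ETL step; the output keeps only id and label, as agreed in the comments above.

```python
import json
from typing import Iterator

# Assumed IRI prefix used for term ids in so.json (OBO Graphs convention).
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def so_terms(so_json: dict) -> Iterator[dict]:
    """Yield {"id", "label"} records for SO classes found in the parsed JSON."""
    for graph in so_json.get("graphs", []):
        for node in graph.get("nodes", []):
            node_id = node.get("id", "")
            label = node.get("lbl")
            # Skip properties and other non-class nodes, and unlabelled nodes.
            if node.get("type") != "CLASS" or label is None:
                continue
            # Keep only Sequence Ontology terms.
            if not node_id.startswith(OBO_PREFIX + "SO_"):
                continue
            # Hypothetical id style: ".../SO_0000001" becomes "SO:0000001".
            short_id = node_id[len(OBO_PREFIX):].replace("_", ":", 1)
            yield {"id": short_id, "label": label}


def to_jsonl(so_json: dict) -> str:
    """Serialise the extracted terms as JSONL, one record per line."""
    return "\n".join(json.dumps(term) for term in so_terms(so_json))


# Tiny hand-written sample in the assumed layout, not real so.json content.
sample = {
    "graphs": [
        {
            "nodes": [
                {"id": OBO_PREFIX + "SO_0000001", "lbl": "region", "type": "CLASS"},
                {"id": OBO_PREFIX + "so#part_of", "lbl": "part_of", "type": "PROPERTY"},
                {"id": OBO_PREFIX + "SO_0000002", "type": "CLASS"},
            ]
        }
    ]
}

print(to_jsonl(sample))
```

Because the records are plain dictionaries, the same generator could feed whichever writer the ETL step ends up using for its output.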
The query PIS used to convert Jena's output into JSONL is:
But we must adapt it to the so.json version, as it is not the same as Jena's output.
Tasks