ncihtan / data-models

Schema.org Data Models for HTAN
MIT License

Evaluate pointing to .csv data model in DCA config #350

Closed AmyHeiser closed 3 months ago

AmyHeiser commented 4 months ago

Point to the raw CSV in the DCA config rather than the JSON-LD to improve manifest generation and submission speeds in DCA, i.e. change this line to point to the CSV: https://github.com/Sage-Bionetworks/data_curator_config/blob/04f82525243c600e3bfadbb4203eaa16ca6face6/HTAN/dca_config.json#L5

[dca_config.json](https://github.com/Sage-Bionetworks/data_curator_config/blob/04f82525243c600e3bfadbb4203eaa16ca6face6/HTAN/dca_config.json), line 5:

`"data_model_url": "https://raw.githubusercontent.com/ncihtan/data-models/main/HTAN.model.jsonld",`
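
For illustration, the proposed change could be sketched as below. This is a minimal sketch, not the actual change in the config repo: it assumes the CSV model sits next to the JSON-LD at the same path with a `.csv` extension, and the helper name `point_model_url_to_csv` and the wrapper key `example_section` are hypothetical. It walks the parsed config so it does not depend on where `data_model_url` is nested.

```python
JSONLD_SUFFIX = ".jsonld"
CSV_SUFFIX = ".csv"


def point_model_url_to_csv(config):
    """Return a copy of `config` in which every "data_model_url" value
    ending in .jsonld ends in .csv instead. Other values are unchanged.

    Assumption: the CSV model lives at the same URL apart from the extension.
    """
    if isinstance(config, dict):
        updated = {}
        for key, value in config.items():
            if (
                key == "data_model_url"
                and isinstance(value, str)
                and value.endswith(JSONLD_SUFFIX)
            ):
                # Swap only the file extension; keep the rest of the URL.
                updated[key] = value[: -len(JSONLD_SUFFIX)] + CSV_SUFFIX
            else:
                updated[key] = point_model_url_to_csv(value)
        return updated
    if isinstance(config, list):
        return [point_model_url_to_csv(item) for item in config]
    return config


# Hypothetical config fragment; only the data_model_url value is from this issue.
example = {
    "example_section": {
        "data_model_url": "https://raw.githubusercontent.com/ncihtan/data-models/main/HTAN.model.jsonld"
    }
}
print(point_model_url_to_csv(example)["example_section"]["data_model_url"])
```

In practice the edit is a one-line change to `dca_config.json`; the helper only makes the intended before/after explicit.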



Test here: https://dca-staging.app.sagebionetworks.org/

adamjtaylor commented 4 months ago

@aclayton555 one to prioritize for our March sprint

aclayton555 commented 4 months ago

Update is straightforward, but include some testing around manifest generation, particularly any manifests that were previously failing. Could also loop in Thomas K from Stanford as an external tester.

adamjtaylor commented 4 months ago

PR open: https://github.com/Sage-Bionetworks/data_curator_config/pull/156

adamjtaylor commented 3 months ago

Testing in HTAN Center C and imaging_level_2 folder I was able to

adamjtaylor commented 3 months ago

So this seems to be working. Template generation did seem faster than prod DCA, which currently uses the JSON-LD, but I did not time it.
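
If the speed difference needs to be measured rather than eyeballed, a rough timing harness along these lines could help. It is only a sketch: `time_call` is a hypothetical helper, and whatever triggers template generation against staging or prod would have to be wrapped in the callable passed to it, which is not shown here.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Call func() `repeats` times and return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in workload; replace the lambda with the real template-generation call.
median_seconds = time_call(lambda: sum(range(100_000)), repeats=3)
print(f"median: {median_seconds:.4f} s")
```

Running the same harness once against the CSV-backed config and once against the JSON-LD-backed config would give comparable medians.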

adamjtaylor commented 3 months ago

I think we should confirm with FAIR that this is an OK approach before rolling out to prod. We have not seen users reporting timeout errors in the past few weeks, so maybe the initial improvements have been enough and we should keep this in our back pocket for the renewal.

aclayton555 commented 3 months ago

Currently in staging - seems to be working and faster than prod, but probably needs more testing. If we want to roll this out, would need to update prod config to point to the csv.

Agree to keep this in mind for the renewal, but do not see a need to implement this now.

AmyHeiser commented 3 months ago

Thanks for testing, Adam, and for the comments, Ashley. We will have other longer-term improvements to speed up manifest generation and submission soon, so agreed to keep this as a backup option if needed.