hubmapconsortium / ingest-api

MIT License

Datasets Publish Move Json Dump to End Fix #508

Closed ChuckKollar closed 6 months ago

ChuckKollar commented 7 months ago

The JSON dump is breaking the /datasets//publish endpoint: the data directory is moved to a different location as part of the publication process, and the dump then tries to write the file into the old directory.
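The failure mode described above can be sketched in a few lines. This is only an illustration under assumptions, not the ingest-api code: the function names `publish_broken` and `publish_fixed` and the directory arguments are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def publish_broken(old_dir: Path, new_dir: Path, metadata: dict) -> None:
    """Hypothetical: move the data directory, then write to the OLD path."""
    shutil.move(str(old_dir), str(new_dir))
    # old_dir no longer exists at this point, so open() raises FileNotFoundError.
    with open(old_dir / "metadata.json", "w") as f:
        json.dump(metadata, f)


def publish_fixed(old_dir: Path, new_dir: Path, metadata: dict) -> Path:
    """Hypothetical: do the JSON dump last, against the final location."""
    shutil.move(str(old_dir), str(new_dir))
    target = new_dir / "metadata.json"
    with open(target, "w") as f:
        json.dump(metadata, f)
    return target
```

Moving the dump to the end (as the issue title suggests) means the write always targets the directory's final location, whichever access level triggered the move.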

It appears that the problem is related to publishing dataset_data_access_level == 'consortium', but this needs to be explored more completely.

I have a test program (Python) that tests locally for both 'consortium' and 'protected' Datasets.

NOTE: Bill needs to make a card containing the items that he published so that a metadata.json file can be made for them.

ChuckKollar commented 7 months ago

PR: https://github.com/hubmapconsortium/ingest-api/pull/511

Testing this locally was difficult: setfacl does not exist on OS X, so I had to comment out that call in my local copy. I tested both consortium and protected Datasets.
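As an alternative to commenting the call out by hand, a guard along these lines could skip the ACL step when setfacl is not installed. This is a sketch under assumptions: the helper name `set_acl_if_available`, the `-R -m` invocation, and the ACL spec are illustrative and may not match how ingest-api actually calls setfacl.

```python
import shutil
import subprocess


def set_acl_if_available(path, acl_spec, which=shutil.which, run=subprocess.run):
    """Apply an ACL with setfacl when it is installed.

    Returns False (and does nothing) when setfacl is not on the PATH,
    e.g. on a macOS development machine. `which` and `run` are injectable
    so the helper can be tested without touching the real system.
    """
    if which("setfacl") is None:
        return False
    # Hypothetical invocation: recursively modify the ACL on `path`.
    run(["setfacl", "-R", "-m", acl_spec, path], check=True)
    return True
```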

ChuckKollar commented 6 months ago

Also see PR: https://github.com/hubmapconsortium/ingest-api/pull/521

ChuckKollar commented 6 months ago

Also see PR: https://github.com/hubmapconsortium/ingest-api/pull/522

ChuckKollar commented 6 months ago

There should be three tests (only the first two are implemented so far):
1) A primary Dataset with human genetic sequences = False (consortium) generates a metadata.json.
2) A primary Dataset with human genetic sequences = True (protected) generates a metadata.json.
3) A derived/processed Dataset (one that hangs off another Dataset) DOES NOT generate a metadata.json.

I have tests for 1) and 2) but not 3).

If the wording is specifically "datasets that hang off of datasets", I assume he means MATCH (d1:Dataset)<-[:ACTIVITY_OUTPUT]-(:Activity)<-[:ACTIVITY_INPUT]-(d2:Dataset) RETURN (d2), but this should be confirmed with Bill. There is also some issue of explaining how to get them based on older criteria.

ChuckKollar commented 6 months ago

I now have a test for 3), based on the Cypher query below, which should return a Dataset in QA status that has a parent. I use it to verify that NO metadata.json file is written in these cases.

MATCH (:Dataset)-[:ACTIVITY_INPUT]->(:Activity)-[:ACTIVITY_OUTPUT]->(ds:Dataset {status:'QA'}) RETURN ds

ChuckKollar commented 6 months ago

PR: https://github.com/hubmapconsortium/ingest-api/pull/524 This PR does not publish the metadata.json file if the Dataset has a parent.
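The three test cases discussed in this thread reduce to a single rule, which can be sketched as follows. This is a minimal illustration, not the code in the PR: the function `should_write_metadata_json`, the dictionary keys, and the `CASES` table are hypothetical names chosen for the example.

```python
def should_write_metadata_json(dataset: dict) -> bool:
    """Hypothetical rule: only Datasets without a parent Dataset get a metadata.json."""
    return not dataset.get("has_parent_dataset", False)


# (description, dataset, is a metadata.json expected?)
CASES = [
    ("primary, human genetic sequences = False (consortium)",
     {"contains_human_genetic_sequences": False, "has_parent_dataset": False}, True),
    ("primary, human genetic sequences = True (protected)",
     {"contains_human_genetic_sequences": True, "has_parent_dataset": False}, True),
    ("derived/processed (hangs off another Dataset)",
     {"contains_human_genetic_sequences": False, "has_parent_dataset": True}, False),
]


def check_cases() -> list:
    """Return the descriptions of any cases where the rule disagrees with the expectation."""
    return [desc for desc, ds, expected in CASES
            if should_write_metadata_json(ds) != expected]
```

In this sketch the access level (consortium or protected) does not affect the decision; only the presence of a parent Dataset does.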