ncihtan / data-models

Schema.org Data Models for HTAN
MIT License

Schematic refactor testing #322

Closed by adamjtaylor 6 months ago

adamjtaylor commented 9 months ago

Confluence docs on testing

Timing for the Refactor testing and official launch:

  • Testing via CLI and API: Nov 16th - Dec 7th (Note: if you haven't used the API before, please check it out and let me know if you have any questions.)
  • Deploy to DCA Staging for final testing: Targeting Dec 7th (Will give a final call to do any additional testing on DCA)
  • Release to AWS Prod: Targeting Dec 14th

As always, reach out with any questions or concerns - feel free to use the #fair-data-tools Slack channel or contact me directly.

adamjtaylor commented 8 months ago

Per whiteboarding: testing via CLI and API (Nov 16th - Dec 7th) needs to be done by Dec 7th.

aclayton555 commented 8 months ago

Success criteria: Is everything working as it was before? Are manifests still being created as expected (e.g. columns, column order, etc.)?
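
The column and column-order check could be sketched in shell as below. This is a minimal sketch, assuming a template generated before the refactor and one generated after it are both available as plain CSV files whose header row contains no quoted commas. The function names `manifest_header`, `same_columns` and `same_column_set` are hypothetical and are not part of schematic.

```shell
# Hypothetical helpers, not part of schematic.

# Print the header row of a CSV manifest, dropping any trailing carriage return.
manifest_header() {
    head -n 1 "$1" | tr -d '\r'
}

# Succeed only when both manifests have the same columns in the same order.
same_columns() {
    [ "$(manifest_header "$1")" = "$(manifest_header "$2")" ]
}

# Succeed when both manifests have the same set of columns, ignoring order.
# Assumption: no column name contains a comma.
same_column_set() {
    a=$(manifest_header "$1" | tr ',' '\n' | sort)
    b=$(manifest_header "$2" | tr ',' '\n' | sort)
    [ "$a" = "$b" ]
}
```

If `same_column_set` succeeds while `same_columns` fails, the columns are all present but their order has changed, which is one of the regressions the success criteria call out.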

Considerations:

aclayton555 commented 8 months ago

Process: Develop a script to test across all components. This is a heavier lift, but the script can be reused and will add value for continued future use.

Dedicate a half day to this. Complete by Dec 15.

adamjtaylor commented 8 months ago

I've completed initial testing based on the following Makefile.

All targets ran, so I don't see any immediate breaking errors.

# Use bash for recipes: fetch_manifests relies on bash-only ANSI-C quoting for the tab separator.
SHELL := /bin/bash

.PHONY: init convert get_templates fetch_manifests

# Set up schematic with the example config
init:
	schematic init --config config_example.yml

# Convert the CSV data model, logging all output to convert.log
convert:
	schematic schema convert "data-models/HTAN.model.csv" > convert.log 2>&1

# JSON file containing the data
JSON_FILE := data-models/dca-template-config.json

# Extracting the schema_name values into a list
COMPONENTS := $(shell jq -r '.manifest_schemas[].schema_name' $(JSON_FILE))

# Target to run the command for each schema_name
# (stdout and stderr are logged separately per component)
get_templates:
	$(foreach comp, $(COMPONENTS), \
		schematic manifest -c config_example.yml get -dt $(comp) -s >$(comp)_stdout.log 2>$(comp)_stderr.log; \
	)

# Download every stored manifest, read its component from row 2, column 1,
# and validate the manifest against that component.
fetch_manifests:
	@echo "Fetching manifests..."
	@synapse query "SELECT * FROM syn20446927 WHERE name LIKE 'synapse_storage_manifest_%.csv'" > all_manifests.tsv

	@while IFS=$$'\t' read -r row_id row_version row_etag id name type currentVersion parentId benefactorId projectId createdBy createdOn modifiedOn modifiedBy dataFileHandleId etag allowedTeam fileFormat Component dataFileSizeBytes dataFileMD5Hex dataFileConcreteType dataFileBucket dataFileKey description dataFileName; do \
		if [ "$$row_id" != "ROW_ID" ]; then \
			echo "Processing ID: $$id"; \
			filepath=$$(synapse get "$$id" | grep 'Downloaded file:' | awk '{print $$NF}'); \
			echo "Downloaded File Path: $$filepath"; \
			if [ -f "$$filepath" ]; then \
				component=$$(awk -F, 'FNR == 2 {print $$1}' "$$filepath"); \
				echo "Validating against: $$component"; \
				schematic model --config config_example.yml validate -dt "$$component" -mp "$$filepath"; \
			else \
				echo "File $$filepath not found"; \
			fi; \
		fi; \
	done < all_manifests.tsv

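
Because the `get_templates` commands are chained with `;`, a failure for one component does not stop the run, so the per-component logs are the place to look afterwards. A minimal sketch of such a check follows, assuming the `*_stderr.log` files sit in one directory. The function name `report_stderr_logs` is hypothetical. A non-empty stderr log is only a hint that something needs reading, since tools may also write warnings or progress messages to stderr.

```shell
# Hypothetical helper: list every non-empty *_stderr.log in a directory.
# Returns non-zero if at least one non-empty log is found.
report_stderr_logs() {
    dir=${1:-.}
    status=0
    for log in "$dir"/*_stderr.log; do
        [ -e "$log" ] || continue    # the glob matched nothing
        if [ -s "$log" ]; then       # -s: file exists and is not empty
            echo "non-empty: $log"
            status=1
        fi
    done
    return "$status"
}
```

Running `report_stderr_logs .` after `make get_templates` would print one line per component whose stderr log needs a closer look.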
adamjtaylor commented 8 months ago

Follow on tasks

adamjtaylor commented 8 months ago

@mialy-defelice FYI I think I am happy with the schematic refactor testing. It might be worth finding time for us to check in to discuss.

mialy-defelice commented 8 months ago

@adamjtaylor Thanks so much for testing it out! Feel free to schedule some time on my calendar; it is up to date with my availability for at least the next month.

aclayton555 commented 7 months ago

2024.01.04 mid sprint: Keep open in Jan sprint. Adam will be meeting with Mialy in January to talk through this further.

Longer term, consider how all manifests can be validated against the data model with each update/release.
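
One possible shape for that recurring check is a small wrapper that runs a validator over every manifest and exits non-zero on any failure, so it could gate a release. This is a minimal sketch, assuming the manifests have already been downloaded as CSV files into one directory and that the validator command signals failure through its exit status. Whether `schematic model validate` does so is an assumption to confirm. The function name `validate_all` is hypothetical.

```shell
# Hypothetical helper: run a validator on every CSV manifest in a directory.
# Usage: validate_all DIR VALIDATOR [ARGS...]
# The validator is invoked as: VALIDATOR [ARGS...] MANIFEST_PATH
validate_all() {
    dir=$1
    shift
    passed=0
    failed=0
    for manifest in "$dir"/*.csv; do
        [ -e "$manifest" ] || continue    # the glob matched nothing
        if "$@" "$manifest" >/dev/null 2>&1; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
            echo "FAILED: $manifest"
        fi
    done
    echo "passed=$passed failed=$failed"
    [ "$failed" -eq 0 ]    # non-zero exit status if anything failed
}
```

The validator is passed in rather than hard-coded because the `schematic` call needs a component per manifest. A small wrapper script that reads the component from the manifest, as the Makefile above does, could be supplied as `VALIDATOR`.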

aclayton555 commented 7 months ago

As we close out this sprint, outline the additional actions (and issues) associated with the required action communicated by Amy on 2024.01.22 (see Slack: https://sagebionetworks.slack.com/archives/C01ANC02U59/p1705959957507399)

All Schematic and DCA Users - This release includes the latest from the Schema Refactor (details in confluence here) and release notes here. Please pay special attention to the breaking change and required actions in the confluence page. Deployment of this Schematic version to DCA will be shared via the FAIR data release calendar invite.

aclayton555 commented 6 months ago