hdmf-dev / hdmf-common-schema

Specifications for pre-defined data structures provided by HDMF.

Workflow for HDMF Dev compatibility #77

Closed mavaylon1 closed 1 year ago

mavaylon1 commented 1 year ago

Summary of changes

Created new workflow to check for HDMF Dev compatibility.

The workflow will clone the dev branch of HDMF and run the tests for "common".
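
A minimal sketch of what such a workflow could look like; the job layout, Python version, and install/test commands here are assumptions, not the final workflow:

```yaml
name: Check HDMF dev compatibility

on:
  push:
  pull_request:

jobs:
  check-compatibility:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      # Clone the dev branch of HDMF together with its pinned schema submodule
      - name: Clone HDMF dev
        run: git clone --recurse-submodules --branch dev https://github.com/hdmf-dev/hdmf.git
      # Install HDMF and run only the tests for the "common" data structures
      - name: Run common tests
        run: |
          cd hdmf
          python -m pip install -e .
          python -m pip install pytest
          python -m pytest tests/unit/common
```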

PR checklist for schema changes

mavaylon1 commented 1 year ago

@rly @oruebel I took a different route from what was done in nwb-schema, similar to how we test pynwb against hdmf. The way nwb-schema validates is by comparing the yaml to a json file, which we do not have. I thought it would be good enough to install hdmf and just test the "common" folder.

I do have a question:

```yaml
- name: Cancel non-latest runs
  uses: styfle/cancel-workflow-action@0.11.0
  with:
    all_but_latest: true
    access_token: ${{ github.token }}
```

I am a little confused about the github.token. Digging further, I found that GITHUB_TOKEN is an environment variable that is generated during the workflow run and grants read and write access (https://devopsjournal.io/blog/2022/01/03/GitHub-Tokens).

I don't fully understand the point of this.

mavaylon1 commented 1 year ago

Note I did not go through the check list for the PR and this is still a draft. @oruebel @rly

mavaylon1 commented 1 year ago

The token is used to authenticate within the workflow job. Does this mean we need to give the workflow permission to read the repository, or in this case, to cancel the workflow job if it is already running?

If I, as a third-party person (not a dev), want to cancel a workflow, I cannot because I don't have permission. Is this just giving the workflow permission to cancel the workflow if one is already running, as if the workflow itself were a third-party person?
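
For illustration, this is roughly how such permission is expressed in a workflow file; the exact scopes below are an assumption, but cancelling runs goes through the Actions API, which requires write access:

```yaml
permissions:
  actions: write   # lets the job cancel other runs of this workflow via the Actions API
  contents: read   # read-only access to the repository contents
```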

mavaylon1 commented 1 year ago

The release notes are not updated, since this is a workflow update and no release will be made.

rly commented 1 year ago

This workflow will check out the dev branch of hdmf and the specific commit of hdmf-common-schema that is stored in hdmf, and then run the tests in tests/unit/common/ of hdmf. That does not test the current hdmf-common-schema version.

What I was looking for is a little more complicated than I initially imagined. Looking at how we do it in nwb-schema (https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/.github/workflows/validate_schema.yml), we want to run the script `validate_hdmf_spec core -m nwb.schema.json`, but replacing nwb.schema.json with a json schema for HDMF common. It looks like we never made that schema. But that file should be pretty much the same as https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/nwb.schema.json, except that:

1. the top-level metadata needs to be adjusted for HDMF common,
2. all instances of "neurodata_type" become "data_type", and
3. the file is renamed to hdmf-common.schema.json.

I think that's it. Then, because the schema lives in a directory "common" instead of "core", I think the command should be `validate_hdmf_spec common -m hdmf-common.schema.json`.
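
A hedged sketch of that validation workflow, modeled on the nwb-schema workflow linked above (the action versions and Python version are assumptions):

```yaml
name: Validate schema

on:
  push:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      # validate_hdmf_spec is installed as a console script along with hdmf
      - name: Validate the common schema against the specification language
        run: |
          python -m pip install hdmf
          validate_hdmf_spec common -m hdmf-common.schema.json
```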

mavaylon1 commented 1 year ago

@rly what if I cd into hdmf_common_schema, git checkout the PR branch for the schema, and then run the tests?

rly commented 1 year ago

Then when the branch is merged, you have to update the workflow again so that you check out the current main branch of the schema instead of a PR branch.

That is fine to do, but I think independently of whether the dev branch of HDMF is compatible with the current PR or main branch of the schema, it would be good to run validate_hdmf_spec common -m hdmf-common.schema.json just like what we have in NWB schema, to catch validation issues with the schema that may not be caught by the tests in HDMF.

mavaylon1 commented 1 year ago

> Then when the branch is merged, you have to update the workflow again so that you check out the current main branch of the schema instead of a PR branch.
>
> That is fine to do, but I think independently of whether the dev branch of HDMF is compatible with the current PR or main branch of the schema, it would be good to run validate_hdmf_spec common -m hdmf-common.schema.json just like what we have in NWB schema, to catch validation issues with the schema that may not be caught by the tests in HDMF.

The reason I went with this approach was that it returns an error if the schema isn't compatible. Going the json route, does that record every instance where there's an issue? Because, on second thought, my way would only catch issues one at a time. @rly

rly commented 1 year ago

The compatibility test is good.

With JSON validation, I think it will error out at the first error. But critically, just because the schema is compatible with the current API does not mean it is valid according to the schema language. For example, if we add a new data type to the schema, there will not be corresponding code that uses that data type in the dev branch of the API (so that test will pass), but the new data type definition might be invalid according to the rules of the language. We might then write API code for this new data type only to determine later that we wrote the schema in a way that is technically not valid.

Can you please also implement a workflow that does validate_hdmf_spec common -m hdmf-common.schema.json?

mavaylon1 commented 1 year ago

> The compatibility test is good.
>
> With JSON validation, I think it will error out at the first error. But critically, just because the schema is compatible with the current API does not mean it is valid according to the schema language. For example, if we add a new data type to the schema, there will not be corresponding code that uses that data type in the dev branch of the API (so that test will pass), but the new data type definition might be invalid according to the rules of the language. We might then write API code for this new data type only to determine later that we wrote the schema in a way that is technically not valid.
>
> Can you please also implement a workflow that does validate_hdmf_spec common -m hdmf-common.schema.json?

I'll do both in this PR.

mavaylon1 commented 1 year ago

@rly could you do a review?

rly commented 1 year ago

Schema validation workflow looks good. Thanks.

The .github/workflows/hdmf_compatibility_schema.yml workflow confuses me. Currently it just checks out HDMF and runs a subset of tests from HDMF. It doesn't use the updated schema in the master branch or a pull request branch. I think it should.

The validate schema workflow was skipped. That's odd. Do you know why that happened?

mavaylon1 commented 1 year ago

I removed "push" as a trigger because "pull_request" should trigger on each update to the PR, i.e a commit. I unerstood the trigger push, pull_request, and pull request on forked repos. However, I see now that it actually ignores the pull request trigger on the current repo, hence the need for push.

As for the compatibility workflow, I forgot to add the part that checks out the current branch. I let that slip; that's my bad. The idea is to make sure that schema changes on any PR don't break HDMF, and if they do, it is because an update needs to be made on the HDMF side.
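
A sketch of that missing piece, assuming hdmf vendors the schema as a git submodule; the submodule path and the ref expression are assumptions:

```yaml
# Clone HDMF dev along with its pinned copy of the schema
- name: Clone HDMF dev
  run: git clone --recurse-submodules --branch dev https://github.com/hdmf-dev/hdmf.git
# Replace the pinned schema with the branch under review in this PR
- name: Check out the current schema branch
  run: |
    cd hdmf/src/hdmf/common/hdmf-common-schema
    git fetch origin ${{ github.head_ref || github.ref_name }}
    git checkout FETCH_HEAD
```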

mavaylon1 commented 1 year ago

@rly I believe I made the needed changes! I need to slow down so that this can be a quick review process.