15 provide automation of csv yaml conversion and validation

kaiiam / mifc

A minimum information standard checklist formalizing the description of food composition data and related metadata.

MIT License

2 stars 1 forks source link

15 provide automation of csv yaml conversion and validation #16

Closed turbomam closed 1 week ago

turbomam commented 1 week ago

some people would say that I included too many generated files in this PR. It makes it harder to review and will become problematic with respect to the size of the repo if the number of size of the example data files grows significantly. We could add these directories to .gitignore:

examples/output/
project/
src/mifc/datamodel/

Thanks to my contributions, your build process (make all-all) deviates from the make all offered in the initial cookiecutter. Please especially review

Makefile
project.Makefile

make all-all now includes examples-all, so any TSV file in src/data/examples/TSV/ will be included in the subsequent examples validation. Please note that the names of the files in src/data/examples/TSV/ must follow the this pattern:

{target class name}-{index slot name}-{anything else}.tsv

turbomam commented 1 week ago

I haven't included an automated YAML to TSV converter yet. Would you be interested in trying that based on tsv-data-to-yaml and src/data/examples/valid/%.yaml ?

turbomam commented 1 week ago

Did you make some changes to .github/workflows/deploy-docs.yaml ?

turbomam commented 1 week ago

I am going to make another commit in which I change the last command in .github/workflows/main.yaml from run: make test to make all-all. In the future, if you have many more or much larger examples data files, running that as part of every pull request check might be slower than you would like. (Although the build and check process will get slower as your schema gets more sophisticated anyway).

turbomam commented 1 week ago

I think we should scrutinize all of the GitHub actions in .github/workflows at some point. We could even bring in another LinkML developer to suggest changes to us, or to integrate some of the decisions we've made back into the cookiecutter.

turbomam commented 1 week ago

my last commit starts ignoring

examples/output/
project/
src/mifc/datamodel/

but that doesn't make this PR any smaller

kaiiam commented 1 week ago

Thanks @turbomam!

I included too many generated files in this PR

Maybe but I do appreciate the effort and I can still review it.

Did you make some changes to .github/workflows/deploy-docs.yaml ?

I might have but it's been a while the initial docs didn't work right away when I made it with the cookie cutter originally. I had to adjust some settings I though I'd documented it in an issue somewhere but not sure at the moment.

In the future, if you have many more or much larger examples data files ...

I guess the plan for the test data is to not have too many rows of data in the tsvs or to have too long of containers, but it's probably best to have a a bunch of tests for all the different attributes with small valid and invalid examples.

scrutinize all of the GitHub actions in .github/workflows

Are there potentially issues? Or is this to suggest to add to the cookie cutter? Happy for this repo to be a usecase for testing and improving LinkML as we go. Up to you @turbomam if you think it's worth brining in other LinkML people. I'm trusting your expertise.

kaiiam commented 1 week ago

@turbomam I'm also curious why src/mifc/datamodel/mifc.py got deleted?

kaiiam commented 1 week ago

When I ran make all-all it was fine, but I'm still not sure why src/mifc/datamodel/mifc.py seems to be deleted in the files changed? After running make all all I see it perhaps it's in the gitnore or something or I'm missing something in the files changes as it's quite a few for this PR.

turbomam commented 1 week ago

This PR removes src/mifc/datamodel/mifc.py as a policy of .gitignore, on the grounds that its derived from the schema. Its purely a judgement call. Feel free to revert that ignore rule or any of the other ones I did this morning.

I was thinking we could check Patrick Kalita's work in nmdc-schema regarding what to push to the main branch of thsi repo vs. what to bundle into a GH release or a PyPI release if you're going to make that.

kaiiam commented 1 week ago

Thanks yes, I'd like to make releases starting here and again later once we reboot or move this repo to a new organization. I'd be good to know what best practices for linkML releases should include. I'd like for example the output MIFC excel sheet so people can use it already.