ioos / compliance-checker

Python tool to check your datasets against compliance standards
http://ioos.github.io/compliance-checker/
Apache License 2.0
104 stars 56 forks source link

Feature: Allow custom CF url (-d) #1066

Closed jcermauwedu closed 1 month ago

jcermauwedu commented 2 months ago

Allow for a custom CF url for checking standards. Allows for checking files with a custom CF xml file with proposed changes to ensure operability prior to adoption.

Example of use:

$ compliance-checker -t gliderdac:3.0 -O gliderdac:no_ancillary_variables -d https://acoustics.fish.washington.edu/research/cf/cf-standard-name-table.xml gutils/rt/netcdf/unit_507_1712092237_20240402T211037Z_rt.nc
Downloading cf-standard-names table version "url specified" from: https://acoustics.fish.washington.edu/research/cf/cf-standard-name-table.xml
Running Compliance Checker on the datasets from: ['gutils/rt/netcdf/unit_507_1712092237_20240402T211037Z_rt.nc']

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                     Version 5.1.2.dev3+gc17f5f6.d20240501                      
                     Report generated 2024-05-01T06:03:43Z                      
                                 gliderdac:3.0                                  
      https://ioos.github.io/glider-dac/ngdac-netcdf-file-format-version-2      
--------------------------------------------------------------------------------
All tests passed!

This comes with a bug to track down: the program needs to be run twice. Once to download the custom XML and a second time to actually use it. So, there is actually a bug when requesting a different version of the CF standard names as well.

jcermauwedu commented 2 months ago

Will go back and tackle the lint errors.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 30.00000% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 81.99%. Comparing base (064aac9) to head (9069b06). Report is 20 commits behind head on develop.

Files Patch % Lines
compliance_checker/cf/util.py 25.00% 5 Missing and 1 partial :warning:
compliance_checker/runner.py 0.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #1066 +/- ## =========================================== - Coverage 82.18% 81.99% -0.20% =========================================== Files 24 24 Lines 5171 5181 +10 Branches 1242 1248 +6 =========================================== - Hits 4250 4248 -2 - Misses 622 633 +11 - Partials 299 300 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jcermauwedu commented 1 month ago

We will look into adding some more tests to satisfy codecov. In the meantime, the pre-commit.ci failure is in conflict with a local run of pre-commit.

Result from pre-commit.ci: PreCommit

Local run: localPreCommit

Will try: pre-commit.ci autofix to see what that does.

jcermauwedu commented 1 month ago

pre-commit.ci autofix

ocefpaf commented 1 month ago

@jcermauwedu I would ignore pre-commit-ci for now. It seems to be a bit wacky. I'll try to update/config it better in the near future. Let's focus on your change to the code base here so we can get this reviewed and merged ASAP.

jcermauwedu commented 1 month ago

Thank you for the feedback.

ocefpaf commented 1 month ago

@jcermauwedu would it make sense to also allow for local xml files? I don't have a use case for that but maybe we should not restrict it here to files on the web.

jcermauwedu commented 1 month ago

@jcermauwedu would it make sense to also allow for local xml files? I don't have a use case for that but maybe we should not restrict it here to files on the web.

This makes sense too. There is a lot of chicken and egg things going on here. The checker is initialized before the arguments are parsed. When the checker is initialized it has already read the standard names file. Then the arguments are processed and updates the file on the disk, but the test will use what has already been read in memory. IE: We have to find a way send a signal to the tests that the user desired a different standard name table. Or defer loading of the standard table in the class right up until the first test is run. Somewhere...

I was starting to implement a --standard-names-url option. I can add a --standard-names-file option for a local filesystem file. I can create a use case for both and tests.

I have reverted the other items and will put in a separate PR if needed. Pinning packages on the minimum side makes sense. I am also learning a bit about python coverage. This PR might extend into the sprint.

ocefpaf commented 1 month ago

There is a lot of chicken and egg things going on here. The checker is initialized before the arguments are parsed.

That is a bug IMO. If we can tackle that during the sprint, it is a success already. (Although we have a lot in our plates for this sprint already ;-p)

jcermauwedu commented 1 month ago

Turns out this feature already exists via an environment variable. We will revise the PR to provide additional documentation to highlight the feature. The code could be improved to see if there is an http or https prefix to attempt a remote url fetch.

$ export CF_STANDARD_NAME_TABLE=cf-standard-name-table.xml
$ compliance-checker -t gliderdac:3.0 -O gliderdac:no_ancillary_variables unit_507_1715633718_20240513T205518Z_rt.nc
Running Compliance Checker on the datasets from: ['unit_507_1715633718_20240513T205518Z_rt.nc']

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                     Version 5.1.2.dev9+g9069b06.d20240516                      
                     Report generated 2024-05-16T01:31:13Z                      
                                 gliderdac:3.0                                  
      https://ioos.github.io/glider-dac/ngdac-netcdf-file-format-version-2      
--------------------------------------------------------------------------------
All tests passed!
benjwadams commented 1 month ago

Do we want to close this as not planned in favor of documentation?