Closed MBARIMike closed 6 years ago
Now trying to get this to work in Anaconda on a Mac...
conda create --name compliance_report
source activate compliance_report
conda install -c anaconda netcdf4
while read requirement; do conda install -c conda-forge --yes $requirement; done < requirements.txt
conda install -c conda-forge --yes cython
Some of the above can probably be simplified, but now
python compliance_report.py --test cf acdd --format summary http://data.nodc.noaa.gov/thredds/catalog/ndbc/oceansites/DATA/MBARI/catalog.xml
gives a report:
...
https://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_TS.nc,80.9,93.8
https://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_M.nc,83.3,89.0
https://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_TS.nc,80.9,93.8
https://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150730_R_M.nc,83.3,94.1
https://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150730_R_TS.nc,80.9,93.8
...
I had Python 3.5 installed on my Mac via an Anaconda installation I did a year ago.
I updated it to Python 3.6 (the current one at the time) with:
conda update --prefix /Users/mccann/anaconda anaconda
It took a while.
Then I re-created my environment:
conda remove --name compliance_report --all
conda create --name compliance_report
source activate compliance_report
conda install -c conda-forge compliance-checker
conda install -c conda-forge beautifulsoup4
This took tens of minutes.
Now, at least this runs:
python compliance_report.py --test cf acdd --format summary -v "http://data.nodc.noaa.gov/thredds/catalog/ndbc/oceansites/DATA/catalog.xml"
Looks like this commit in the compliance-checker API broke compliance_report.py.
After this commit I now get this:
(compliance_report) medusa-3:compliance_report mccann$ python compliance_report.py --test cf:1.6 acdd --format summary "http://data.nodc.noaa.gov/thredds/catalog/ndbc/oceansites/DATA/MBARI/catalog.xml"
url,acdd,cf:1.6
http://data.nodc.noaa.gov/thredds/dodsC/ndbc/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_TS.nc,78.9,99.1
http://data.nodc.noaa.gov/thredds/dodsC/ndbc/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_M.nc,68.5,99.4
http://data.nodc.noaa.gov/thredds/dodsC/ndbc/oceansites/DATA/MBARI/OS_MBARI-M2_20100401_R_TS.nc,77.6,99.1
/Users/mccann/anaconda/envs/compliance_report/lib/python3.6/site-packages/compliance_checker/acdd.py:258: UserWarning: WARNING: valid_min not used since it cannot be safely cast to variable data type
...
That warning message seems new...
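One way to keep such warnings out of the CSV output, while still seeing which ones were raised, is to record them with the standard library's `warnings` module. This is only a minimal sketch: `fake_check` and the example URL are hypothetical stand-ins, not part of compliance_report.py or the compliance-checker API.

```python
import warnings


def run_and_collect_warnings(check, *args):
    """Call check(*args) and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats that Python would
        # otherwise show only once per location.
        warnings.simplefilter("always")
        result = check(*args)
    return result, [str(w.message) for w in caught]


def fake_check(url):
    # Hypothetical stand-in for a real compliance check on one dataset.
    warnings.warn(
        "WARNING: valid_min not used since it cannot be safely cast "
        "to variable data type"
    )
    return 78.9


score, messages = run_and_collect_warnings(
    fake_check, "http://example.invalid/dataset.nc"
)
```

The collected messages could then be written to a separate log, so the summary rows stay machine-readable.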
It takes a loooonnnnggg time to crawl the OPeNDAP directories to produce these compliance reports. Following up on the suggestion to use xarray and asyncio to speed things up.
Following the Kiel meeting, the script now works; see this Jupyter Notebook.
Like most software, compliance_report.py is a work in progress. It can be improved by making it run in parallel using xarray, dask, and asyncio. This will be tracked in a new issue.
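As a rough illustration of the asyncio idea, the sketch below runs a blocking per-dataset check in a thread pool so that several slow remote reads can overlap. It assumes Python 3.7+ (for `asyncio.run`), and `check_dataset`, the dummy score, and the example URLs are all hypothetical placeholders. Whether the real checker and its underlying netCDF libraries tolerate being called from several threads is an open question that this sketch does not address.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def check_dataset(url):
    """Hypothetical placeholder for one blocking compliance check."""
    time.sleep(0.1)  # stands in for slow OPeNDAP access
    return url, len(url)  # dummy "score", not a real result


async def check_all(urls, max_workers=4):
    """Run check_dataset on every URL, at most max_workers at a time."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [loop.run_in_executor(pool, check_dataset, u) for u in urls]
        # gather() returns results in the same order as the input URLs.
        return await asyncio.gather(*tasks)


urls = [
    "http://example.invalid/a.nc",
    "http://example.invalid/b.nc",
    "http://example.invalid/c.nc",
]
results = asyncio.run(check_all(urls))
```

A thread pool is used here because the check is assumed to be a blocking call; a process pool would be the alternative if thread safety turns out to be a problem.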
@MBARIMike, in the past, I've tried parallelizing the Compliance Checker workloads but ran into some issues. I'm probably going to be making an issue regarding this. If you'd like to see enhanced support for multiple file datasets, please feel free to submit a feature request.
I tried this on a new VM with Python 3.6:
and got this error:
Looks like there's some work to do...