Closed MBARIMike closed 8 years ago
The script is now used thusly:
compliance_report.py --pattern OS_MBARI-M2_20100402_R_TS http://dods.ndbc.noaa.gov/thredds/catalog/oceansites/DATA/MBARI/catalog.xml -v
Execution with '--format summary' looks like this:
$ time ./compliance_report.py --test cf acdd --format summary http://dods.ndbc.noaa.gov/thredds/catalog/oceansites/DATA/MBARI/catalog.xml
url,acdd,cf
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20041006_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20041006_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20050418_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20050418_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20060731_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20060801_R_M.nc,82,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070130_R_M.nc,82,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070130_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070621_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070621_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_M.nc,85,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_19700101_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_19751206_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_19840614_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20051020_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20051020_R_TS.nc,70,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20061012_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20061012_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20071106_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20071106_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20081008_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20081008_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091020_R_M.nc,83,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091020_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091118_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091121_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20101027_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20101027_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20120222_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20120222_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20130918_R_M.nc,83,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20130918_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_M.nc,82,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_19700101_R_TS.nc,69,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20040430_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20040430_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20050520_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20050520_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20060330_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20060330_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20070425_R_M.nc,82,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20070425_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20080411_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20080411_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20090429_R_M.nc,82,94
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20090429_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100401_R_TS.nc,79,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_M.nc,82,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_TS.nc,80,93
real 3m50.105s
user 0m5.256s
sys 0m0.730s
The '--format summary' output for an entire GDAC crawl can be imported into a spreadsheet for further analysis. @dpsnowden what do you think? Can you merge this PR?
When I have time, maybe I'll pull the output into Pandas and execute some groupby()s on it for display in a Jupyter Notebook that we can upload here.
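As a rough idea of what that Pandas step could look like, here is a minimal sketch. It assumes the summary output keeps the `url,acdd,cf` header shown above and that the file names follow the `OS_MBARI-<platform>_<date>_R_<suffix>.nc` pattern seen in this catalog. The derived column names (`platform`, `deployed`, `kind`) and the helper functions are made up for illustration and are not part of compliance_report.py. The sample rows are copied from the output above.

```python
import io

import pandas as pd

# A few rows copied from the '--format summary' output above.
SAMPLE = """url,acdd,cf
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_M.nc,82,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_TS.nc,80,93
"""

# Assumed file-name layout; the group names are hypothetical labels.
NAME_PATTERN = r"OS_MBARI-(?P<platform>[^_]+)_(?P<deployed>\d{8})_R_(?P<kind>[^.]+)\.nc$"


def load_summary(source):
    """Read summary CSV and add columns parsed out of each file name."""
    df = pd.read_csv(source)
    parts = df["url"].str.extract(NAME_PATTERN)
    return pd.concat([df, parts], axis=1)


def mean_scores(df, by):
    """Average the acdd and cf scores within each group."""
    return df.groupby(by)[["acdd", "cf"]].mean()


summary = load_summary(io.StringIO(SAMPLE))
by_kind = mean_scores(summary, "kind")
by_platform = mean_scores(summary, "platform")
print(by_kind)
print(by_platform)
```

For a real crawl, `load_summary()` could be pointed at a file holding the redirected script output instead of the in-memory sample.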
compliance_report.py is now about a billion times faster with BeautifulSoup/Requests than with thredds_crawler!
Three attempts to build a compliance report have failed after 2-3 hours of execution with IOErrors or ConnectionErrors; the longest run (197 minutes) produced reports for only 3,725 files. With over 31,000 files in the archive, we'll need a better way of building reports in an environment of fragile network connections.
Thanks @MBARIMike. Even with the migration to BeautifulSoup/Requests, do the timeouts persist? If this were run at a GDAC by @jing-at-ndbc, would the results be different?
Yes. Timeouts still happen after the migration to BeautifulSoup/Requests. It might be different with direct filesystem access, though I suspect that completing the crawl would take some time.
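One possible way to cope with the fragile connections is to make the crawl resumable, so that a failure after a few hours does not throw away the files already checked. The sketch below is only an illustration of that idea, not what compliance_report.py does today. `check` is a hypothetical callable standing in for whatever produces the per-file scores, and `run_resumable`, `completed_urls`, `max_attempts` and `delay` are names invented here. Each result is appended to the CSV as soon as it is available, URLs already in the file are skipped on the next run, and failures are retried a few times before being set aside.

```python
import csv
import os
import time


def completed_urls(out_path):
    """Return the set of URLs already recorded in a previous run."""
    if not os.path.exists(out_path):
        return set()
    with open(out_path, newline="") as f:
        return {row["url"] for row in csv.DictReader(f)}


def run_resumable(urls, check, out_path, fields=("acdd", "cf"),
                  max_attempts=3, delay=0.0):
    """Check each URL not yet in out_path; return the URLs that kept failing.

    check(url) is a hypothetical function returning a dict of scores keyed
    by the names in fields, and raising OSError on network trouble
    (IOError and ConnectionError are OSError in Python 3, and the Requests
    exceptions derive from IOError).
    """
    done = completed_urls(out_path)
    need_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    failed = []
    with open(out_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", *fields])
        if need_header:
            writer.writeheader()
        for url in urls:
            if url in done:
                continue  # finished in an earlier run
            for attempt in range(1, max_attempts + 1):
                try:
                    scores = check(url)
                except OSError:
                    time.sleep(delay * attempt)  # simple linear backoff
                    continue
                writer.writerow({"url": url, **scores})
                f.flush()  # keep the result even if a later file crashes the run
                break
            else:
                failed.append(url)  # every attempt raised
    return failed
```

Re-running with the same output path picks up where the last run stopped; the returned list shows which files to retry later.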
Addresses https://github.com/oceansites/dmt/issues/12.
Example execution: