oceansites / dmt

Activities of the OceanSITES Data Management Team
http://www.oceansites.org/data

Initial thredds_crawler compliance-checker utility #29

Closed MBARIMike closed 8 years ago

MBARIMike commented 8 years ago

Addresses https://github.com/oceansites/dmt/issues/12.

Example execution:

(venv-stoqs) [vagrant@localhost utilities]$ ./site_report.py
2016-04-30 10:43:54,883 - [INFO] Crawling: http://dods.ndbc.noaa.gov/thredds/catalog/oceansites/DATA/MBARI/catalog.xml
2016-04-30 10:43:55,277 - [INFO] Crawling: http://dods.ndbc.noaa.gov/thredds/catalog/oceansites/DATA/MBARI/metadata/catalog.xml
2016-04-30 10:43:55,542 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_M.nc
2016-04-30 10:43:55,542 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_TS.nc
2016-04-30 10:43:55,542 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20041006_R_M.nc
2016-04-30 10:43:55,542 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20041006_R_TS.nc
2016-04-30 10:43:55,543 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20050418_R_M.nc
2016-04-30 10:43:55,543 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20050418_R_TS.nc
2016-04-30 10:43:55,543 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20060731_R_TS.nc
2016-04-30 10:43:55,543 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20060801_R_M.nc
2016-04-30 10:43:55,543 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20070130_R_M.nc
2016-04-30 10:43:55,544 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20070130_R_TS.nc
2016-04-30 10:43:55,544 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20070621_R_M.nc
2016-04-30 10:43:55,544 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20070621_R_TS.nc
2016-04-30 10:43:55,544 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_M.nc
2016-04-30 10:43:55,544 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_TS.nc
2016-04-30 10:43:55,544 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_19700101_R_TS.nc
2016-04-30 10:43:55,545 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_19751206_R_M.nc
2016-04-30 10:43:55,545 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_19840614_R_TS.nc
2016-04-30 10:43:55,545 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_M.nc
2016-04-30 10:43:55,545 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_TS.nc
2016-04-30 10:43:55,545 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20051020_R_M.nc
2016-04-30 10:43:55,545 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20051020_R_TS.nc
2016-04-30 10:43:55,546 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20061012_R_M.nc
2016-04-30 10:43:55,546 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20061012_R_TS.nc
2016-04-30 10:43:55,546 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20071106_R_M.nc
2016-04-30 10:43:55,546 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20071106_R_TS.nc
2016-04-30 10:43:55,546 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20081008_R_M.nc
2016-04-30 10:43:55,546 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20081008_R_TS.nc
2016-04-30 10:43:55,547 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20091020_R_M.nc
2016-04-30 10:43:55,547 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20091020_R_TS.nc
2016-04-30 10:43:55,547 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20091118_R_TS.nc
2016-04-30 10:43:55,547 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20091121_R_TS.nc
2016-04-30 10:43:55,547 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20101027_R_M.nc
2016-04-30 10:43:55,548 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20101027_R_TS.nc
2016-04-30 10:43:55,548 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20120222_R_M.nc
2016-04-30 10:43:55,549 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20120222_R_TS.nc
2016-04-30 10:43:55,549 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20130918_R_M.nc
2016-04-30 10:43:55,549 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20130918_R_TS.nc
2016-04-30 10:43:55,549 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_M.nc
2016-04-30 10:43:55,549 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_TS.nc
2016-04-30 10:43:55,549 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_M.nc
2016-04-30 10:43:55,550 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_TS.nc
2016-04-30 10:43:55,550 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_19700101_R_TS.nc
2016-04-30 10:43:55,550 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20040430_R_M.nc
2016-04-30 10:43:55,550 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20040430_R_TS.nc
2016-04-30 10:43:55,550 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20050520_R_M.nc
2016-04-30 10:43:55,551 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20050520_R_TS.nc
2016-04-30 10:43:55,551 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20060330_R_M.nc
2016-04-30 10:43:55,551 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20060330_R_TS.nc
2016-04-30 10:43:55,551 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20070425_R_M.nc
2016-04-30 10:43:55,551 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20070425_R_TS.nc
2016-04-30 10:43:55,551 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20080411_R_M.nc
2016-04-30 10:43:55,552 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20080411_R_TS.nc
2016-04-30 10:43:55,552 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20090429_R_M.nc
2016-04-30 10:43:55,552 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20090429_R_TS.nc
2016-04-30 10:43:55,553 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20100401_R_TS.nc
2016-04-30 10:43:55,553 - [INFO] Ignoring dataset based on 'selects'.  ID: oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_M.nc
2016-04-30 10:43:55,553 - [DEBUG] Processing oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_TS.nc

--------------------------------------------------------------------------------
                     The dataset scored 49 out of 68 points
                             during the acdd check
--------------------------------------------------------------------------------
                               Scoring Breakdown:

                                 High Priority
--------------------------------------------------------------------------------
    Name                            :Priority: Score
keywords                                :3:     1/1
summary                                 :3:     1/1
title                                   :3:     1/1
varattr                                 :3:    25/36

                                Medium Priority
--------------------------------------------------------------------------------
    Name                            :Priority: Score
acknowledgment                          :2:     0/1
cdm_data_type                           :2:     2/2
comment                                 :2:     0/1
creator_email                           :2:     0/1
creator_name                            :2:     1/1
creator_url                             :2:     0/1
date_created                            :2:     1/1
geospatial_lat_max                      :2:     1/1
geospatial_lat_min                      :2:     1/1
geospatial_lon_max                      :2:     1/1
geospatial_lon_min                      :2:     1/1
geospatial_vertical_max                 :2:     1/1
geospatial_vertical_min                 :2:     1/1
history                                 :2:     1/1
id                                      :2:     1/1
institution                             :2:     1/1
keywords_vocabulary                     :2:     1/1
license                                 :2:     0/1
naming_authority                        :2:     1/1
processing_level                        :2:     0/1
project                                 :2:     1/1
standard_name_vocabulary                :2:     1/1
time_coverage_duration                  :2:     0/1
time_coverage_end                       :2:     1/1
time_coverage_extents_match             :2:     2/2
time_coverage_resolution                :2:     0/1
time_coverage_start                     :2:     1/1

--------------------------------------------------------------------------------
                  Reasoning for the failed tests given below:

Name                             Priority:     Score:Reasoning
--------------------------------------------------------------------------------
varattr                                :3:    25/36 :
    DEPTH_QC                           :3:     1/ 3 :
        var_std_name                   :3:     1/ 2 : Var DEPTH_QC missing attr
                                                      standard_name
        var_units                      :3:     0/ 1 : Var DEPTH_QC missing attr
                                                      units
    DEPTH_bnds                         :3:     0/ 3 :
        var_std_name                   :3:     0/ 2 : Var DEPTH_bnds missing
                                                      attr long_name, Var
                                                      DEPTH_bnds missing attr
                                                      standard_name
        var_units                      :3:     0/ 1 : Var DEPTH_bnds missing
                                                      attr units
    POSITION_QC                        :3:     1/ 3 :
        var_std_name                   :3:     1/ 2 : Var POSITION_QC missing
                                                      attr standard_name
        var_units                      :3:     0/ 1 : Var POSITION_QC missing
                                                      attr units
    PSAL_QC                            :3:     2/ 3 :
        var_std_name                   :3:     1/ 2 : Var PSAL_QC missing attr
                                                      standard_name
    TEMP_QC                            :3:     2/ 3 :
        var_std_name                   :3:     1/ 2 : Var TEMP_QC missing attr
                                                      standard_name
    TIME_QC                            :3:     1/ 3 :
        var_std_name                   :3:     1/ 2 : Var TIME_QC missing attr
                                                      standard_name
        var_units                      :3:     0/ 1 : Var TIME_QC missing attr
                                                      units
acknowledgment                         :2:     0/ 1 : Attr acknowledgment not
                                                      present
comment                                :2:     0/ 1 : Attr comment not present
creator_email                          :2:     0/ 1 : Attr creator_email not
                                                      present
creator_url                            :2:     0/ 1 : Attr creator_url not
                                                      present
license                                :2:     0/ 1 : Attr license not present
processing_level                       :2:     0/ 1 : Attr processing_level not
                                                      present
time_coverage_duration                 :2:     0/ 1 : Attr
                                                      time_coverage_duration not
                                                      present
time_coverage_resolution               :2:     0/ 1 : Attr
                                                      time_coverage_resolution
                                                      not present
contributor_name                       :1:     0/ 1 : Attr contributor_name not
                                                      present
contributor_role                       :1:     0/ 1 : Attr contributor_role not
                                                      present
date_issued                            :1:     0/ 1 : Attr date_issued not
                                                      present
date_modified                          :1:     0/ 1 : Attr date_modified not
                                                      present
geospatial_lat_resolution              :1:     0/ 1 : Attr
                                                      geospatial_lat_resolution
                                                      not present
geospatial_lat_units                   :1:     0/ 1 : Attr geospatial_lat_units
                                                      not present
geospatial_lon_resolution              :1:     0/ 1 : Attr
                                                      geospatial_lon_resolution
                                                      not present
geospatial_lon_units                   :1:     0/ 1 : Attr geospatial_lon_units
                                                      not present
geospatial_vertical_resolution         :1:     0/ 1 : Attr
                                                      geospatial_vertical_resolu
                                                      tion not present
geospatial_vertical_units              :1:     0/ 1 : Attr
                                                      geospatial_vertical_units
                                                      not present
publisher_email                        :1:     0/ 1 : Attr publisher_email not
                                                      present
publisher_name                         :1:     0/ 1 : Attr publisher_name not
                                                      present
publisher_url                          :1:     0/ 1 : Attr publisher_url not
                                                      present

--------------------------------------------------------------------------------
                     The dataset scored 82 out of 91 points
                              during the cf check
--------------------------------------------------------------------------------
                               Scoring Breakdown:

                                 High Priority
--------------------------------------------------------------------------------
    Name                            :Priority: Score
§2.2 Valid netCDF data types            :3:    12/12
§2.3 Legal variable names               :3:    12/12
§2.4 Unique dimensions                  :3:    12/12
§2.6.1 Global Attribute Conventions inc :3:     0/1
§2.6.2 Convention Attributes            :3:     2/2
§3.1 Variables contain valid CF Units   :3:     0/4
§3.1 Variables contain valid units for  :3:     4/6
§3.3 Standard Names                     :3:     6/6
§4 Axis attributes and coordinate varia :3:     8/8
§4 Coordinate Variable latitude contain :3:     1/1
§4 Coordinate Variable longitude contai :3:     1/1
§4.3 Vertical coordinates contain valid :3:     2/2
§4.3.1 Vertical dimension coordinates c :3:     1/1
§4.4 Time coordinate variable and attri :3:     2/2

                                Medium Priority
--------------------------------------------------------------------------------
    Name                            :Priority: Score
all_features_are_same_type              :2:     0/0
contiguous_ragged_array                 :2:     0/0
coordinates_and_metadata                :2:     0/0
feature_type                            :2:     0/0
incomplete_multidim_array               :2:     0/0
indexed_ragged_array                    :2:     0/0
missing_data                            :2:     0/0
orthogonal_multidim_array               :2:     0/0
§4 Coordinate Variables                 :2:     3/3
§4.1 Coordinates representing latitude  :2:     3/3
§4.1 Coordinates representing longitude :2:     3/3
§5.1 Geophysical variables contain vali :2:     8/8
§7.3 Cell Methods                       :2:     2/4

--------------------------------------------------------------------------------
                  Reasoning for the failed tests given below:

Name                             Priority:     Score:Reasoning
--------------------------------------------------------------------------------
§2.6.1 Global Attribute Conventions inc:3:     0/ 1 : Conventions field is not
                                                      "CF-1.6"
§3.1 Variables contain valid CF Units  :3:     0/ 4 : unknown units type (None)
                                                      for DEPTH_bnds, unknown
                                                      units type (None) for
                                                      TIME_QC, unknown units
                                                      type (None) for
                                                      POSITION_QC, unknown units
                                                      type (None) for DEPTH_QC
§3.1 Variables contain valid units for :3:     4/ 6 : units are DAYS since
                                                      1950-01-01 00:00:00,
                                                      standard_name units should
                                                      be s, units are  ,
                                                      standard_name units should
                                                      be 1e-3
§7.3 Cell Methods                      :2:     2/ 4 :
    PSAL                               :2:     1/ 2 :
        cell_methods_name              :2:     0/ 1 : The name field does not
                                                      match a dimension, area or
                                                      coordinate.
    TEMP                               :2:     1/ 2 :
        cell_methods_name              :2:     0/ 1 : The name field does not
                                                      match a dimension, area or
                                                      coordinate.
§4.4.1 Time and calendar               :1:     0/ 1 : Variable TIME should have
                                                      a calendar attribute
WARNING: The following exceptions occured during the acdd checker (possibly indicate compliance checker issues):
acdd.check_lat_extents: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
acdd.check_vertical_extents: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
acdd.check_lon_extents: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
WARNING: The following exceptions occured during the cf checker (possibly indicate compliance checker issues):
cf.check_fill_value_outside_valid_range:

MBARIMike commented 8 years ago

The script is now invoked like this:

compliance_report.py --pattern OS_MBARI-M2_20100402_R_TS http://dods.ndbc.noaa.gov/thredds/catalog/oceansites/DATA/MBARI/catalog.xml -v

MBARIMike commented 8 years ago

Execution with '--format summary' looks like this:

$ time ./compliance_report.py --test cf acdd --format summary http://dods.ndbc.noaa.gov/thredds/catalog/oceansites/DATA/MBARI/catalog.xml
url,acdd,cf
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20041006_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20041006_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20050418_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20050418_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20060731_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20060801_R_M.nc,82,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070130_R_M.nc,82,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070130_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070621_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20070621_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_M.nc,85,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20100614_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_19700101_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_19751206_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_19840614_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20051020_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20051020_R_TS.nc,70,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20061012_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20061012_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20071106_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20071106_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20081008_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20081008_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091020_R_M.nc,83,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091020_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091118_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20091121_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20101027_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20101027_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20120222_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20120222_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20130918_R_M.nc,83,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20130918_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_M.nc,82,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20140716_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_M.nc,83,88
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20150729_R_TS.nc,80,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_19700101_R_TS.nc,69,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20040430_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20040430_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20050520_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20050520_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20060330_R_M.nc,82,92
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20060330_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20070425_R_M.nc,82,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20070425_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20080411_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20080411_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20090429_R_M.nc,82,94
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20090429_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100401_R_TS.nc,79,93
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_M.nc,82,89
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_TS.nc,80,93

real    3m50.105s
user    0m5.256s
sys 0m0.730s

MBARIMike commented 8 years ago

The --format summary output for an entire GDAC crawl can be imported into a spreadsheet for further analysis. @dpsnowden what do you think? Can you merge this PR?

When I have time, maybe I'll pull the output into Pandas and run some groupby()s on it for display in a Jupyter Notebook that we can upload here.
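
As a rough idea of what that Pandas step might look like, here is a minimal sketch. It assumes the summary output keeps the `url,acdd,cf` columns shown above and that file names follow the `OS_<platform>_<date>_<mode>_<type>.nc` pattern seen in the listing; the `summarize` function and the inline sample are illustrative only, with rows copied from the run above.

```python
import io

import pandas as pd

# A few rows copied from the summary output above. A real run would read the
# saved CSV file instead of an inline string.
SUMMARY_CSV = """url,acdd,cf
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_M.nc,82,95
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M0_20040604_R_TS.nc,72,90
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_M.nc,82,91
http://dods.ndbc.noaa.gov/thredds/dodsC/oceansites/DATA/MBARI/OS_MBARI-M1_20041021_R_TS.nc,72,90
"""


def summarize(csv_text):
    """Return mean acdd and cf scores per (platform, data_type)."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Pull the platform code and the trailing type code out of the file name.
    # The pattern is an assumption based on the names in the listing above.
    parts = df["url"].str.extract(
        r"/OS_(?P<platform>[^_]+)_\d{8}_[A-Z]_(?P<data_type>[A-Z]+)\.nc$"
    )
    df = df.join(parts)
    return df.groupby(["platform", "data_type"])[["acdd", "cf"]].mean()


summary = summarize(SUMMARY_CSV)
print(summary)
```

Grouping on both keys makes it easy to see whether low scores cluster by platform or by file type (`M` versus `TS`).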

MBARIMike commented 8 years ago

compliance_report.py is now about a billion times faster with BeautifulSoup/Requests than with thredds_crawler!
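
The script itself is not reproduced in this thread, so the following is only a sketch of the general idea: read the catalog XML directly and build OPeNDAP URLs from each dataset's `urlPath`. It uses the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup so that it runs without extra packages. The sample catalog is a hand-written fragment, and `opendap_urls` and its parameters are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hand-written fragment for illustration; a real run would fetch catalog.xml
# over HTTP (for example with Requests) and pass the response text instead.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="MBARI" ID="oceansites/DATA/MBARI">
    <dataset name="OS_MBARI-M2_20100402_R_M.nc"
             urlPath="oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_M.nc"/>
    <dataset name="OS_MBARI-M2_20100402_R_TS.nc"
             urlPath="oceansites/DATA/MBARI/OS_MBARI-M2_20100402_R_TS.nc"/>
  </dataset>
</catalog>
"""


def opendap_urls(catalog_xml, dods_base, pattern=".*"):
    """Return OPeNDAP URLs for datasets whose urlPath matches pattern."""
    root = ET.fromstring(catalog_xml)
    urls = []
    # Only leaf datasets carry a urlPath attribute; container datasets do not.
    for ds in root.iterfind(".//t:dataset[@urlPath]", NS):
        path = ds.get("urlPath")
        if re.search(pattern, path):
            urls.append(dods_base + path)
    return urls


urls = opendap_urls(
    SAMPLE_CATALOG,
    "http://dods.ndbc.noaa.gov/thredds/dodsC/",
    pattern="OS_MBARI-M2_20100402_R_TS",
)
print(urls)
```

Parsing one catalog document per directory means a single HTTP request yields every dataset path in it, which may account for much of the speedup.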

MBARIMike commented 8 years ago

Three attempts to build a compliance report have failed after 2-3 hours of execution with IOErrors or ConnectionErrors; the longest run (197 minutes) produced reports for only 3725 files. With over 31,000 files in the archive, we'll need a more robust way of building reports over fragile network connections.
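
One possible approach, sketched here and not something the script currently does, is to make the run resumable: append each result to a file as soon as it is produced, skip URLs already recorded on restart, and retry transient errors a few times before moving on. `run_resumable`, `check`, and `results_path` are hypothetical names; `check` stands for whatever function returns the scores for one dataset URL.

```python
import csv
import os


def run_resumable(urls, check, results_path, max_attempts=3):
    """Run check(url) for each URL, appending one CSV row per success.

    URLs already present in results_path are skipped, so a rerun after a
    crash continues where the previous run stopped. Returns the URLs that
    still failed after max_attempts tries.
    """
    done = set()
    if os.path.exists(results_path):
        with open(results_path, newline="") as f:
            done = {row[0] for row in csv.reader(f) if row}

    failed = []
    with open(results_path, "a", newline="") as f:
        writer = csv.writer(f)
        for url in urls:
            if url in done:
                continue
            for _ in range(max_attempts):
                try:
                    scores = check(url)
                except (IOError, ConnectionError):
                    continue  # transient failure: try this URL again
                writer.writerow([url, *scores])
                f.flush()  # push the row out before moving on
                break
            else:
                failed.append(url)  # gave up; a later rerun retries it
    return failed
```

Because failed URLs are never written to the results file, simply rerunning the same command picks them up again along with anything not yet attempted.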

dpsnowden commented 8 years ago

Thanks @MBARIMike. Even with the migration to BeautifulSoup/Requests, the timeouts persist? If this were run at a GDAC by @jing-at-ndbc, would the results be different?

MBARIMike commented 8 years ago

Yes. Timeouts still happen after the migration to BeautifulSoup/Requests. It might be different with direct filesystem access, though I suspect that completing the crawl would still take some time.