Open emiliom opened 8 years ago
ncISO is a command-line utility to automate metadata analysis and ISO metadata generation for THREDDS catalogs (Ref. NOAA)
The examples below have been tested with the following Java versions:
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)
Run
java -jar ncISO-2.3.jar
to see the available arguments. Run
java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar -ts http://ona.coas.oregonstate.edu:8080/thredds/catalog.xml?dataset=OCOS -num 1 -depth 20 -iso true
to retrieve metadata from the OSU ROMS dataset endpoint.
-Xms1024m and -Xmx1024m: standard Java options for specifying the amount of memory allocated to the ncISO utility. In this case 1024 megabytes are specified for both initial and maximum memory.
-ts THREDDS_CATALOG_URL: specifies the URL of the THREDDS catalog to process.
-num N: specifies the number of datasets to process per branch. A small number of datasets per branch, as in this case, gives a fast sample scan that is representative for THREDDS catalogs with generally homogeneous content in each branch; specify a large number to translate all content.
-depth 20: limits the crawler's descent into the catalog.
-iso true: tells the crawler to generate ISO metadata.
-waf ROOT_WAF_FOLDER: tells the crawler to dump files into a flat WAF structure.
-custom true: tells the crawler to translate the NcML using a custom stylesheet.
-xslt XSLT_FILENAME: the stylesheet to use, located in an xslt subfolder.
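For convenience, the options above can be wrapped in a small shell helper that assembles the full command line. This is only a sketch: the jar filename, catalog URL, and WAF folder are taken from the examples in this thread and may need adjusting for your install.

```shell
#!/bin/sh
# Build the ncISO command line from the documented options.
# NCISO_JAR and the WAF folder are placeholders -- adjust to your setup.
NCISO_JAR="ncISO-2.3.jar"

build_nciso_cmd() {
    # $1: THREDDS catalog URL, $2: datasets per branch, $3: crawl depth, $4: WAF output folder
    echo "java -Xms1024m -Xmx1024m -jar $NCISO_JAR -ts $1 -num $2 -depth $3 -iso true -waf $4"
}

CMD=$(build_nciso_cmd "http://ona.coas.oregonstate.edu:8080/thredds/catalog.xml?dataset=OCOS" 1 20 ./waf)
echo "$CMD"
# Uncomment to actually run the crawl:
# $CMD
```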
@lsetiawan, please do a comparison (qualitative, I guess?) between what you got with ncISO on the CMOP THREDDS dataset, and this record: http://data.nanoos.org/metadata/ioos/thredds/thredds_dodsC_model_data_forecast.xml Let me know what you find -- either in person later this week, or via this issue.
@lsetiawan, please run stand-alone nciso against this additional service: http://ingria.coas.oregonstate.edu/opendap/aggregated/catalog.xml
This is not THREDDS; it's a similar server application called "Hyrax". But apparently it generates a catalog.xml file that follows THREDDS conventions.
Please name the output iso xml osuroms_hyrax_aggregation.xml; take a quick look just to see that it "looks normal", then if it's ok load it into our WAF at http://data.nanoos.org/metadata/ioos/thredds/
@emiliom Regarding https://github.com/nanoos-pnw/ioos-ws/issues/2#issuecomment-261129635:
I have compared the results. They both seem similar, though forecast_iso.xml contains the following differences:
<gmd:spatialRepresentationInfo>
<gmd:MD_GridSpatialRepresentation>
<gmd:numberOfDimensions>
<gco:Integer>4</gco:Integer>
</gmd:numberOfDimensions>
<gmd:axisDimensionProperties>
<gmd:MD_Dimension>
<gmd:dimensionName>
<gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#MD_DimensionNameTypeCode"
codeListValue="column">column</gmd:MD_DimensionNameTypeCode>
</gmd:dimensionName>
<gmd:dimensionSize gco:nilReason="unknown"/>
<gmd:resolution>
<gco:Measure uom="degrees_east">-0.6513596482657548</gco:Measure>
</gmd:resolution>
</gmd:MD_Dimension>
</gmd:axisDimensionProperties>
<gmd:axisDimensionProperties>
<gmd:MD_Dimension>
<gmd:dimensionName>
<gmd:MD_DimensionNameTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#MD_DimensionNameTypeCode"
codeListValue="row">row</gmd:MD_DimensionNameTypeCode>
</gmd:dimensionName>
<gmd:dimensionSize gco:nilReason="unknown"/>
<gmd:resolution>
<gco:Measure uom="degrees_north">1.1557768657257572E-5</gco:Measure>
</gmd:resolution>
</gmd:MD_Dimension>
</gmd:axisDimensionProperties>
<gmd:EX_Extent id="boundingExtent">
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox id="boundingGeographicBoundingBox">
<gmd:extentTypeCode>
<gco:Boolean>1</gco:Boolean>
</gmd:extentTypeCode>
<gmd:westBoundLongitude>
<gco:Decimal>-124.17768859863281</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>-123.29158782958984</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>46.029273986816406</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>46.3841552734375</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
<gmd:temporalElement>
<gmd:EX_TemporalExtent id="boundingTemporalExtent">
<gmd:extent>
<gml:TimePeriod gml:id="d3">
<gml:description>seconds</gml:description>
<gml:beginPosition>2016-11-17T08:15:00Z</gml:beginPosition>
<gml:endPosition>2016-11-20T08:00:00Z</gml:endPosition>
</gml:TimePeriod>
</gmd:extent>
</gmd:EX_TemporalExtent>
</gmd:temporalElement>
</gmd:EX_Extent>
<srv:serviceType>
<gco:LocalName>OPeNDAP:OPeNDAP</gco:LocalName>
</srv:serviceType>
<srv:extent>
<gmd:EX_Extent>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:extentTypeCode>
<gco:Boolean>1</gco:Boolean>
</gmd:extentTypeCode>
<gmd:westBoundLongitude>
<gco:Decimal>-124.17768859863281</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>-123.29158782958984</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>46.029273986816406</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>46.3841552734375</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
<gmd:temporalElement>
<gmd:EX_TemporalExtent>
<gmd:extent>
<gml:TimePeriod gml:id="d3e62">
<gml:beginPosition>2016-11-17T08:15:00Z</gml:beginPosition>
<gml:endPosition>2016-11-20T08:00:00Z</gml:endPosition>
</gml:TimePeriod>
</gmd:extent>
</gmd:EX_TemporalExtent>
</gmd:temporalElement>
</gmd:EX_Extent>
</srv:extent>
<gmd:identificationInfo>
<srv:SV_ServiceIdentification id="OGC-WMS">
<gmd:citation>
<gmd:CI_Citation>
<gmd:title>
<gco:CharacterString>CMOP Virtual Columbia River (SELFE); f33</gco:CharacterString>
</gmd:title>
<gmd:date gco:nilReason="missing"/>
@emiliom Regarding https://github.com/nanoos-pnw/ioos-ws/issues/2#issuecomment-261395271:
I have uploaded the output osuroms_hyrax_aggregation.xml into our WAF at http://data.nanoos.org/metadata/ioos/thredds/ after giving it a quick look; it seemed fine.
P.S. Sorry I did this late; I just saw it 20 minutes ago and haven't looked at email in the past 2 days. Thanks.
Lots of relevant discussions going on at this ioos registry thread about nciso, including the stand-alone nciso, the XSLT transformations, bugs, etc. Let's go over it when we're ready to focus on operationalizing our stand-alone nciso harvesting.
Reminder: all 3 nciso output xml files go into our WAF at http://data.nanoos.org/metadata/ioos/thredds/
Running ncISO as before: java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar -ts http://amb6400b.stccmop.org:8080/thredds/forecast_model_data.xml -num 1 -depth 20 -iso true -custom true
gives this error:
$ java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar -ts http://amb6400b.stccmop.org:8080/thredds/forecast_model_data.xml -num 1 -depth 20 -iso true
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error on line 1 column 50 of UnidataDDCount-HTML.xsl:
SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
Based on the ioos registry thread, the solution is to use https rather than http. This was suggested by dneufeldcu: "The XSLTs are no longer served over http; they need to be accessed via https. This might be a good time to switch over to XSLTs hosted on the Unidata github site as well."
Changing the url to https runs without errors, but it does not output anything other than the folders. Following up on the issues, ebridge suggested using a thredds_crawler-based program rather than ncISO.jar.
It seems ioos has the thredds_crawler python library, and it is already available on conda.
Hmm. I think dneufeldcu was referring to a context that's specific to NOAA resources. The feds last month went through a large transition to https for many data services and web sites. That would have no effect on a non-fed endpoint like the CMOP one.
Have you tried testing a different NANOOS THREDDS endpoint? It may be that something has changed on the CMOP server, or it is having hiccups. I suggest you test Craig's OSU ROMS endpoint instead.
thredds_crawler may be useful in the future (though there's also now siphon to consider), but for now let's stick with nciso.
Testing the OSU ROMS endpoint also results in an error:
$ java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar -ts http://ona.coas.oregonstate.edu:8080/thredds/catalog.xml?dataset=OCOS -num 1 -depth 20 -iso true
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error on line 1 column 50 of UnidataDDCount-HTML.xsl:
SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
Also, when run against http://ingria.coas.oregonstate.edu/opendap/aggregated/catalog.xml, I get an internal error:
$ java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar -ts http://ingria.coas.oregonstate.edu/opendap/aggregated/catalog.xml -num 1 -depth 20 -iso true
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
opendap.dap.DAP2Exception: Method failed:HTTP/1.1 500 Internal Server Error on URL= http://ingria.coas.oregonstate.edu/opendap/hyrax/aggregated/ocean_time_aggregation.ncml.dods?ntimes,ndtfast,dt,dtfast,dstart,nHIS,ndefHIS,nRST,ntsAVG,nAVG,ndefAVG,Falpha,Fbeta,Fgamma,nl_tnu2,nl_visc2,Akt_bak,Akv_bak,Akk_bak,Akp_bak,rdrg,rdrg2,Zob,Zos,Znudg,M2nudg,M3nudg,Tnudg,FSobc_in,FSobc_out,M2obc_in,M2obc_out,Tobc_in,Tobc_out,M3obc_in,M3obc_out,rho0,gamma2,LtracerSrc,spherical,xl,el,Vtransform,Vstretching,theta_s,theta_b,Tcline,hc,s_rho,s_w,Cs_r,Cs_w,user,ocean_time
at opendap.dap.DConnect2.openConnection(DConnect2.java:271)
at opendap.dap.DConnect2.getData(DConnect2.java:826)
at opendap.dap.DConnect2.getData(DConnect2.java:1116)
at ucar.nc2.dods.DODSNetcdfFile.readDataDDSfromServer(DODSNetcdfFile.java:1485)
at ucar.nc2.dods.DODSNetcdfFile.readArrays(DODSNetcdfFile.java:1542)
at ucar.nc2.dods.DODSNetcdfFile.<init>(DODSNetcdfFile.java:370)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at ucar.nc2.dataset.NetcdfDataset.openDodsByReflection(NetcdfDataset.java:1031)
at ucar.nc2.dataset.NetcdfDataset.acquireDODS(NetcdfDataset.java:980)
at ucar.nc2.dataset.NetcdfDataset.openOrAcquireFile(NetcdfDataset.java:664)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:421)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:404)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:389)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:376)
at thredds.server.metadata.util.ThreddsExtentUtil.doGetExtent(ThreddsExtentUtil.java:61)
at thredds.server.metadata.util.ThreddsExtentUtil.getExtent(ThreddsExtentUtil.java:340)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:167)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:238)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:238)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:238)
at gov.noaa.eds.service.DatasetTreeService.generateTree(DatasetTreeService.java:323)
at gov.noaa.eds.controller.ServiceController.createTree(ServiceController.java:99)
at gov.noaa.eds.controller.ServiceController.main(ServiceController.java:181)
java.io.IOException: java.io.IOException: opendap.dap.DAP2Exception: Method failed:HTTP/1.1 500 Internal Server Error on URL= http://ingria.coas.oregonstate.edu/opendap/hyrax/aggregated/ocean_time_aggregation.ncml.dods?ntimes,ndtfast,dt,dtfast,dstart,nHIS,ndefHIS,nRST,ntsAVG,nAVG,ndefAVG,Falpha,Fbeta,Fgamma,nl_tnu2,nl_visc2,Akt_bak,Akv_bak,Akk_bak,Akp_bak,rdrg,rdrg2,Zob,Zos,Znudg,M2nudg,M3nudg,Tnudg,FSobc_in,FSobc_out,M2obc_in,M2obc_out,Tobc_in,Tobc_out,M3obc_in,M3obc_out,rho0,gamma2,LtracerSrc,spherical,xl,el,Vtransform,Vstretching,theta_s,theta_b,Tcline,hc,s_rho,s_w,Cs_r,Cs_w,user,ocean_time
at ucar.nc2.dataset.NetcdfDataset.openDodsByReflection(NetcdfDataset.java:1035)
at ucar.nc2.dataset.NetcdfDataset.acquireDODS(NetcdfDataset.java:980)
at ucar.nc2.dataset.NetcdfDataset.openOrAcquireFile(NetcdfDataset.java:664)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:421)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:404)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:389)
at ucar.nc2.dataset.NetcdfDataset.openDataset(NetcdfDataset.java:376)
at thredds.server.metadata.util.ThreddsExtentUtil.doGetExtent(ThreddsExtentUtil.java:61)
at thredds.server.metadata.util.ThreddsExtentUtil.getExtent(ThreddsExtentUtil.java:340)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:167)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:238)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:238)
at gov.noaa.eds.service.DatasetTreeService.getNodes(DatasetTreeService.java:238)
at gov.noaa.eds.service.DatasetTreeService.generateTree(DatasetTreeService.java:323)
at gov.noaa.eds.controller.ServiceController.createTree(ServiceController.java:99)
at gov.noaa.eds.controller.ServiceController.main(ServiceController.java:181)
Caused by: java.io.IOException: opendap.dap.DAP2Exception: Method failed:HTTP/1.1 500 Internal Server Error on URL= http://ingria.coas.oregonstate.edu/opendap/hyrax/aggregated/ocean_time_aggregation.ncml.dods?ntimes,ndtfast,dt,dtfast,dstart,nHIS,ndefHIS,nRST,ntsAVG,nAVG,ndefAVG,Falpha,Fbeta,Fgamma,nl_tnu2,nl_visc2,Akt_bak,Akv_bak,Akk_bak,Akp_bak,rdrg,rdrg2,Zob,Zos,Znudg,M2nudg,M3nudg,Tnudg,FSobc_in,FSobc_out,M2obc_in,M2obc_out,Tobc_in,Tobc_out,M3obc_in,M3obc_out,rho0,gamma2,LtracerSrc,spherical,xl,el,Vtransform,Vstretching,theta_s,theta_b,Tcline,hc,s_rho,s_w,Cs_r,Cs_w,user,ocean_time
at ucar.nc2.dods.DODSNetcdfFile.readArrays(DODSNetcdfFile.java:1547)
at ucar.nc2.dods.DODSNetcdfFile.<init>(DODSNetcdfFile.java:370)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at ucar.nc2.dataset.NetcdfDataset.openDodsByReflection(NetcdfDataset.java:1031)
... 15 more
Are the errors happening on nile, or on the same desktop/laptop you used before? I forget whether you had done any tests on nile in the past, or whether all your nciso tests were on your desktop/laptop.
I tested this in the same environment as before, my local Linux computer: Ubuntu 14.04, with this java version:
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)
Yikes. So it is possible that the problem is related to the https issue after all! nciso may be accessing resources on a federal/NOAA web site that has now been changed from http to https.
Sorry, I'll be offline for the next ~2.5 hours ...
I see at that ioos/registry issue that rsignell-usgs actually made it very clear: "ah, okay, so that's why nobody's standalone ncISO is working anymore. Yes, I agree that moving the XSLTs from NOAA to Unidata github repo make sense."
Darn. I just added a comment/questions to that github thread, to see what's being done. An alternative for us may be to issue requests to the THREDDS ncISO plugin on each endpoint (instead of using the stand-alone ncISO). Let's see if we hear back on my question by tomorrow before taking any action. In the meantime we can maybe start working on sensorml2iso instead. I'll follow up on that.
I'm not feeling very confident that a fixed ncISO jar will be available soon ... Probably best if you start testing the wget-based extraction of iso xml's from our two THREDDS servers. More details here (in addition to info on thredds_crawler):
And links found there and elsewhere.
For now I prefer using wget vs thredds_crawler: no dependencies, less complexity, etc.
THREDDS servers only. Hyrax doesn't include an nciso plugin, as far as we know.
wget:
OSU ROMS dataset:
CMOP SELFE dataset:
Zipped iso xml results: isoxml.zip
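For the record, the wget-based extraction relies on the ncISO plugin bundled with THREDDS, which serves ISO 19115 records from a /thredds/iso/ URL on each server. A minimal sketch follows; the dataset path and ID below are hypothetical examples, and the exact URL layout and query parameters may vary by TDS version and configuration.

```shell
#!/bin/sh
# Fetch an ISO record from a TDS ncISO endpoint with wget.
# build_iso_url turns a server base, dataset path, and dataset ID into the
# /thredds/iso/ request URL; the example values below are hypothetical.
build_iso_url() {
    # $1: server base URL, $2: dataset path, $3: dataset ID
    echo "$1/thredds/iso/$2?dataset=$3"
}

URL=$(build_iso_url "http://ona.coas.oregonstate.edu:8080" "OCOS" "OCOS")
echo "$URL"
# Uncomment to retrieve the record:
# wget -O osuroms_iso.xml "$URL"
```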
Using the latest release of the ncISO jar, these three NANOOS THREDDS endpoints were tested and succeeded.
The resulting ISO XML, NcML, and Score files are attached below: waf.zip
The latest release of ncISO.jar?? Cool! I'm curious: where did you hear about this? I hadn't seen nciso under https://github.com/NOAA-PMEL/ before.
@emiliom I've just been following the pull request that was about to happen: https://github.com/NOAA-PMEL/uafnciso/pull/4#issuecomment-273214032
Ok, I see that the NOAA-PMEL github site was mentioned earlier in the other github issue discussion ...
Might as well re-use this old issue on stand-alone ncISO and NANOOS. cc @mwengren, @crisien
I tried the nciso job with the updated OSU THREDDS urls. It is still failing, but now that I've looked at the run-time messages, I can see that it fails the same way for all endpoints. The problem is with nciso, and it looks like it involves a failure to find a valid certificate for an external XSLT stylesheet. Here are the run-time messages for one of the endpoints. Micah, hopefully that rings a bell for you, or you can help raise the issue with the nciso developers:
Retrieving ISO XML for https://wilson.coas.oregonstate.edu/thredds/catalog.xml?dataset=OCOS
INFO [main] (ServiceController.java:29) - Running createWaf in service controller...
INFO [main] (CatalogCrawlerImpl.java:78) - maxDepth: 20 depth: 0 dataset.getFullName():OSU ROMS; dataset.hasAccess(): true maxLeaves: 1 leafcnt: 0
INFO [main] (CatalogCrawlerImpl.java:82) - allowable service
INFO [main] (CatalogCrawlerImpl.java:99) - adding mdc
INFO [main] (WafService.java:45) - ncmlFilePath=/var/www/metadata/ioos/testncISO/ncml/wilson.xml
ERROR: 'Could not compile stylesheet'
FATAL ERROR: 'sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target'
:sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
ERROR [main] (WafService.java:85) - thredds.server.metadata.exception.ThreddsUtilitiesException: Configuration problem: https://cdn.rawgit.com/NOAA-PMEL/uafnciso/fdb7f86515c21a8b5c087978975addf9ad5d0027/transforms/UnidataDDCount-HTML.xsl TransformerConfigurationException. sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
ERROR: 'Could not compile stylesheet'
FATAL ERROR: 'sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target'
:sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
ERROR [main] (WafService.java:99) - thredds.server.metadata.exception.ThreddsUtilitiesException: Configuration problem: https://cdn.rawgit.com/noaaroland/uafnciso/e84d6e26b87a799eb996173358c72ec7a4ed4912/transforms/UnidataDD2MI.xsl TransformerConfigurationException. sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
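If the PKIX error really does come from a missing CA certificate in the local Java trust store (which is only a guess here), one common workaround is to import the server's certificate into the JVM's cacerts with keytool. A sketch, where the host, file paths, and alias are all example values:

```shell
#!/bin/sh
# Sketch: import the certificate served by cdn.rawgit.com into the JVM
# trust store so ncISO's stylesheet download can validate it.
# HOST, CERT_FILE, and the keystore path are examples -- adjust as needed.
HOST="cdn.rawgit.com"
CERT_FILE="rawgit.pem"
KEYSTORE="${JAVA_HOME:-/usr/lib/jvm/java}/jre/lib/security/cacerts"

build_import_cmd() {
    # 'changeit' is the default password for the JDK's bundled cacerts.
    echo "keytool -importcert -noprompt -trustcacerts -alias $HOST -file $CERT_FILE -keystore $KEYSTORE -storepass changeit"
}

CMD=$(build_import_cmd)
echo "$CMD"
# To capture the certificate first (uncomment to run):
# openssl s_client -connect $HOST:443 -servername $HOST </dev/null | \
#   openssl x509 -outform PEM > $CERT_FILE
# Then run: $CMD (may require sudo, since cacerts is under JAVA_HOME)
```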
It seems odd that the two urls in this error message are the very ones that, in the uafnciso commit leading up to the latest release (2.3.6, the one I'm using), were replaced with different ones:
Still, the nciso job had been working. I don't know why it's failing now, or how long it's been failing.
Goal: harvest the NANOOS THREDDS servers (OSU ona & CMOP) and host iso metadata records for NGDC to ingest. When ready, this will require a change/update to the IOOS/NGDC registration for those two endpoints.
Start looking into this in the first week of January.