ioos / registry

Getting data services registered in the IOOS Service Registry
http://ioos.github.io/registry/

Using ncISO in stand-alone mode #94

Closed: rsignell-usgs closed this issue 6 years ago

rsignell-usgs commented 7 years ago

Following the instructions here: https://github.com/ioos/registry/wiki/Hosting-Your-Own-WAF

I tried harvesting ISO metadata from the SIO HFR catalogs using the script below, with a custom (updated) XSL from here:

#!/bin/bash
for full_cat in \
http://hfrnet.ucsd.edu/thredds/HFRADAR_USWC_hourly_RTV.xml \
http://hfrnet.ucsd.edu/thredds/HFRADAR_USEGC_hourly_RTV.xml \
http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.xml \
http://hfrnet.ucsd.edu/thredds/HFRADAR_AKNS_hourly_RTV.xml \
http://hfrnet.ucsd.edu/thredds/HFRADAR_PRVI_hourly_RTV.xml
do
  echo "$full_cat"
  java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar \
    -custom true -xsl /usgs/data2/rsignell/waf/UnidataDD2MI.xsl \
    -ts "${full_cat}" -num 100 -depth 20 -iso true \
    -waf /usgs/data2/rsignell/waf/data/sio_hfr
done

When I run this I get errors like:

rsignell@gam:/usgs/data2/rsignell/waf/data/sio_hfr/iso$ Error on line 1 column 50 of UnidataDDCount-HTML.xsl:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.xml
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error on line 1 column 50 of UnidataDDCount-HTML.xsl:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
http://hfrnet.ucsd.edu/thredds/HFRADAR_AKNS_hourly_RTV.xml
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error on line 1 column 50 of UnidataDDCount-HTML.xsl:

@pacioos, any idea what is wrong?

pacioos commented 7 years ago

Sorry, no idea, Rich. I don't use ncISO in stand-alone mode; I typically wget/curl an online ncISO endpoint. Nor have I ever needed to modify UnidataDDCount-HTML.xsl. However, rather than harvesting the container catalogs (e.g., http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.xml), shouldn't you be harvesting the individual dataset catalogs (e.g., http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.xml?dataset=HFRNet/USHI/1km/hourly/RTV)? (Or maybe stand-alone ncISO is smart enough to crawl the container for datasets...?)

I notice that the ISO links on those HFR catalogs are also failing, which could indicate the source of your problems. Try going here and clicking on the ISO link:

Catalog page: http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.html?dataset=HFRNet/USHI/1km/hourly/RTV

ISO link: http://hfrnet.ucsd.edu/thredds/iso/HFRNet/USHI/1km/hourly/RTV?catalog=http%3A%2F%2Fhfrnet.ucsd.edu%2Fthredds%2FHFRADAR_USHI_hourly_RTV.html&dataset=HFRNet%2FUSHI%2F1km%2Fhourly%2FRTV

It eventually dies with an HTTP 500 Internal Server Error. You might need to contact HFRNet/Scripps to troubleshoot their TDS; perhaps ncISO is writing a useful error message to their logs. From a quick glance at their catalog page and global attributes, they may be lacking some required metadata fields (e.g., id?).
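
The wget/curl route pacioos describes only needs the endpoint URL for each dataset. A minimal sketch that builds it, assuming the /iso/<dataset-id>?catalog=...&dataset=... pattern of the HFRNet ISO link above also holds on other TDS instances (it is inferred from that single link). The names urlencode and iso_url are hypothetical helpers, not part of ncISO or the TDS.

```shell
#!/bin/bash
# Percent-encode a string; RFC 3986 unreserved characters pass through unchanged.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" makes printf emit the char code
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build the ncISO endpoint URL for one dataset,
# mirroring the ISO link on the HFRNet catalog page (assumed pattern).
iso_url() {
  local tds_base="$1" catalog_page="$2" dataset_id="$3"
  printf '%s/iso/%s?catalog=%s&dataset=%s\n' \
    "$tds_base" "$dataset_id" \
    "$(urlencode "$catalog_page")" "$(urlencode "$dataset_id")"
}

iso_url http://hfrnet.ucsd.edu/thredds \
  http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.html \
  HFRNet/USHI/1km/hourly/RTV
```

For the HFRNet example this reproduces the ISO link above; the result can then be handed to curl (for instance curl -s -o record.xml "<url>") to save the record into a WAF directory.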

rsignell-usgs commented 7 years ago

@pacioos, stand-alone ncISO is indeed smart enough to crawl the catalog. And it doesn't access the ISO service, just the OPeNDAP service. In fact, that's exactly when it's most useful: in cases where the ncISO service hasn't been enabled or is giving unsatisfactory results.

@geoneubie or @noaaroland, any idea what is wrong with my stand-alone ncISO usage?

geoneubie commented 7 years ago

It might be the https switchover. I think there are some hard-coded assumptions about the use of http. Is this something Roland/Brian can look at?

(Sent by email in reply to Rich Signell's comment of Tue, Dec 20, 2016, 3:55 AM.)

noaaroland commented 7 years ago

The first problem is that the OPeNDAP data sets in http://hfrnet.ucsd.edu/thredds/HFRADAR_USHI_hourly_RTV.xml never return any data.

When I try to access the OPeNDAP dataset via the browser or ncISO, the TDS hangs.

Until the server actually returns something to the client we won't be able to figure out what's wrong, if anything.

pacioos commented 7 years ago

I noticed this today too (unrelated to ncISO) and emailed HFRNet. Their TDS lags for minutes before returning a result. Tom Cook is going to investigate as time allows, but suggested using the NDBC TDS in the meantime:

http://sdf.ndbc.noaa.gov/thredds/catalog.html

Tom mentioned the HFRNet TDS is no longer funded. The NDBC TDS is working, but the downsides are a shorter archive (I don't think they have all the data that HFRNet has) and no ncWMS (I sent an inquiry about this today). Cheers, John

(Sent by email in reply to Roland Schweitzer's comment of Tue, Dec 20, 2016, 1:05 PM.)

noaaroland commented 7 years ago

The problems with their TDS may be unrelated to ncISO, but if I take the catalog you mention and run the command Rich was trying to run:

java -Xms1024m -Xmx1024m -jar ncISO-2.3.jar -custom true -xsl /home/rhs/IdeaProjects/uafnciso/transforms/UnidataDD2MI-Rich.xsl -ts http://sdf.ndbc.noaa.gov/thredds/catalog.xml -num 100 -depth 20 -iso true -waf /tmp/waf/hfradar

It runs without error (except for the SLF4J complaint about the missing logging jar).

So in the end, I don't think there's anything wrong with how Rich is running ncISO.

pacioos commented 7 years ago

Btw, Tom Cook fixed the issues with the HFRNet TDS today, so those catalogs should be working now as well.

rsignell-usgs commented 7 years ago

Could it be that the AggregationCache on the HFRNet TDS is scouring the aggregation metadata? My threddsConfig.xml is set to never scour:

<AggregationCache>
    <scour>-1 hours</scour>
</AggregationCache>

https://github.com/rsignell-usgs/xml/blob/master/THREDDS/geoport-dev/threddsConfig.xml#L159-L162
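
To see how a particular server is configured, the scour value can be pulled out of its threddsConfig.xml from the shell. A rough sketch, assuming the AggregationCache element is laid out one tag per line as in the snippet above; it is a line-oriented sed match rather than a real XML parser, and scour_setting is a hypothetical helper name.

```shell
#!/bin/bash
# Print the <scour> value found between <AggregationCache> and </AggregationCache>.
# Line-oriented: assumes each tag sits on its own line, as in the config above.
scour_setting() {
  sed -n '/<AggregationCache>/,/<\/AggregationCache>/s:.*<scour>\(.*\)</scour>.*:\1:p' "$1"
}

# Demo against a throwaway copy of the snippet.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<AggregationCache>
    <scour>-1 hours</scour>
</AggregationCache>
EOF
scour_setting "$cfg"
rm -f "$cfg"
```

Restricting the substitution to the AggregationCache address range keeps it from picking up a scour element that belongs to some other cache section of the same file.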

rsignell-usgs commented 7 years ago

@geoneubie, @noaaroland, even with the HFRNet TDS working fine, I still get this problem, including on datasets I know worked before:

#!/bin/bash
for full_cat in http://geoport.whoi.edu/thredds/COAWST_catalog.xml
do
  echo "$full_cat"
  java -Xms1024m -Xmx1024m -jar /usgs/data2/rsignell/waf/ncISO-2.3.jar \
    -ts "${full_cat}" -num 100000 -depth 20 -iso true \
    -waf /usgs/data2/rsignell/waf/data/coawst
done

Can someone try the above and tell me if it works for them?

I get:

http://geoport.whoi.edu/thredds/COAWST_catalog.xml
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error on line 1 column 50 of UnidataDDCount-HTML.xsl:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
Error on line 1 column 50 of UnidataDD2MI.xsl:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
Error on line 1 column 50 of UnidataDD2MI.xsl:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.
Error on line 1 column 50 of UnidataDD2MI.xsl:
  SXXP0003: Error reported by XML parser: White spaces are required between publicId and systemId.

ebridger commented 7 years ago

@rsignell-usgs, re: https://github.com/ioos/notebooks_demos/pull/130#issuecomment-269065304: our ncISO.log indicates that the errors started around Dec. 1, so I ran some ncISO tests on our WW3 and NECOFS THREDDS catalogs and got the same consistent failures as above.

#!/bin/bash
# NECOFS Forecast. Tomcat 7, TDS 4.3.18
TDS='http://www.smast.umassd.edu:8080/thredds/forecasts.xml'

# WW3 Forecast. Tomcat 7, TDS 4.3.20
#TDS='http://www.neracoos.org/thredds/catalog/WW3/catalog.xml'

# WW3 Forecast catalog - experimental Docker, Tomcat 8, TDS 4.6.6
#TDS='http://52.203.37.112:8260/thredds/catalog/WW3/catalog.xml'

echo "$TDS"
# Java 6 is still the default on this old server, so call Java 7 explicitly
/usr/lib/jvm/default-java7/jre/bin/java -Xms1024m -Xmx1024m \
 -jar ncISO-2.3.jar \
 -num 99 -depth 20 -iso true -waf testWAF \
 -ts "$TDS"

All catalogs show the same result: zero-length ISO XML records are created. The errors are the same as above; ncISO.log shows a bit more detail:

ERROR [main] util.ThreddsTranslatorUtil.[] Dec/28 14:04:14 - Configuration problem: http://www.ngdc.noaa.gov/metadata/published/xsl/nciso2.0/UnidataDDCount-HTML.xsl TransformerConfigurationException. Failed to compile stylesheet. 1 error detected.
javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 1 error detected.
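
Since this failure leaves zero-length records behind instead of stopping the run, a quick post-run check of the WAF directory catches it. A minimal sketch, assuming ncISO writes its records as .xml files somewhere under the -waf directory (testWAF in the script above); count_empty_records is a hypothetical helper name.

```shell
#!/bin/bash
# Count zero-length .xml files under a WAF directory.
# POSIX find: -size 0 (in 512-byte blocks, rounded up) matches only empty files.
count_empty_records() {
  find "$1" -type f -name '*.xml' -size 0 | wc -l | tr -d ' '
}

# Demo on a scratch directory with one empty and one non-empty record.
waf=$(mktemp -d)
: > "$waf/empty.xml"
echo '<gmi:MI_Metadata/>' > "$waf/ok.xml"   # placeholder content only
count_empty_records "$waf"
rm -rf "$waf"
```

Dropping the wc/tr stage lists the empty files themselves, which shows which datasets failed.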

rsignell-usgs commented 7 years ago

@noaaroland, are you getting non-zero-length ISO files?

dneufeldcu commented 7 years ago

The XSLTs are no longer served over http; they need to be accessed via https. This might be a good time to switch over to XSLTs hosted on the Unidata GitHub site as well.
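
One plausible reading of the cryptic "line 1 column 50" parser error, assuming (not confirmed in this thread) that the old http:// XSL URLs now answer with a typical Apache-style redirect page instead of the stylesheet: such a page starts with an HTML 2.0 DOCTYPE that has a public ID but no system ID, and its closing > is the 50th character, right where an XML parser would still expect whitespace followed by a system ID.

```shell
#!/bin/bash
# First line of a typical Apache-style redirect page (assumed, not captured
# from the NOAA server): a DOCTYPE with a public ID but no system ID.
doctype='<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">'

# The closing '>' sits at column = line length; bash substrings are 0-based.
echo "length: ${#doctype}"
echo "column 50: ${doctype:49:1}"
```

To see what a server actually returns for one of the XSL URLs, curl -sI <url> prints the status line and any Location header without downloading the body.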

rsignell-usgs commented 7 years ago

@dneufeldcu, ah, okay, so that's why nobody's stand-alone ncISO is working anymore. Yes, I agree that moving the XSLTs from NOAA to the Unidata GitHub repo makes sense.

@noaaroland, is your plan to work with @lesserwhirls on a PR that enables the ncISO jar file supplied with the TDS to work in stand-alone mode?

emiliom commented 7 years ago

Darn. So what short-term, feasible options do users of stand-alone ncISO have (short of NOT using stand-alone ncISO...)? Are there near-term plans to put out a new version that just updates the URLs to https?

While it's true that "This might be a good time to switch over to XSLTs hosted on the Unidata github site as well", having a broken stand-alone ncISO is bad all around, especially given that this happened just as IOOS released its new Catalog, which relies largely on WAFs and for which the official (or grapevine-official?) recommendation was to use stand-alone ncISO... A quick fix would be great, even if only as an interim measure.

rsignell-usgs commented 7 years ago

I agree we need a fix for this ASAP. @noaaroland, you have the code, right? Can you just change http to https in the NOAA URLs and rebuild the jar, or something, to get this working again?

geoneubie commented 7 years ago

https://github.com/noaaroland/uafnciso/blob/master/src/main/java/gov/noaa/eds/service/WafService.java#L23-L24

(Sent by email in reply to Rich Signell's comment of Wed, Jan 11, 2017, 6:16 AM.)

rsignell-usgs commented 7 years ago

@geoneubie, so like this? https://github.com/noaaroland/uafnciso/pull/1 (changes: https://github.com/noaaroland/uafnciso/pull/1/files)

geoneubie commented 7 years ago

yes
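
The change being discussed amounts to a textual http-to-https substitution on the hard-coded XSL URLs. A minimal sketch, assuming those URLs appear in the Java source as plain http://www.ngdc.noaa.gov/... string literals like the one in the ncISO.log message above; patch_xsl_urls and the demo Java line are hypothetical, and this is not the content of the PR itself. It prints to stdout, which sidesteps the GNU/BSD differences in sed -i.

```shell
#!/bin/bash
# Rewrite http://www.ngdc.noaa.gov/ to https://www.ngdc.noaa.gov/ in a file,
# printing the result to stdout (redirect it to save a patched copy).
patch_xsl_urls() {
  sed 's#http://www\.ngdc\.noaa\.gov/#https://www.ngdc.noaa.gov/#g' "$1"
}

# Demo on a hypothetical Java line modeled on the URL from ncISO.log.
src=$(mktemp)
printf '%s\n' 'String xsl = "http://www.ngdc.noaa.gov/metadata/published/xsl/nciso2.0/UnidataDDCount-HTML.xsl";' > "$src"
patch_xsl_urls "$src"
rm -f "$src"
```

Because the pattern requires "http://" literally, running it on an already-patched file leaves the https URLs untouched.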

mwengren commented 7 years ago

Where should the resulting compiled .jar file be hosted? Is the goal to publish a standalone nciso.jar to Unidata somewhere such as here: http://www.unidata.ucar.edu/software/thredds/v4.3/tds/tds4.2/reference/ncISO.html? How are IOOS data providers obtaining the standalone nciso.jar currently? Via THREDDS download?

I can help resolve the issue in the short term, but I've never worked with ncISO directly, so a clear hosting option would help (or someone to enlighten me about it).

rsignell-usgs commented 7 years ago

I tried to make these changes and build the jar file, but ran into problems with the dependencies, at least as they are currently set: https://github.com/noaaroland/uafnciso/issues/3

mwengren commented 7 years ago

@emiliom, you could also look into using thredds_crawler rather than stand-alone ncISO scripts. Some documentation and NERACOOS' example are here. Of course, this depends on the ncISO service running in THREDDS, and you might sacrifice some control over the resulting ISO files (e.g., you can't use a custom XSLT).

@rsignell-usgs, thanks for the wiki link on WAF generation... the first I've seen of it, in fact :). FWIW, this repo should probably be retired; the plan is to move all general Catalog-related issues and documentation to the new Catalog repo, https://github.com/ioos/catalog (with corresponding docs).

rsignell-usgs commented 7 years ago

@mwengren, as @emiliom mentioned, the thredds_crawler approach works great when the THREDDS server has the ncISO service enabled and it produces satisfactory results.

But when this is not the case, stand-alone ncISO fills the gap, since it only accesses OPeNDAP endpoints; it doesn't need ncISO services enabled on the THREDDS server.

mwengren commented 7 years ago

True, agreed on that. At least the THREDDSIso service uses an embedded UnidataDD2MI.xsl, so it shouldn't have been broken by the https://https.cio.gov/ implementation. But that assumes the service is in fact available. I'm not sure whether that is the case for NANOOS.

I also agree the stand-alone ncISO needs to be patched ASAP; I'm just presenting some alternatives in case they help IOOS data providers keep their Catalog datasets from going dark in the meantime.

cc @lukecampbell FYI

emiliom commented 7 years ago

FYI, we downloaded the stand-alone ncISO from https://www.ngdc.noaa.gov/eds/tds/downloads/ncISO-2.3.jar, which we found via this very old (2012?) Unidata page.

In NANOOS we have two THREDDS servers and one Hyrax server. The THREDDS servers are fairly recent and do have ncISO; I don't believe Hyrax has an ncISO plugin, but I'm not sure. Still, we were trying to operationalize stand-alone ncISO for consistent behavior and the other reasons already described.

We'll consider thredds_crawler if needed, though we haven't examined it closely. But I've also wondered whether Unidata's Siphon is a better long-term investment than thredds_crawler. That may be getting off topic, though.

emiliom commented 7 years ago

FWIW, I composed the comment below last week, then either forgot to hit Comment or to confirm that it went in. Sigh.

Meanwhile, it looks like an updated ncISO.jar was released here 9 days ago. We tested it today, and it works!

Any updates on when the fixed stand-alone ncISO jar may be available? FYI, we also tried compiling the jar, but ran into the same errors reported by @rsignell-usgs elsewhere.

@mwengren, I hadn't seen the documentation at https://ioos.github.io/catalog/. It's looking great! Regarding THREDDS ISO harvesting, it doesn't present stand-alone ncISO as one of the options. Hmm. As I mentioned, using the stand-alone ncISO had been the IOOS party line AFAIK. In addition to the circumstances @rsignell-usgs mentioned already, it can bring more consistency to the metadata when there are multiple THREDDS servers to harvest from. Plus there's our Hyrax server.

mwengren commented 7 years ago

@emiliom Glad to hear this release resolved the issue. I'll make a note to update our Catalog Documentation with a link to this particular release, with some info on using standalone ncISO as an alternative to thredds_crawler.

I wasn't aware of a preferred approach to WAF generation among IOOS data providers (I'm still a bit new around here), but Hyrax certainly presents a good use case. The main reason we hadn't included any examples or mention of stand-alone ncISO is that I didn't have any examples from the community to refer to. @ebridger's was the only RA WAF generation script in use that was sent to me, which is why we feature his scripts on our documentation page. Any other example scripts are most welcome! If you'd like to share yours, please submit a PR adding them to this folder in the Catalog repo.

Truth be told, until this thread got started I had no idea I was mostly recreating these wiki pages when I started the github.io Catalog docs; somehow I missed the fact that they existed.

We would like to retire this repo though, since the 'Registry' concept has been replaced by the Harvest Registry and most of this info is now out of date.

rsignell-usgs commented 7 years ago

@mwengren, where would you like to discuss these types of issues?

mwengren commented 7 years ago

@rsignell-usgs you got it, https://github.com/ioos/catalog is the replacement for this repo.

rsignell-usgs commented 6 years ago

The new stand-alone ncISO is at: https://github.com/NOAA-PMEL/uafnciso