ioos / registry

Getting data services registered in the IOOS Service Registry
http://ioos.github.io/registry/

Investigate why gliders are not being harvested #14

Open rsignell-usgs opened 10 years ago

rsignell-usgs commented 10 years ago

Why aren't these gliders being harvested? Or are they harvested elsewhere? http://tds.gliders.ioos.us/thredds/catalog.html

amilan17 commented 10 years ago

Rich, The service contains all CatalogRefs - no datasets - therefore EMMA is unable to harvest.

Anna ~~~~~~~ Anna.Milan@noaa.gov, 303-497-5099 NOAA/NESDIS/NGDC

One cannot have too much metadata. Start simple and grow from there. ~~~~~~~

kwilcox commented 10 years ago

@amilan17 Are there any plans to support following catalogRefs?

Lots and lots of THREDDS catalogs make use of catalogRefs to help organize the catalog XML files on the backend. If catalogRefs are not used, the entire catalog needs to be in a single XML file.

Even UNIDATA's THREDDS catalog makes use of catalogRefs at the root level: http://thredds.ucar.edu/thredds/catalog.html
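For anyone following along, here is a minimal Python sketch of what "all catalogRefs, no datasets" looks like to a client. The sample catalog and its deployment names are made up for illustration, and `summarize_catalog` is a hypothetical helper, not part of EMMA or ncISO.

```python
import xml.etree.ElementTree as ET

# Namespaces used by THREDDS InvCatalog 1.0 documents.
NS = {
    "t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical root catalog: it only points at other catalogs.
SAMPLE_CATALOG = """<catalog
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink" name="example root">
  <catalogRef xlink:href="deployment_a/catalog.xml" xlink:title="deployment_a" name=""/>
  <catalogRef xlink:href="deployment_b/catalog.xml" xlink:title="deployment_b" name=""/>
</catalog>"""


def summarize_catalog(xml_text):
    """Return (dataset_count, catalog_ref_hrefs) for one catalog document."""
    root = ET.fromstring(xml_text)
    datasets = root.findall(".//t:dataset", NS)
    href_attr = "{%s}href" % NS["xlink"]
    refs = [ref.get(href_attr) for ref in root.findall(".//t:catalogRef", NS)]
    return len(datasets), refs


dataset_count, refs = summarize_catalog(SAMPLE_CATALOG)
print(dataset_count, refs)
```

A harvester that does not follow catalogRefs sees zero datasets in a catalog like this, so each referenced catalog.xml would have to be registered on its own.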

amilan17 commented 10 years ago

In short - no - we will not support following catalogRefs. I believe that ncISO does have the capability to do so, but we have chosen NOT to implement this feature, because it can get unwieldy quite quickly. I've cc'd David Neufeld, who might have some further insight into this decision.

Sincerely,

Anna

robragsdale commented 10 years ago

I registered each service endpoint https://www.ngdc.noaa.gov/docucomp/collectionSource/list?recordSetId=6338757&componentId=&serviceType=&serviceStatus=&serviceUrl=&search=List+Collection+Sources in the THREDDS because of the catalogRef problem that is causing a socket error. @amilan17 noted that the services are failing because of invalid ISO metadata. How can we help John improve his metadata so these records are valid?

amilan17 commented 10 years ago

Hi All,

This is the most common ISO validation error in the Glider Metadata records: cvc-datatype-valid.1.2.1: '2013-09-10T20:10TUTC' is not a valid value for 'dateTime'.

The source NCML dates look like:

If the source NCML dates are formatted like any of these below, then that validation error will be fixed:

2013-09-22T19:10:00Z
2013-09-22 19:10:00Z
2013-09-22T19:10:00

http://en.wikipedia.org/wiki/ISO_8601#UTC
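As a rough illustration of the fix on the data-provider side, the sketch below normalizes date strings to the first form listed above. It rests on two assumptions that are not confirmed in this thread: that the trailing "TUTC" in the failing value just means UTC, and that a value with no zone marker is also UTC. `to_iso8601_utc` is a hypothetical helper name.

```python
import re
from datetime import datetime

# "YYYY-MM-DD", then "T" or a space, then HH:MM with optional :SS,
# then an optional UTC marker ("Z", "UTC", or the "TUTC" seen in the error).
_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2})(?::(\d{2}))?\s*(?:Z|T?UTC)?$"
)


def to_iso8601_utc(value):
    """Return 'YYYY-MM-DDTHH:MM:SSZ', or raise ValueError if unrecognized."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError("unrecognized date string: %r" % value)
    date_part, hhmm, seconds = match.groups()
    candidate = "%sT%s:%s" % (date_part, hhmm, seconds or "00")
    # strptime rejects impossible values such as month 13 or hour 25.
    datetime.strptime(candidate, "%Y-%m-%dT%H:%M:%S")
    # Assumption: inputs without a zone marker are already in UTC.
    return candidate + "Z"


for raw in ("2013-09-10T20:10TUTC", "2013-09-22 19:10:00Z", "2013-09-22T19:10:00"):
    print(raw, "->", to_iso8601_utc(raw))
```

Anything the pattern does not recognize raises an error rather than being guessed at, which seems safer for metadata that will be validated downstream.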

amilan17 commented 10 years ago

There are also some NaN values that should be a decimal value instead.

e.g. http://tds.gliders.ioos.us/thredds/ncml/Dalhousie-University_otn200-20130910T1551_Files/otn200-20130910T155138_rt0.nc?catalog=http%3A%2F%2Ftds.gliders.ioos.us%2Fthredds%2Fcatalog%2FDalhousie-University_otn200-20130910T1551_Files%2Fcatalog.html&dataset=Dalhousie-University_otn200-20130910T1551_Files%2Fotn200-20130910T155138_rt0.nc

daf commented 10 years ago

This likely falls on me as portions of this catalog are autogenerated. Will investigate and see what I can do from my end.

amilan17 commented 10 years ago

Hi Dave,

I tested one of the registered services ( http://tds.gliders.ioos.us/thredds/rutgers/otn200-20130910T1551/catalog.xml) and the ncISO crawler returned the collection-level records AND all of the granules.

See: http://www.ngdc.noaa.gov/metadata/published/test/NOAA/IOOS/Glider_DAC/iso_u/

I think the ncISO crawler is interpreting the following element in the CatalogRef section as a signal to pick up and harvest everything:

<property name="DatasetScan" value="true" />

Can you help me understand why that is there? Does it need to stay?

daf commented 10 years ago

@amilan17 I can't find that <property name="DatasetScan" value="true" /> you mention. Where should I be looking for it?

Also this record list suggests that individual files are registered from the Glider DAC, when it should really just be the two aggregations (right?)

daf commented 10 years ago

Never mind, I found it. I will continue to look (looking at http://tds.gliders.ioos.us/thredds/rutgers/ru29-20131110T1400/catalog.xml now).

daf commented 10 years ago

Ok so the source rutgers/ru29-20131110T1400/catalog.xml turns into this catalog.xml when served by TDS.

I thought that EMMA wasn't following catalogRefs, but maybe this isn't EMMA at this stage.

To answer your question, it makes logical sense that it is there, but if we have to split them, we can. @rsignell-usgs can you weigh in? Thanks.

robragsdale commented 10 years ago

Do the glider DAC aggregated files need to be split so they will be harvested? Is that the next step?
Thanks

amilan17 commented 10 years ago

I think there are two options: Split out the aggregated files OR remove the 'DatasetScan' element from the current catalog.
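To see which catalogRefs would be affected by the second option, something like the Python sketch below could be used. The sample catalog is invented for illustration (only the DatasetScan property itself comes from this thread), and `dataset_scan_refs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical deployment catalog: one aggregation plus a scanned file folder.
SAMPLE = """<catalog
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="aggregation" ID="agg" urlPath="agg.ncml"/>
  <catalogRef xlink:href="files/catalog.xml" xlink:title="files" name="">
    <property name="DatasetScan" value="true"/>
  </catalogRef>
</catalog>"""


def dataset_scan_refs(xml_text):
    """Return the titles of catalogRefs that carry the DatasetScan property."""
    root = ET.fromstring(xml_text)
    flagged = []
    for ref in root.iter("{%s}catalogRef" % THREDDS_NS):
        for prop in ref.findall("{%s}property" % THREDDS_NS):
            if prop.get("name") == "DatasetScan" and prop.get("value") == "true":
                flagged.append(ref.get("{%s}title" % XLINK_NS))
    return flagged


print(dataset_scan_refs(SAMPLE))
```

If the list comes back empty for a deployment catalog, the crawler should have nothing prompting it to descend into the individual files, assuming the interpretation discussed above is right.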

robragsdale commented 10 years ago

@daf Currently, this is the only glider data set being harvested successfully - http://www.ngdc.noaa.gov/metadata/published/test/NOAA/IOOS/Glider_DAC/iso_u/. The other glider data sets are failing translation - https://www.ngdc.noaa.gov/docucomp/collectionSource/list?recordSetId=6338757&componentId=&serviceType=&serviceStatus=&serviceUrl=&search=List+Collection+Sources. We were looking into this earlier. @amilan17 suggested two options: split out the aggregated files OR remove the 'DatasetScan' element from the current catalog. However, why is one dataset being harvested, but others are not?

amilan17 commented 10 years ago

Hi Rob - I kept one as approved to see if/when it does change. No other reason.

Anna ~~~~~~~ Anna.Milan@noaa.gov, 303-497-5099 NOAA/NESDIS/NGDC

http://www.ngdc.noaa.gov/metadata/emma ~~~~~~~

daf commented 10 years ago

Hi @amilan17, @robragsdale: I've split the individual files from the aggregates (see http://tds.gliders.ioos.us/thredds/catalog.html).

The aggregates remain at the same URLs. If someone can flip whatever it takes on EMMA to look at them again, please do!

@rsignell-usgs how's the catalog look, have I done it correctly?

rsignell-usgs commented 10 years ago

@daf, looks great! So you have submitted (or are submitting) all the aggregation catalogs:

http://tds.gliders.ioos.us/thredds/rmendels/sp031-20140405T1440/catalog.xml
http://tds.gliders.ioos.us/thredds/asa/unit_236-20121005T0023/catalog.xml
...
http://tds.gliders.ioos.us/thredds/usf/usfbass-20140303T1600/catalog.html

right?

daf commented 10 years ago

Many are already submitted but I'll have a look through now to catch the newer ones.

amilan17 commented 10 years ago

Changed status to submitted for all already in the Collection Source table.

Should we remove these URLs?

http://tds.gliders.ioos.us/thredds/scratch.xml
http://tds.gliders.ioos.us/thredds/catalog.xml

daf commented 10 years ago

Yes, please remove both of those URLs. I have an additional list of URLs to submit, I'll do those the Official Way (tm).

As more gliders come on, this suggests to me that I should be automating a WAF creation and just having EMMA point at that to pick up new changes rather than me submitting all the time.

rsignell-usgs commented 10 years ago

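@daf, it looks like all the depth-averaged files are failing. I tried harvesting and I get failures like the one below. I don't know what they mean, but perhaps @amilan17 does?

[image: 6-20-2014 3-11-16 pm] https://cloud.githubusercontent.com/assets/1872600/3344742/d545f53a-f8ae-11e3-905e-9bd22243c02b.png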

amilan17 commented 10 years ago

For the date validation errors: the dates in the netCDF need to be ISO 8601 compliant. So this:

should be:

daf commented 10 years ago

The newer ones I've just registered in #31 don't seem to exhibit these formats, they have the correct ones. I'll look closer.

@rsignell-usgs's issues appear to be a unit concern; looking at it now.

robragsdale commented 10 years ago

The "old" glider service endpoints Anna updated on Friday were harvested into the test WAF http://www.ngdc.noaa.gov/metadata/published/test/NOAA/IOOS/Glider_DAC/iso_u/. I've changed their status to approved in the registry. The newer service endpoints in #31 did not get harvested.

rsignell-usgs commented 10 years ago

There seems to be some other problem with time. @amilan17?

[image: 6-24-2014 11-33-32 am]

amilan17 commented 10 years ago

Ah. Yes. The uomIdentifier problem rears its ugly head again!

This attribute in NCML:

is translated to this attribute in ISO:

This would be valid in ISO, but does it still make sense? If so, I _think_ I can update the XSL to handle these units more elegantly.

rsignell-usgs commented 10 years ago

I don't understand the issue. It seems like proper encoding of CF time units, the same as all other CF datasets which get harvested without problems. What is different here?

amilan17 commented 10 years ago

I'm not sure other datasets with the same type of unit representation are harvested without problems. I've seen this before: http://www.ngdc.noaa.gov/docucomp/page?xml=NOAA/IOOS/SCCOOS/iso/reports/IsoValidationReport.xml&view=isoValidationErrorsReport&custom=default&title=NOAA/IOOS/SCCOOS%20Invalid%20Records

amilan17 commented 10 years ago

@daf @rsignell-usgs The units error is fixed (see #37), but many records are still invalid due to the following errors:

amilan17 commented 10 years ago

@daf @rsignell-usgs I updated the NcML to ISO translation to convert UTC dates to ISO 8601 dates. By tomorrow we should not see any date errors for the Glider_DAC records (fingers crossed!)

http://www.ngdc.noaa.gov/docucomp/page?xml=NOAA/IOOS/Glider_DAC/iso/reports/IsoValidationReport.xml&view=isoValidationErrorsReport&custom=default

dpsnowden commented 10 years ago

I'm glad the XSL was made a little more tolerant, but I also want to continue to encourage data providers to do the right thing. The DAC is evolving a bit, and they are working on modifications to the input files and the data services, so now is the time to get them to address any changes to the format. @kerfoot, can you check the latest glider template and see if it conforms to the desired spec for time? @amilan17, can you log an issue at the glider DAC repo? Github.com/kerfoot/ioosngdac.

Thanks!

kerfoot commented 10 years ago

Yes, the new spec (IOOS_Glider_NetCDF_v2.0.nc) uses ISO-8601 date/times (YYYYmmddTHH:MM:SSZ) for all date/time attributes and variables. Doco requiring the use of ISO-8601 is in the wiki.

robragsdale commented 10 years ago

The question starting this issue was

Why aren't these gliders being harvested? Or are they harvested elsewhere? http://tds.gliders.ioos.us/thredds/catalog.html

Glider metadata is making it into the catalog (http://catalog.ioos.us/services/filter/Glider_DAC/null), but records are still invalidating in Geoportal (http://www.ngdc.noaa.gov/docucomp/page?xml=NOAA/IOOS/Glider_DAC/iso/reports/IsoValidationReport.xml&view=isoValidationErrorsReport&custom=default&title=NOAA/IOOS/Glider_DAC%20Invalid%20Records).

A related issue was resolved - Fix uomIdentifier error in NcML to ISO transform #37

@rsignell-usgs @dpsnowden @amilan17 @kerfoot Glider metadata is making it to the catalog and the NGDC Geoportal. Can we close this issue? New issues can be opened, of course, and logged at Github.com/kerfoot/ioosngdac.

amilan17 commented 10 years ago

There are still validation issues caused by the following:

AND

there are some registered services that are not returning ncml: https://ngdc.noaa.gov/docucomp/collectionSource/list?recordSetId=6338757&componentId=&serviceType=&serviceStatus=UNRESPONSIVE&serviceUrl=&search=List+Collection+Sources

robragsdale commented 10 years ago

@kerfoot @lukecampbell there are still problems with the date/time attribute in the NcML in some of the glider DAC files. Given that the XSLT has been updated to handle the translation and the latest glider DAC format follows ISO 8601, where could we be running into problems?
http://www.ngdc.noaa.gov/docucomp/page?xml=NOAA/IOOS/Glider_DAC/iso/reports/IsoValidationReport.xml&view=isoValidationErrorsReport&custom=default&title=NOAA/IOOS/Glider_DAC%20Invalid%20Records

@lukecampbell for these services that are not returning an NcML file (and thus are not harvested), is this a problem that we need to take back to the content providers (i.e. gliders are attributed to SECOORA (USF) and GCOOS)?
https://ngdc.noaa.gov/docucomp/collectionSource/list?recordSetId=6338757&componentId=&serviceType=&serviceStatus=UNRESPONSIVE&serviceUrl=&search=List+Collection+Sources

As an aside, @daf is the current POC for these services in the registry. Will @lukecampbell be the new POC, or someone else?

robragsdale commented 10 years ago

@lukecampbell these services have been returning an internal server error from the EMMA harvesting process. @amilan17 what is the internal server error that you noted that @lukecampbell needs to address for these to be harvested? This might be related to the ioos/catalog connect catalog to GliderDAC 2.0 (dev version) #138 harvesting issue?

These are the services:
http://tds.gliders.ioos.us/thredds/usf/usfbass-20140303T1600/catalog.xml
http://tds.gliders.ioos.us/thredds/mkhoward/dora-20140625T0000/catalog.xml
http://tds.gliders.ioos.us/thredds/mkhoward/unit_308_201406191000/catalog.xml
http://tds.gliders.ioos.us/thredds/mkhoward/reville-20140619T0000/catalog.xml
http://tds.gliders.ioos.us/thredds/mkhoward/unit_202_201406181000/catalog.xml

lukecampbell commented 10 years ago

The glider dac v2 harvesting doesn't touch this endpoint at all, so I don't think that catalog would be affecting this.

amilan17 commented 10 years ago

This is what I get when I traverse down to the ncML ( http://tds.gliders.ioos.us/thredds/ncml/University-of-South-Florida_usfbass-20140303T1600_Time.ncml?catalog=http%3A%2F%2Ftds.gliders.ioos.us%2Fthredds%2Fusf%2Fusfbass-20140303T1600%2Fcatalog.html&dataset=University-of-South-Florida_usfbass-20140303T1600_Time )

HTTP Status 500 - Internal Server Error

Status 500 - Internal Server Error THREDDS Data Server Version 4.3

lukecampbell commented 10 years ago

After some digging: that deployment has no datasets. It was created conceptually, but no data was ever provided.

$ find usfbass-20140303T1600/
usfbass-20140303T1600/
usfbass-20140303T1600/mission.json
usfbass-20140303T1600/wmoid.txt

lukecampbell commented 10 years ago

Same with the other datasets you mentioned

[mkhoward]$ find ./
./
./dora-20140625T0000
./dora-20140625T0000/mission.json
./dora-20140625T0000/wmoid.txt
./unit_308_201406191000
./unit_308_201406191000/mission.json
./unit_308_201406191000/wmoid.txt
./reville-20140619T0000
./reville-20140619T0000/mission.json
./reville-20140619T0000/wmoid.txt
./unit_202_201406181000
./unit_202_201406181000/mission.json
./unit_202_201406181000/wmoid.txt
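
The same check can be scripted. Below is a minimal Python sketch that flags deployment directories containing nothing beyond the two metadata files seen in the listings above. Treating every other file as data is an assumption made for illustration, and `empty_deployments` is a hypothetical helper, not something that exists on the DAC server.

```python
from pathlib import Path

# Metadata files that appear in the listings above; anything else counts as data.
METADATA_FILES = {"mission.json", "wmoid.txt"}


def empty_deployments(root):
    """Return sorted names of deployment directories under root with no data files."""
    empty = []
    for deployment in sorted(Path(root).iterdir()):
        if not deployment.is_dir():
            continue
        has_data = any(
            path.is_file() and path.name not in METADATA_FILES
            for path in deployment.rglob("*")
        )
        if not has_data:
            empty.append(deployment.name)
    return empty
```

Run against a provider directory like the one listed above, this would be expected to report every deployment in it, since each holds only mission.json and wmoid.txt.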

robragsdale commented 10 years ago

@mkhoward @felimongayanilo these service endpoints are returning a Status 500 - Internal Server Error and not datasets. Have these service endpoints changed or is there a server issue on your end that we should know about?

mkhoward commented 10 years ago

Yeah. Unfortunately we were 0 for 4 for that group of deployments.

308 and Reville wouldn't dive due to a firmware error. 202 was captured after 1 dive by a fisherman, who broke it open. Dora had one of its instruments flood and had to abort the mission.

+---------------------------------------------------------------------------------------------------+ | Dr. Matthew K. Howard Research Scientist | | Department of Oceanography Voice: (979)-862-4169 | | Texas A&M University FAX: (979)-847-8879 | | College Station, TX 77843-3146 Mobile: (979)-696-2026 | | http://gcoos.org mkhoward@tamu.edu | +---------------------------------------------------------------------------------------------------+

robragsdale commented 10 years ago

Sorry to hear that, @mkhoward. As far as the service endpoints go, though, I will update these to 'for removal' in the registry. Is that ok?

mkhoward commented 10 years ago

Yes - Please. If I could have deleted them myself - I would have. I know it only takes an email to John Kerfoot - but I was at sea at the time and it was overtaken by events.

mkhoward commented 10 years ago

Rob,

The host computer barataria.tamu.edu had a severe hardware failure early last week. A solid state drive melted down (“The magic smoke got out”). It burned some traces on the motherboard. Both have been replaced. Restore operations are underway. I expect things to be back to normal in a few days.

Best,

Matt

mkhoward commented 10 years ago

Yeah. We were 0 for 4 on that series of deployments. 308 and Reville failed due to a firmware error, 202 was captured by a fisherman after one dive, and one of Dora's instrument packages flooded.

rsignell-usgs commented 10 years ago

@mkhoward, Is barataria back online now?

mkhoward commented 10 years ago

Last time I checked - yesterday, machine yes, TDS no. I asked Steve to inform me when it was up.

mkhoward commented 10 years ago

Rich,

It lives.

You said you prefer a WAF to an aggregation endpoint - is that still the case?

But the other problem is that we should be harvesting the endpoint for the aggregated data, not the granule datasets.

Can you please provide the THREDDS catalog link that contains the aggregated data? Or better still, create a WAF of ISO metadata that NGDC can harvest?

mkhoward commented 10 years ago

Rich,

I just noticed your previous email had a different URL than I see currently. Currently: http://barataria.tamu.edu/thredds/catalog/nam_gom_monthly/catalog.html

I wrote a little script to query the NGDC CSW for all the OPeNDAP endpoints and found 2785 links but 1100 of them either timed out after 2 seconds or gave 404 errors.

The bad ones are here: https://github.com/rsignell-usgs/system-test/blob/master/Theme_1_Baseline/bad.csv

@mkhoward, about 500 of these bad links are from tamu, and look like:

'http://barataria.tamu.edu/thredds/dodsC/nam_gom_monthly/vgrd/nam_vgrd_gom_201312.nc' < Stale?
'http://barataria.tamu.edu/thredds/dodsC/nam_gom_monthly/dswrf/nam_dswrf_gom_200901.nc' < Stale?

So the immediate problem is that these are timing out -- it looks like the THREDDS server on barataria: http://barataria.tamu.edu/thredds is down.
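For reference, the kind of check described above could look roughly like the Python sketch below. This is not the script that produced bad.csv; it assumes the list of endpoint URLs is already in hand (the CSW query is left out), and `probe` and `find_bad_links` are hypothetical names. Only the 2-second timeout comes from the description above.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Classify one URL as 'ok', 'http <code>', 'timeout', or 'error'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "ok" if response.status < 400 else "http %d" % response.status
    except urllib.error.HTTPError as exc:
        return "http %d" % exc.code
    except urllib.error.URLError as exc:
        # Timeouts during the connection phase arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "error"
    except socket.timeout:
        # Timeouts while reading the response are raised directly.
        return "timeout"
    except (OSError, ValueError):
        return "error"


def find_bad_links(urls, probe_fn=probe):
    """Return {url: status} for every URL whose status is not 'ok'."""
    results = {url: probe_fn(url) for url in urls}
    return {url: status for url, status in results.items() if status != "ok"}
```

The `probe_fn` parameter is there so the bookkeeping can be tried out without touching the network, for example by passing a lookup into a dictionary of canned statuses.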

But the other problem is that we should be harvesting the endpoint for the aggregated data, not the granule datasets.

Can you please provide the THREDDS catalog link that contains the aggregated data? Or better still, create a WAF of ISO metadata that NGDC can harvest?

Thanks, Rich
