ioos / ioosngdac

IOOS National Glider Data Assembly Center (V2)
https://ioos.github.io/ioosngdac/

recurring dataset update issues #140

Closed: kerfoot closed this issue 1 year ago

kerfoot commented 6 years ago

We're still having significant problems with datasets not updating. I have been pushing NetCDFs for this deployment:

https://data.ioos.us/gliders/providers/users/rutgers/deployment/5a7b728f98723c1ae33c91ce

over the last 7+ days, yet the dataset has not updated since 3/8:

https://data.ioos.us/gliders/erddap/tabledap/cp_336-20180126T0000.html

benjwadams commented 6 years ago

This one is kind of odd. I reverted a few commits back, and the scripts are pulling the latest data into the public ERDDAP directory:

ncdump -i -v time /data/data/pub_erddap/rutgers/cp_336-20180126T0000/cp_336-20180126T0000.ncCF.nc3.nc | tail
    "2018-03-19T02:26:20.675870", "2018-03-19T02:42:30.597170",
    "2018-03-19T03:00:33.609680", "2018-03-19T03:19:45.513090",
    "2018-03-19T03:33:47.892210", "2018-03-19T03:50:59.275790",
    "2018-03-19T04:08:1.411220", "2018-03-19T04:23:13.016140",
    "2018-03-19T04:42:15.904540", "2018-03-19T04:51:22.027130",
    "2018-03-19T05:00:24.615540", "2018-03-19T05:10:30.947600",
    "2018-03-19T05:20:34.256200", "2018-03-19T05:31:41.556820",
    "2018-03-19T05:41:44.605010", "2018-03-19T05:52:52.658200",
    "2018-03-19T06:04:56.718690" ;
}

ls -lh /data/data/pub_erddap/rutgers/cp_336-20180126T0000/cp_336-20180126T0000.ncCF.nc3.nc
-rw-rw-r-- 1 glider glider 4.5M Mar 19 17:14 /data/data/pub_erddap/rutgers/cp_336-20180126T0000/cp_336-20180126T0000.ncCF.nc3.nc

So the files are actually being updated, but the updates are not reflected in ERDDAP. I checked the logs and didn't find anything unusual. I also created a flag file to trigger a refresh of the dataset in ERDDAP, but it doesn't seem to pick up the updates. When I query for data between the 10th and the 19th, ERDDAP returns a 500 error saying the requested data are outside the valid range of the time variable.
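To reproduce a failing time-range request like this outside a browser, the query can be built as an ERDDAP tabledap URL. This is a minimal sketch assuming the standard tabledap form `datasetID.fileType?variables&constraints` and the public endpoint linked above; the timestamps are illustrative and both function names are hypothetical, not part of the GliderDAC tooling.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen


def tabledap_time_query(base, dataset_id, start, end, variables=("time",), file_type="csv"):
    """Build a tabledap URL requesting `variables` with start <= time <= end (ISO 8601, UTC)."""
    constraints = (f"time>={start}", f"time<={end}")
    # Percent-encode '>', '<' and '='; keep ':' readable in the timestamps.
    encoded = "".join("&" + quote(c, safe=":") for c in constraints)
    return f"{base.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{','.join(variables)}{encoded}"


def fetch_status(url, timeout=30):
    """Return (HTTP status, first 300 characters of the response body)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read(300).decode("utf-8", "replace")
    except HTTPError as err:
        # ERDDAP reports query problems as HTTP error responses with a text message.
        return err.code, err.read(300).decode("utf-8", "replace")
```

For example, `fetch_status(tabledap_time_query("https://data.ioos.us/gliders/erddap", "cp_336-20180126T0000", "2018-03-10T00:00:00Z", "2018-03-19T00:00:00Z"))` returns the status code together with the beginning of ERDDAP's message, which is easier to paste into an issue than a screenshot.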

Bobfrat commented 6 years ago

It seems to be updated on THREDDS too: https://data.ioos.us/gliders//thredds/dodsC/deployments/rutgers/cp_336-20180126T0000/cp_336-20180126T0000.nc3.nc.html

kerfoot commented 6 years ago

We may need to take a step back and re-evaluate the dataset population:

https://data.ioos.us/gliders/status/

There are 166 missing datasets as of this morning.
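One way to build a list like this independently of the status page is to compare the expected deployment names against ERDDAP's built-in `allDatasets` table. This sketch assumes the `.csvp` output format (column names on line 1, data rows after it), fetched from a URL such as `https://data.ioos.us/gliders/erddap/tabledap/allDatasets.csvp?datasetID`; where the `expected` list comes from (for example the GliderDAC deployment records) is left open, and the function names are hypothetical.

```python
import csv
import io


def parse_dataset_ids(csvp_text):
    """Extract datasetIDs from ERDDAP .csvp output (header on line 1, data rows after)."""
    reader = csv.reader(io.StringIO(csvp_text))
    header = next(reader)
    col = header.index("datasetID")
    return {row[col] for row in reader if row}


def missing_datasets(expected, csvp_text):
    """Deployments that should exist but are not currently loaded in ERDDAP, sorted."""
    return sorted(set(expected) - parse_dataset_ids(csvp_text))
```

A set difference keeps the comparison cheap even with hundreds of deployments, and sorting makes the output stable enough to diff from one day to the next.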

kerfoot commented 6 years ago

Looks like an issue with datasets.xml:

ERROR while processing line #60791 datasets.xml: see log.txt for details.
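When ERDDAP points at a specific line of datasets.xml, a quick first check is whether the file is well-formed XML at all. The sketch below uses only Python's standard `xml.etree.ElementTree`; it will not catch a well-formed but invalid dataset definition, which ERDDAP can still reject. The function names and the context width are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def find_xml_error(data):
    """Return None if `data` (str or bytes) is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def check_datasets_xml(path, radius=2):
    """Print the first well-formedness error in `path` with a few surrounding lines."""
    with open(path, "rb") as fh:
        data = fh.read()  # bytes, so the parser honors the file's own encoding declaration
    error = find_xml_error(data)
    if error is None:
        print(f"{path}: well-formed")
        return None
    line, _column, message = error
    print(f"{path}: {message}")
    lines = data.decode("utf-8", errors="replace").splitlines()
    for n in range(max(line - radius, 1), min(line + radius, len(lines)) + 1):
        marker = ">>" if n == line else "  "
        print(f"{marker} {n:>7}: {lines[n - 1]}")
    return error
```

Running `check_datasets_xml` on a freshly generated datasets.xml before it replaces the live copy would surface a broken build without waiting for ERDDAP's log.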

Bobfrat commented 6 years ago

Found errors in the build_erddap_catalog script that were preventing datasets.xml from being built properly: https://github.com/ioos/glider-dac/pull/127

kerfoot commented 6 years ago

@benjwadams

I pushed new files for this deployment:

cp_339-20180126T0000

9 hours ago and the ERDDAP dataset has not yet updated:

ERDDAP search

which, according to the daily NDBC email, means the data did not go to the GTS.

Any ideas?

kwilcox commented 6 years ago

Any updates on this? The recent pelagia-20180401T0000 mission had a very similar problem. It is all OK now and contains the entire deployment, but at some points it was days behind the real-time data.

benjwadams commented 6 years ago

@kwilcox, could you give a time frame for when this occurred? I recently moved a lot of the dataset syncing code over to an asynchronous implementation in order to increase how often the files get pushed to the main ERDDAP and THREDDS instances.
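The syncing code itself is not shown in this thread. As a hypothetical illustration of the approach described, the sketch below pushes one file to several destination directories concurrently with `asyncio`, running the blocking copies in worker threads. It copies to a temporary name and then renames, so a reader on the same filesystem never sees a half-written file; whether the GliderDAC scripts do this is not stated here, and all names are assumptions.

```python
import asyncio
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir via a temporary file and a rename."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    tmp = dest_dir / f".{src.name}.tmp"
    shutil.copy2(src, tmp)
    os.replace(tmp, dest)  # atomic when tmp and dest are on the same filesystem
    return dest


async def sync_to_all(src: Path, dest_dirs: list) -> list:
    """Push src to every destination concurrently; blocking I/O runs in worker threads."""
    tasks = [asyncio.to_thread(atomic_copy, src, d) for d in dest_dirs]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_root:
        root = Path(tmp_root)
        src = root / "cp_336-20180126T0000.nc"  # hypothetical file name
        src.write_bytes(b"demo")
        copies = asyncio.run(sync_to_all(src, [root / "erddap", root / "thredds"]))
        print([str(p.relative_to(root)) for p in copies])
```

Concurrency shortens the wall-clock time of each sync pass, but it also makes the order in which destinations receive a file less predictable, which is one reason to make each individual copy atomic.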

kwilcox commented 6 years ago

I noticed the problem on April 12th and it didn't recover for a few days.

kerfoot commented 1 year ago

Data are now updated using ERDDAP's flagging mechanism. Closing.
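ERDDAP's flag mechanism works by placing a file named after the datasetID in the `flag` subdirectory of ERDDAP's `bigParentDirectory`; ERDDAP then reloads that dataset as soon as it can. A remote alternative is the `setDatasetFlag.txt` service, which requires the dataset's `flagKey`. A minimal sketch, assuming the script has write access to the real `bigParentDirectory` (configured in ERDDAP's setup.xml); the function names are hypothetical.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlencode


def set_flag_file(big_parent_dir, dataset_id):
    """Create <bigParentDirectory>/flag/<datasetID> so ERDDAP reloads that dataset."""
    flag_dir = Path(big_parent_dir) / "flag"
    if not flag_dir.is_dir():
        # ERDDAP normally creates this directory itself; a missing one suggests a wrong path.
        raise FileNotFoundError(f"no ERDDAP flag directory at {flag_dir}")
    flag = flag_dir / dataset_id
    flag.touch()
    return flag


def flag_url(erddap_base, dataset_id, flag_key):
    """Build a setDatasetFlag.txt URL; the flagKey must come from the ERDDAP administrator."""
    query = urlencode({"datasetID": dataset_id, "flagKey": flag_key})
    return f"{erddap_base.rstrip('/')}/setDatasetFlag.txt?{query}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as big_parent:  # stand-in for the real bigParentDirectory
        (Path(big_parent) / "flag").mkdir()
        print(set_flag_file(big_parent, "cp_336-20180126T0000"))
```

Raising on a missing flag directory rather than creating it avoids silently writing flags to a path ERDDAP never reads.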