Closed kwilcox closed 4 years ago
@kwilcox, fantastic! We need this kind of feedback to providers so we can make the catalog even better!
@kwilcox. The Realtime WAF was created quite a few years ago specifically for the Sensor Map https://sensors.ioos.us/# (in detailed consultation with Axiom), since the scraper/crawler was overwhelming our THREDDS server by requesting observations from historical buoys that are no longer deployed. I'd be glad to remove that WAF if the sensor map harvester no longer requires it.
@ebridger That makes sense. You have a separate set of files you set up for the `Realtime` data access so the `HistoricRealtime` datasets were not overloaded. I don't recall this conversation, but if the data is showing up and you are happy with it, I'm not going to open that discussion back up!
I see (2) records with the same ERDDAP endpoint (probably a different issue) and (1) record that is the `HistoricRealtime` THREDDS endpoint. The `Realtime` dataset isn't in the catalog (or I can't find it), most likely due to the conflict in `fileIdentifier`. You could change the `id` of the real-time only files to be unique, but it's probably not a huge deal if you really only want the `HistoricRealtime` dataset in there.
We definitely want the `Realtime` WAF ingested, right?
It makes sense to change the `id` for the `Realtime` ISO records so they don't conflict with the `HistoricalRealtime` ISO records.
@kwilcox thanks for the report!
BTW, for @tslawecki @brianmckenna and @ebridger: the place to find issues like the fileIdentifier conflicts Kyle mentions is the Harvest Registry. Click the 'View CKAN Job Status' button.
This is the only place where we can report fileIdentifier conflicts, as the Registry will accept them, but CKAN will not. Anything that shows up in this list as an error does not make it to Catalog.
I decided to keep the Realtime WAF. One issue is that the `id` is a NetCDF global attribute and the historical realtime aggregations are really 2 files: the historical file and the latest realtime deployment file. The `id` is the same in both files. The realtime THREDDS catalog only references the realtime files. So the fix was to use NcML only in the realtime catalog to override the 'id' global attribute by appending `-realtime` to the id value. I've regenerated the WAF. Not sure if I need to force a catalog re-harvest or if the catalog will pick it up automatically.
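For readers unfamiliar with this kind of override, here is a minimal sketch of what such an NcML wrapper could look like, generated from Python. It is not the NcML actually deployed in the NERACOOS catalog: the function name `realtime_ncml`, the file name `buoy.nc`, and the id `A01.met` are hypothetical, and only the `-realtime` suffix comes from the comment above.

```python
import xml.etree.ElementTree as ET

# Standard NcML 2.2 namespace.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def realtime_ncml(location: str, original_id: str, suffix: str = "-realtime") -> str:
    """Build an NcML document that wraps `location` and overrides its `id`
    global attribute with `original_id + suffix`.

    Assumption: the caller already knows the original `id` value; this
    sketch does not read it from the NetCDF file.
    """
    # Serialize with NcML as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf", {"location": location})
    # An <attribute> directly under <netcdf> sets a global attribute.
    ET.SubElement(
        root,
        f"{{{NCML_NS}}}attribute",
        {"name": "id", "value": original_id + suffix},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical file name and id, for illustration only.
print(realtime_ncml("buoy.nc", "A01.met"))
```

Because the override lives only in the realtime catalog, the historical aggregation keeps its original `id`, so the two sets of ISO records no longer collide.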
MARACOOS WAF has been updated. Should see unique IDs soon.
This appears to be fixed for MARACOOS.
@kwilcox Any chance you can easily confirm whether this has been resolved (at least for NERACOOS and MARACOOS)? Not sure of GLOS' status.
I had a nice little script that tested all of this but I can't find it... so no, I can't easily confirm, sorry!
I'm crafting an updated release, so I'm going to move this into the next milestone.
I think we can close this one out at long last. If there are still issues that come up, we'll deal with them as they come up.
I did a little analysis on all of the ISO files in the registry and found a few issues. These are mostly related to RAs assigning dataset ids incorrectly. For ISO generated through THREDDS, the `fileIdentifier` in the ISO record is formed by combining the `naming_authority` and the `id` global attributes from the dataset.
GLOS @tslawecki
`glob_habs_lakes_ysi_*` - It looks like the `id` does not actually identify the dataset but rather the platform, so there are conflicts with already existing datasets.
The `id` in the individual satellite datasets doesn't specify the lake, so they all conflict with each other. For example, `LakeHuronCDOM-Agg` and `LakeSuperiorCDOM-Agg` both have the `id` of `GLOS:modis.cdom`.
MARACOOS @brianmckenna
Many of the satellite datasets suffer from the same issue as described for GLOS. For example, both `AVHRR.2012.7Agg.xml` and `AVHRR.2013.7Agg.xml` end up with the `fileIdentifier` of `org.maracoos:avhrr.sst`.
NERACOOS @ebridger
There is a conflict between the Realtime and the Historic Realtime datasets' `fileIdentifiers`. I can see how this would be done on purpose, but if that was the case, is there a reason to have both the Realtime and the Historic Realtime in the WAF? For example, these two ISO files have the same `fileIdentifier`:
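Stepping back from the individual cases, the rule stated at the top of this report (the `fileIdentifier` combines `naming_authority` and `id`) makes these conflicts easy to check for. The sketch below assumes the two attributes are joined with a colon, which is inferred from the `org.maracoos:avhrr.sst` example rather than confirmed, and it assumes the attribute values have already been extracted from each record. The function names and the third sample record are hypothetical.

```python
from collections import defaultdict


def file_identifier(naming_authority: str, dataset_id: str) -> str:
    """Combine the two global attributes into a fileIdentifier.

    Assumption: a colon separator, inferred from `org.maracoos:avhrr.sst`.
    """
    return f"{naming_authority}:{dataset_id}"


def find_conflicts(records):
    """Group records by fileIdentifier and keep only the duplicated ones.

    `records` is an iterable of (filename, naming_authority, id) tuples.
    Returns {fileIdentifier: [filenames...]} for identifiers used more than once.
    """
    by_identifier = defaultdict(list)
    for filename, authority, dataset_id in records:
        by_identifier[file_identifier(authority, dataset_id)].append(filename)
    return {fid: names for fid, names in by_identifier.items() if len(names) > 1}


records = [
    # The MARACOOS pair mentioned above; the authority/id split is assumed.
    ("AVHRR.2012.7Agg.xml", "org.maracoos", "avhrr.sst"),
    ("AVHRR.2013.7Agg.xml", "org.maracoos", "avhrr.sst"),
    # Hypothetical record with a unique id, for contrast.
    ("unique-example.xml", "org.example", "unique.dataset"),
]

for fid, names in find_conflicts(records).items():
    print(fid, names)
```

Any identifier that appears in the result maps to more than one ISO file, and only one of those files can end up in the catalog.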