ioos / ckanext-ioos-theme

IOOS Catalog as a CKAN extension
GNU Affero General Public License v3.0

Service Monitor - Too Many Services Down #153

Closed lukecampbell closed 6 years ago

lukecampbell commented 7 years ago

Right now the service monitor reports well over 80% of services as down, offline, or erroring for SOS service endpoints.

Some services are legitimately down, but I think we are creating a denial-of-service situation by issuing individual DescribeSensor requests for every offering.
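One way to avoid that accidental denial-of-service is to sample and throttle: probe only a handful of offerings per run, with a pause between requests, instead of firing a DescribeSensor at every offering at once. A minimal sketch (the `probe` callable and the parameter names are hypothetical, not the service-monitor's actual API):

```python
import time
from itertools import islice

def probe_offerings(offerings, probe, sample_size=5, delay=1.0):
    """Probe only a sample of offerings, pausing between requests.

    `probe` is a stand-in callable (hypothetical) that issues one
    DescribeSensor request for an offering and returns True on success.
    """
    results = {}
    for offering in islice(offerings, sample_size):
        results[offering] = probe(offering)
        time.sleep(delay)  # throttle so we don't hammer the provider
    return results
```

A failed sample could then trigger a retry of a different sample on the next run, rather than re-checking every offering every time.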

mwengren commented 7 years ago

The harvest reports on the Service Monitor are looking better. I like the color coding. Nice job.

@lukecampbell I don't completely understand the harvesting workflow you set up in ioos/service-monitor#460, and that's OK, but looking this morning I noticed that each of the NDBC stations now reports the same error (one example). For instance, here and here the harvest results seem identical.

The harvest should ideally be doing different things for different stations within the same service (in the case of 52N anyway; maybe not ncSOS, because there's obviously only one 'station' in each ncSOS service). The GetCapabilities request will be the same, but the DescribeSensor will differ depending on the station ID.

Is this happening currently, or does each SOS 'harvest' try to harvest each of the stations listed in the GetCaps response and then report on all stations harvested in the results page (no matter which station you click in the services list)?

I'm guessing changing this would be a big refactor if I'm right about how it works. We probably want to avoid that, but maybe there's room for some small improvements...

lukecampbell commented 7 years ago

> Is this happening currently, or does each SOS 'harvest' try to harvest each of the stations listed in the GetCaps response and then report on all stations harvested in the results page (no matter which station you click in the services list)?

Yes, that's how it works: it gets each station in the GetCaps for a service. A service used to be defined as a unique URL, but we've changed that so that you see one service for each resource defined in CKAN.
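For context, pulling the station list out of a GetCapabilities response amounts to collecting the procedure URNs from each observation offering. A simplified sketch against a hand-written SOS 1.0 fragment (real capabilities documents are far larger, and element layout varies by SOS version):

```python
import xml.etree.ElementTree as ET

# Minimal, hand-written stand-in for part of an SOS 1.0 Capabilities
# document; not a real provider response.
CAPS = """<Contents xmlns:sos="http://www.opengis.net/sos/1.0"
                    xmlns:xlink="http://www.w3.org/1999/xlink">
  <sos:ObservationOffering>
    <sos:procedure xlink:href="urn:ioos:station:maracoos:alpha"/>
  </sos:ObservationOffering>
  <sos:ObservationOffering>
    <sos:procedure xlink:href="urn:ioos:station:maracoos:bravo"/>
  </sos:ObservationOffering>
</Contents>"""

def station_urns(caps_xml):
    """Collect the procedure (station) URNs from every offering."""
    root = ET.fromstring(caps_xml)
    return [p.attrib["{http://www.w3.org/1999/xlink}href"]
            for p in root.iter("{http://www.opengis.net/sos/1.0}procedure")]
```

Each URN returned here would then get its own DescribeSensor request, which is where the per-offering request volume comes from.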

It would certainly be a massive undertaking to treat each service as 1:1 with each station, and it would likely not succeed, since it depends on the naming conventions of providers, which are reliably inconsistent.

Imagine MARACOOS publishes a fictitious 52n service with three stations: Station Alpha, Station Bravo, and Station Charlie.

In 52n they are identified as urn:ioos:station:maracoos:alpha etc. But in each ISO record the title of the station is "Station Alpha in the Mid Atlantic." Even if there were a place in ISO to stick the URN identifier, I can't imagine depending on it would be very reliable, given the wide variety of ISO records I see and how they are generated.

What I can tell from that ISO record is that there is a 52n service somewhere, and what I can tell from 52n is that there are a bunch of stations. It's hard for me to associate a particular station in a GetCapabilities response from 52n with a particular ISO document, and therefore with the service of an ISO document.
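To illustrate the fragility: about the only obvious heuristic is to look for the URN's trailing component inside the ISO title, which happens to work for the MARACOOS example above but fails as soon as a provider titles the record differently. This helper is purely illustrative, not existing service-monitor code:

```python
def urn_matches_title(urn, title):
    """Heuristic: does the URN's last component appear as a word in the
    ISO record title? Only works when the provider happens to embed the
    station name verbatim in the title."""
    station = urn.rsplit(":", 1)[-1].lower()
    return station in title.lower().split()
```

Here `urn_matches_title("urn:ioos:station:maracoos:alpha", "Station Alpha in the Mid Atlantic")` matches, but a title like "MARACOOS Mid Atlantic buoy A01" for the same station would not, which is exactly the inconsistency problem described above.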

lukecampbell commented 7 years ago

(screenshot attachment: chalk1)

lukecampbell commented 7 years ago

I think we would need to redefine services, harvests, and datasets to make what you describe a reality, and starting from near scratch is probably easier than refactoring the current service-monitor. It has a lot of components that are really hard for me to refactor, like its dependency on paegan and dogma.

mwengren commented 7 years ago

Yeah, I thought it would involve a lot of changes; I just wanted to confirm. I don't think it's worth rewriting everything, since most of the harvesting results look a lot better right now, with just a few exceptions: CO-OPS and NDBC.

Ideally, we could find some way to have these SOS services harvest properly, but it looks like internal errors with a few of their stations are going to prevent that from ever happening, and the counts will always be '0 of XX'. I know from writing sensorml2iso that NDBC has some DescribeSensor requests that always fail, and I'm not sure what's going on with CO-OPS and that failure message.

The solution is probably to reach out to each of those providers to fix their services. Let's keep this open as a reminder, but are we ready to make an official 'release' of this updated Monitor? Or did you make a release already this week?

lukecampbell commented 7 years ago

I could change the logic in the harvester to treat "partial" success as a 1 in terms of counting.
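That change would amount to counting a service as up when at least one of its stations harvests successfully, rather than requiring all of them. A sketch of that counting rule, assuming per-station boolean results (the data shape is an assumption, not the harvester's real structure):

```python
def count_up(results):
    """Count a service as 'up' when at least one of its stations
    harvested successfully, so a partial harvest scores 1 instead of 0.

    `results` maps service name -> list of per-station success booleans.
    """
    return sum(1 for station_results in results.values()
               if any(station_results))
```

Under this rule a service like NDBC with one good station out of three would count as up, whereas the strict all-or-nothing rule would keep reporting it as '0 of XX'.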

mwengren commented 7 years ago

No, that's OK. We should really get the underlying issue resolved with the provider. What about the release question, did we do an official 'release' this week?

If not, can you tag a new release on GitHub? That way I can say we met the May 5 deadline to update SM. Thanks.

mwengren commented 7 years ago

I see this: https://github.com/ioos/service-monitor/releases/tag/3.3.1, but there were other changes you made subsequent to that, right? Can we roll those into a new release just to mark that milestone as done?

benjwadams commented 6 years ago

There was an uncaught exception in some code paths for the service monitor harvesters addressed here: https://github.com/ioos/service-monitor/pull/464

This was causing the harvesters to crash in some instances. It is now fixed, but I'll need to review this issue some more prior to addressing the problems here.

benjwadams commented 6 years ago

@mwengren, what do we want to do with this issue following the discussion on yesterday's call?

mwengren commented 6 years ago

Closing this issue as we're not going to make these changes at this point.

Related Service Monitor discussion issue: https://github.com/ioos/catalog/issues/60.