internetofwater / geoconnex.us

URI registry for https://geoconnex.us based URIs

Western States Water Council Water Data Exchange RE-presentations of Points of Diversion #38

Closed ksonda closed 4 years ago

ksonda commented 4 years ago

I've talked with @amabdallah about adding, on a trial basis, about 1,000,000 Points of Diversion in the western states. There's no landing content (LC) yet, but there is data content at https://wade-api-qa.azure-api.net/v1/SiteAllocationAmounts?SiteUUID=[id]. We also have basic site metadata and lat/lon in CSV/GeoJSON. Since there are no web pages already for these organizational resources, is this worth doing, @dblodgett-usgs? The other alternative is to have pygeoapi host them experimentally.

dblodgett-usgs commented 4 years ago

Cool! 1M is a lot. Maybe we add this to the NLDI as a discovery tool?

Just having them available to start to link to other things would be huge.

ksonda commented 4 years ago

I think eventually it will hit 3M. Are you saying just mint PIDs directing to the WaDE API calls for now?

What would adding to NLDI entail?

dblodgett-usgs commented 4 years ago

The NLDI could give us landing content for the locations associated with NHDPlusV2 COMIDs. All we need is a GeoJSON with the PID, a name, and a lat/lon location.
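
A minimal sketch of the kind of GeoJSON described here, with one point feature per site carrying a PID, a name, and a lat/lon location. The input field names (`pid`, `name`, `lat`, `lon`) and the example record are assumptions for illustration, not the actual WaDE or NLDI schema.

```python
import json

def sites_to_geojson(sites):
    """Build a GeoJSON FeatureCollection from simple site records."""
    features = []
    for site in sites:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [site["lon"], site["lat"]]},
            "properties": {"pid": site["pid"], "name": site["name"]},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical record: the identifier, name, and coordinates are made up.
example = sites_to_geojson([
    {
        "pid": "https://geoconnex.us/wade/sites/example-uuid",
        "name": "Example diversion",
        "lat": 40.0,
        "lon": -111.5,
    },
])
print(json.dumps(example, indent=2))
```

The property names would need to be aligned with whatever the NLDI ingest actually expects.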

Getting PIDs for the locations as organizationally-oriented identifiers would be a good first step.

This is part of the barn raising I want to explore -- we'd be doing the same for Dams, gages, etc.

I need to think about how the NLDI index fits in to all this. Would rather treat it as a discovery tool than as a way to generate landing content for a feature, but the two use-cases could be combined depending how we build out a landing-content (linked data) system.

ksonda commented 4 years ago

@amabdallah please chime in if I'm missing something.

WaDE sites include points of diversion and points of use and some other types of sites important in water rights administration. Since WaDE is a quasi-authoritative aggregator of these resources that are actually associated with State legal constructs, should we think of these as Community Reference Locations suitable for the /ref/ namespace -- something like geoconnex.us/ref/waterrights/sites/[uuid]? Or should we think of it as another organizational collection at geoconnex.us/wade/sites/[uuid].

Thinking through this will be relevant for other quasi-authoritative aggregator-type organizations like CUAHSI and DOE for hydropower facilities.

amabdallah commented 4 years ago

Good point. So actually "sites" in WaDE is a pretty general concept that can be a point of diversion, point of use, or a gage station that measures seasonal flow. So it is not necessarily always under the water rights category. So I vote for the geoconnex.us/wade/sites/[uuid] idea.

dblodgett-usgs commented 4 years ago

This is a tricky one. I think we should move slow and be incremental about our changes.

If we want to have a set of WaDE reference non-information resource (NIR) identifiers that 303 see other to WaDE URLs and associated landing content, then those NIRs could be minted in the reference namespace. e.g. https://geoconnex.us/ref/wade/{uuid}

These ref NIRs are intended to point to community-curated collections of features that fit in whatever thematic collection of features is worth curating. In the case of "sites" I imagine we will end up with sites that end up in multiple thematic collections. e.g. dams, bridges, monitoring, diversions could all be co-located and basically the same "site". In this case, the logic for the thematic collection is the logic that determines "what is a WaDE site". Totally valid in my mind. The tricky bit is that if some WaDE sites are also stream gages that end up in the monitoring reference set, we'll have to have some associations there -- but that's down the road and part of my suggestion that we focus on being incremental.

ksonda commented 4 years ago

To summarize some follow up discussion on ESIP slack, it seems we may want to back off of the idea of WaDE being community reference locations, since not all diversions, etc. will end up in WaDE (particularly in eastern states).

A more coherent and proper way to do this will be to have reference location collections for https://geoconnex.us/ref/diversions, https://geoconnex.us/ref/points_of_use, etc. These will be initially seeded by what WaDE has aggregated, but the community can combine them with locations not currently represented in WaDE for whatever reason.

Then, there will be the organizational WaDE namespace, where there can be https://geoconnex.us/wade/sites/[uuid] that reference the above locations.

Making LC separately for these two representations for the barn raising is TBD.

dblodgett-usgs commented 4 years ago

I think this makes the most sense. In the same way we will see dams with NID and have USACE pages in an organizational namespace.

ksonda commented 4 years ago

I guess to guarantee no overlapping PIDs it would still have to be https://geoconnex.us/ref/diversions/wade/[uuid] unless the community managing ref/diversions were super vigilant and only did 1:1 mappings.

dblodgett-usgs commented 4 years ago

The idea of the ref is that there should be some community maintaining it that takes care of de-duplication.

amabdallah commented 4 years ago

It would help if you guys can sketch a conceptual diagram that shows how this works across many different organizations and how the duplicate referencing will be handled. So a WaDE POD can be a well or reservoir which can also show up in the groundwater monitoring network or the USBOR information system. Our POD, say, will be focused on its water right and reported usage, but USBOR can have tons of other info on that "POD". So how can all the different pages be related to a single physical thing?

dblodgett-usgs commented 4 years ago

We should start some simple diagrams like you describe, @amabdallah -- but I want to be careful we don't get ahead of ourselves here. We are trying to be incremental and fix small mistakes as we go.

For now, let's just mint organizational IDs for WaDE and look to build out reference features as it becomes more clear how we want to handle it for various feature types.

That way we can at least start to hook WaDE up to things like the NLDI -- and defer the harder problem of multiple information resources for the same non-information resource.

ksonda commented 4 years ago

@amabdallah See our similar discussion. WaDE is one of the first experiments dealing with this, and it will help us figure this out.

See https://github.com/internetofwater/geoconnex.us/wiki for the basic idea. Basically Community Reference Locations (like all of the dams) will have neutral landing content with a PID like geoconnex.us/ref/dams/{dam-id} redirecting to content somewhere like https://info.geoconnex.us

Sometimes there will be instances like geoconnex.us/ref/dams/{dam-id} and geoconnex.us/ref/bridges/{bridge-id} that refer to the same real-world thing (non-information resource). These two information resources probably should somehow reference each other, although how exactly may vary as we come upon new situations.

Organization Specific Resources about entities that are co-located with, or otherwise associated with, a particular dam geoconnex.us/ref/dams/{dam-id} will have PIDs like geoconnex.us/org/{site-id} that redirect to landing content at organization.org/sites/{site-id}. The page at organization.org/sites/{site-id} should include some kind of linked-data content indicating that it is co-located with, or about, geoconnex.us/ref/dams/{dam-id}.

This is about as much of an idea as we have at this point. Agreed with @dblodgett-usgs that we should just start with the WaDE organizational PIDs and see what we can do first.

ksonda commented 4 years ago

Also, this is going to be an interesting case, bc with 1,000,000 pts WaDE should have a regex redirect. But, it being the seed of ref/diversions and ref/points_of_use etc., with no discernible pattern to distinguish its UUIDs that are diversions vs other types, we may want to do 1:1 redirects anyway.

dblodgett-usgs commented 4 years ago

Good summary @ksonda

Landing content for ref/points_of_use would contain links to the WaDE representations of those points of use. So both could use regex and the associations get encoded in the landing content.

e.g. https://geoconnex.us/ref/diversions/1234 -> 303 see other -> https://info.geoconnex.us/diversions/1234

And you'd get a bunch more than this but the key is the link in the below.

{
  "@id": "https://geoconnex.us/ref/diversions/1234",
  "https://schema.org/subjectOf": "https://wade-api-qa.azure-api.net/v1/SiteAllocationAmounts?SiteUUID=1234"
}
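
As a rough sketch of how that link could be produced and read back, the snippet below builds the same minimal document and extracts its `subjectOf` targets. The helper names are hypothetical, and real landing content would carry many more properties.

```python
import json

SUBJECT_OF = "https://schema.org/subjectOf"

def landing_content(ref_pid, representation_urls):
    """Minimal JSON-LD-style document linking a reference PID to its representations."""
    return {"@id": ref_pid, SUBJECT_OF: list(representation_urls)}

def linked_representations(doc):
    """Return the subjectOf links as a list, accepting a single string or a list."""
    value = doc.get(SUBJECT_OF, [])
    return [value] if isinstance(value, str) else list(value)

doc = landing_content(
    "https://geoconnex.us/ref/diversions/1234",
    ["https://wade-api-qa.azure-api.net/v1/SiteAllocationAmounts?SiteUUID=1234"],
)
print(json.dumps(doc, indent=2))
print(linked_representations(doc))
```

Using a list for `subjectOf` leaves room for more than one organization describing the same reference feature.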

ksonda commented 4 years ago

OK. http://vocabulary.westernstateswater.org/sitetype/ @amabdallah, are these the site types that are in there? Is it possible to rationalize down to diversions and points of use at least, or at least identify a subset that are definitely diversions (well or surface)? Or is this a fool's errand and we should rethink our suggestion, and just have geoconnex.us/wade/sites/ and maybe use your data to help populate geoconnex.us/ref/wells?

I'd also want to see how big a feature collection pygeoapi can handle if there's actually going to be info.geoconnex.us/collections/wells

amabdallah commented 4 years ago

That's right @ksonda. Those follow the states' vocabs. It's been hard to imagine dealing with those terms so looking at them now, we should revise them. It will be easier to do so once we connect all or at least most of our states by the end of year. To your suggestion, we can add another column: "SiteTypeGeoconnex" that tracks a higher-level site type e.g., PODs and POUs.
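
A rough sketch of what populating such a column could look like. The state-level terms in the mapping and the `SiteType` input column name are illustrative assumptions, not taken from the WaDE vocabulary.

```python
# Hypothetical mapping from state-level site type terms to a higher-level type.
# The terms on the left are placeholders, not actual WaDE vocabulary entries.
SITE_TYPE_GEOCONNEX = {
    "well": "POD",
    "canal headgate": "POD",
    "irrigated field": "POU",
}

def add_site_type_geoconnex(row, default="Unspecified"):
    """Return a copy of a site record with a SiteTypeGeoconnex column added."""
    key = row.get("SiteType", "").strip().lower()
    return {**row, "SiteTypeGeoconnex": SITE_TYPE_GEOCONNEX.get(key, default)}

rows = [
    {"SiteUUID": "example-1", "SiteType": "Well"},
    {"SiteUUID": "example-2", "SiteType": "Stream Gage"},
]
for row in rows:
    print(add_site_type_geoconnex(row))
```

Terms without a mapping fall through to a default, so unmapped types stay visible instead of being silently assigned to a category.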

dblodgett-usgs commented 4 years ago

Let's not get carried away -- Like I said before:

I think we should move slow and be incremental about our changes.

Let's build some value on what is already there and see what unfolds.

ksonda commented 4 years ago

In this case, where there’s no LC yet, would you suggest as an initial step minting PIDs that 303 to something at info.geoconnex.us/collections/wade, 303 straight to the WaDE API (JSON-only content), or something else?

dblodgett-usgs commented 4 years ago

I think the JSON-only from WaDE is much better than nothing, so would go that route. That's a similar situation to the NLDI. e.g. http://geoconnex.us/nhdplusv2/comid/13293376 gives you some useful geojson.

ksonda commented 4 years ago

OK, making that regex redirect is simple enough.
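
For illustration, the logic of such a rule can be sketched as below. This assumes the geoconnex.us/wade/sites/[uuid] PID form discussed earlier and the QA API endpoint as the target; it is not the registry's actual configuration.

```python
import re

# Assumed PID pattern, following the geoconnex.us/wade/sites/[uuid] form discussed above.
PID_PATTERN = re.compile(r"^https?://geoconnex\.us/wade/sites/(?P<site_id>[^/?#]+)$")
TARGET_TEMPLATE = (
    "https://wade-api-qa.azure-api.net/v1/SiteAllocationAmounts?SiteUUID={site_id}"
)

def resolve(pid):
    """Return (status, location) for a PID, or (404, None) if it does not match."""
    match = PID_PATTERN.match(pid)
    if match is None:
        return 404, None
    # 303 See Other: the PID names the site, the target is a document about it.
    return 303, TARGET_TEMPLATE.format(site_id=match.group("site_id"))

print(resolve("https://geoconnex.us/wade/sites/1234"))
```

A single pattern covers every site without a per-site entry, which is the appeal over 1:1 redirects at this scale.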

@dblodgett-usgs I suppose a next step could be seeing about indexing to NLDI as you said above. Let us know if/when you need anything for that. The WaDE API returns the lat/lon under the WaterAllocations:["Sites"] list, but it's not geojson

@amabdallah Something to keep in mind, since I noticed HUC8 and 12 aren't populated yet under Sites: you may want to consider eventually populating those with links to the geoconnex PIDs for those HUCs (once they exist, see #37 and #24) rather than, or in addition to, the code integers.

amabdallah commented 4 years ago

@ksonda sure thing. Please keep me posted when those are ready.

dblodgett-usgs commented 4 years ago

Just need a transform of that to GeoJSON with fields that work with the NLDI crawler configuration described in the table here: https://github.com/ACWI-SSWD/nldi-crawler.

I could take that on if you like. Maybe it could get hosted as a standalone, periodically generated file alongside the API?

ksonda commented 4 years ago

I think once WaDE's portal is up and running it should be possible to generate an updated GeoJSON periodically as they update their database.

For now, such a big geoJSON file could be provided, and hosted at info.geoconnex.us, or maybe hydroshare, or somewhere @amabdallah would prefer to set up, or somewhere else @dblodgett-usgs you would find most convenient.

ksonda commented 4 years ago

I've been toying with using pygeoapi to host LC for a dataset of WaDE's size. To be performant hosting it as 1 unified geojson, WaDE would need a VM configuration that would cost them ~ $160/month on Azure (what WaDE is currently on). Can get much better performance splitting into states, as shown at https://wade-test.geoconnex.us/collections, which is on a machine costing ~$20-30/mo on Azure
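
A minimal sketch of the state split, assuming each feature carries a `state` property. That property name is hypothetical, and this is not the script used for wade-test.

```python
from collections import defaultdict

def split_by_state(collection, state_key="state"):
    """Split one GeoJSON FeatureCollection into one FeatureCollection per state."""
    groups = defaultdict(list)
    for feature in collection["features"]:
        # Features without the property are grouped under "unknown".
        groups[feature["properties"].get(state_key, "unknown")].append(feature)
    return {
        state: {"type": "FeatureCollection", "features": features}
        for state, features in groups.items()
    }

def point(lon, lat, state):
    """Helper for building made-up example features."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"state": state},
    }

example = {
    "type": "FeatureCollection",
    "features": [point(-111.5, 40.0, "UT"), point(-105.0, 39.7, "CO"), point(-112.0, 41.0, "UT")],
}
by_state = split_by_state(example)
print({state: len(fc["features"]) for state, fc in by_state.items()})
```

Each per-state collection could then be written to its own file and registered as a separate collection.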

dblodgett-usgs commented 4 years ago

I wonder if a tiny PostGIS table hosted outside the VM would be more economical?

ksonda commented 4 years ago

Maybe even a separate PostGIS instance inside the same VM would work better, that's true. For WaDE it's a moot point if they eventually move to making LC themselves. info.geoconnex.us and state-split wade-test are both on the same DigitalOcean machine for now and it's pretty cheap. But probably useful to explore for /ref/ content and for guidance for orgs with large data down the line.

amabdallah commented 4 years ago

> I've been toying with using pygeoapi to host LC for a dataset of WaDE's size. To be performant hosting it as 1 unified geojson, WaDE would need a VM configuration that would cost them ~ $160/month on Azure (what WaDE is currently on). Can get much better performance splitting into states, as shown at https://wade-test.geoconnex.us/collections, which is on a machine costing ~$20-30/mo on Azure

Hi @ksonda do you have some time this week for a call to catch up on this?

ksonda commented 4 years ago

@amabdallah didn't mean to cause a panic. Just seeing if pygeoapi is a viable option for landing pages for large feature collections, and the sites dataset was a decent scale test file. But sure, can talk whenever. Let's catch up on Slack.

amabdallah commented 4 years ago

No worries @ksonda and thanks for the testing. You're ahead of us. Nice work as always. Sure, let's chat on Slack this week.

ksonda commented 4 years ago

Local GeoJSON is also apparently the least performant way to do it. https://docs.pygeoapi.io/en/stable/code.html#module-pygeoapi.provider.geojson

I'll look into alternatives.

ksonda commented 4 years ago

By providing it as SQLiteGPKG, the thing works fine as a single table on a 2 GB RAM VM, and fast on a 3 GB one. https://wade-test.geoconnex.us/collections/WaDE

Until WaDE/ @amabdallah is ready to

  1. Add more states
  2. Segment sites into types (diversions, use, other)
  3. Host their own landing content.

I think we can close this issue.