usgs-makerspace / makerspace-sandbox

Some initial R code for playing with data processing (maybe some light visualization).

Set up a tier to pick up the 'dev' dataset from the onhm pipeline #303

Open mhines-usgs opened 4 years ago

mhines-usgs commented 4 years ago

Steve Markstrom wants to iterate on the data coming out of their model.

Ivan cut a release of the current model and this is what is currently powering all of our tile generation.

He is now making a copy of this process that will be used by Steve Markstrom et al to iterate on the data.

Operational NHM: I am creating a development set of pipeline transformation jobs that will use the latest available Docker images to perform the transformation (unlike the current pipeline, which uses the latest tagged versions of the Docker images). This pipeline will write its output data to s3://owi-common-resources/resources/application/nhm/dev/data (currently the output data is written to s3://owi-common-resources/resources/application/nhm/data).
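
As a rough illustration of the split described above, the output location could be chosen by tier name. This is only a sketch: the `output_prefix` helper and the tier names `prod` and `dev` are assumptions made for illustration, and only the two S3 paths come from the message itself.

```python
# Hypothetical helper: map a tier name to the S3 prefix the pipeline writes to.
# Only the two S3 paths are taken from the thread; the tier names are assumed.
OUTPUT_PREFIXES = {
    "prod": "s3://owi-common-resources/resources/application/nhm/data",
    "dev": "s3://owi-common-resources/resources/application/nhm/dev/data",
}


def output_prefix(tier: str) -> str:
    """Return the S3 output prefix for a tier, failing loudly on unknown tiers."""
    try:
        return OUTPUT_PREFIXES[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

Keeping the mapping in one place would make it harder for the dev jobs to write into the location that currently feeds tile generation.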

Testing out the dev pipeline now. Hopefully will have it available by end of day.

https://jenkins.wma.chs.usgs.gov/job/NHM/view/Development/
<https://teams.microsoft.com/l/message/19:c504076d1ccb4f9c877308fdaca60893@thread.skype/1575908825884?tenantId=0693b5ba-4b18-4d7b-9341-f32f400a5494&amp;groupId=39751d35-3a2c-4e8a-b2ad-be14bb4dbd1b&amp;parentMessageId=1575556804895&amp;teamName=WBEEP&amp;channelName=Operational NHM&amp;createdTime=1575908825884>

Lindsay suggested we may also want to consider setting up a different tier to display this, or perhaps we use QA?

I am wondering if we want another tier for when we do it... I think the idea is that it is a place for the model to be tested, so we need a data-test tier. Perhaps it has the QA version of our tier so that features aren't changing out from under them? Worth a discussion.

abriggs-usgs commented 4 years ago

I just want to reiterate that the Vue Application of the WBEEP project is completely decoupled from the data generation and resulting tiles. Again, we are hitting this problem of having intermingling of project concerns because they just happen to be stored in the same bucket.

If the desire is to have the ability to view other sets of tiles from the 'test' (or other) tier (without rebuilding the application, as is currently possible) this can be done by adding a source toggle switch to, I would suggest, the 'qa' version of the application, that will allow users to switch the tile source on the fly. I am guessing that it would also be possible to swap in and out map style sheets in a similar manner.

wdwatkins commented 4 years ago

I think we should either create a new tier for the pipeline, or else be able to toggle an existing tier between them. We don't want to permanently switch an existing tier to the dev model pipeline, since that would potentially introduce additional variables to our existing setup, besides changes to our code base.

lindsayplatt commented 4 years ago

This is resurfacing because they are working on switching the pipeline to using the new geospatial fabric version and we will need to get our tiles built using the new geospatial fabric (see this SB object) before moving things out to prod. I believe @wdwatkins has had recent experience with the new GF version.

My suggestion is to create a new viz tier, called dev-model or something similar, to do this kind of work on. I wouldn't mind a quick conversation on this topic early next week once David is back (perhaps we can stay on the standup line for an extra few minutes Monday).

wdwatkins commented 4 years ago

I have only used the flowlines, not the actual HRUs, but having seen the differences between the flowline versions, I don't think using the new version should require many changes to our codebase; hopefully it is just a matter of changing the field names.
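
If the change really is limited to field names, it could be handled with a small rename map applied to each feature's attributes. This is a minimal sketch only: the field names below are placeholders, since the actual names in either geospatial fabric version are not given in this thread, and `rename_fields` is a hypothetical helper.

```python
# Hypothetical mapping from old to new attribute names; the real field names
# in either geospatial fabric version are not given in this thread.
FIELD_RENAMES = {"old_hru_id": "new_hru_id"}


def rename_fields(properties: dict, renames: dict = FIELD_RENAMES) -> dict:
    """Return a copy of a feature's properties with known keys renamed.

    Keys that do not appear in the rename map are passed through unchanged.
    """
    return {renames.get(key, key): value for key, value in properties.items()}
```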

mhines-usgs commented 4 years ago

The only thing that rings in my ears is that we eliminated some HRUs because they didn't have data in the tile-join process; not sure if we will need to revisit that. Here is the line that does the exclusion: https://github.com/usgs-makerspace/wbeep-processing/blob/master/jenkins/tippecanoe_tile_join_Jenkinsfile#L48

wdwatkins commented 4 years ago

Yeah, even if those were due to lack of input data rather than the fabric, maybe the HRU ids will have changed?
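
One way to check this would be to compare the existing exclusion list against the IDs in the new fabric. The sketch below assumes both are available as plain lists of IDs. `check_exclusions` is a hypothetical helper, not part of the pipeline, and the IDs in the usage note are made up.

```python
def check_exclusions(excluded_ids, new_fabric_ids):
    """Split an exclusion list by whether each ID still exists in the new fabric.

    IDs listed under "missing" no longer exist in the new fabric, which would
    suggest the exclusion list needs to be revisited.
    """
    excluded = set(excluded_ids)
    present = set(new_fabric_ids)
    return {
        "still_present": sorted(excluded & present),
        "missing": sorted(excluded - present),
    }
```

For example, `check_exclusions([1, 2, 3], [2, 3, 4])` reports IDs 2 and 3 as still present and ID 1 as missing. Note that an ID still being present does not prove it refers to the same HRU if the IDs were renumbered.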

mhines-usgs commented 4 years ago

Tried to grab the GF*. file to compare and it doesn't appear to work anymore; I end up with an error page:

image

lindsayplatt commented 4 years ago

That's normally not the error you get if it is private though

lindsayplatt commented 4 years ago

Can you try again @mhines-usgs? I am not logged in and can see it:

https://www.sciencebase.gov/catalog/item/5e29d1a0e4b0a79317cf7f63

image

mhines-usgs commented 4 years ago

Yeah, I could see that item just fine; it was the 932 MB zip file below it that I was trying to download that was triggering that error. Just now it seems to be working, though!

image

lindsayplatt commented 4 years ago

Oh gotcha. I wonder if they were updating or something? Glad it is working now.

mhines-usgs commented 4 years ago

I think what we'll be doing is replacing the geodatabase that we start our tile generation pipeline with, and then provide a completely separate tier set up where they can look at their dataset on the other end. In order to support that, I think we need to...

for wbeep processing:

for wbeep viz:

amrhoades commented 4 years ago

Update on this issue: I spoke with Jen Rapp about this and we decided it is a nice-to-have for this FY. It is still important to deliver, but does not need to be constrained by the FY end.

jenniferRapp commented 4 years ago

Let's review this issue and determine if it should be on the current task list or if the one I started works instead.