Set up a tier to pick up the 'dev' dataset from the onhm pipeline

mhines-usgs commented 4 years ago

Steve Markstrom wants to iterate on the data coming out of their model.

Ivan cut a release of the current model and this is what is currently powering all of our tile generation.

He is now making a copy of this process that will be used by Steve Markstrom et al to iterate on the data.

[ ] Decide where to put the tiles and where to display the tiles that are coming out of this dev/test data processing tier
[ ] set up needed jenkins jobs to process those new data and create s3 space to display it
[ ] add the 'experimental' model output as a source to the map. This should probably be done in the code and not the style sheet so that we can conditionally exclude it based on deployment tier.
[ ] Add the 'experimental' model as a layer
[ ] Make a toggle button (like the 'zoom level indicator') so that we can toggle the experimental layer
[ ] Hide the real 'HRU' layers when the experimental layers are on
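
A minimal sketch of the last four items, written in Python for brevity even though the viz itself is a Vue application. It assumes the map style is a Mapbox-GL-style object with `sources` and `layers`. Every id, tier name, and the `source-layer` value below is hypothetical, since the thread does not give the real ones.

```python
from copy import deepcopy

# Hypothetical ids and tiers: the real names are not given in this thread.
EXPERIMENTAL_SOURCE = "hru_experimental"
EXPERIMENTAL_LAYER = "hru-experimental-fill"
OPERATIONAL_HRU_LAYERS = {"hru-fill", "hru-outline"}
TIERS_WITH_EXPERIMENTAL = {"test", "qa"}


def with_experimental(style, tier, tiles_url):
    """Return a copy of the style, adding the experimental source and layer
    only when the deployment tier is allowed to show them."""
    style = deepcopy(style)
    if tier not in TIERS_WITH_EXPERIMENTAL:
        return style
    style["sources"][EXPERIMENTAL_SOURCE] = {"type": "vector", "tiles": [tiles_url]}
    style["layers"].append({
        "id": EXPERIMENTAL_LAYER,
        "type": "fill",
        "source": EXPERIMENTAL_SOURCE,
        "source-layer": "hrus",  # assumed source-layer name
        "layout": {"visibility": "none"},  # hidden until toggled on
    })
    return style


def set_experimental_visible(style, on):
    """Toggle: show the experimental layer and hide the operational HRU
    layers when on is True, and the reverse when it is False."""
    style = deepcopy(style)
    for layer in style["layers"]:
        if layer["id"] == EXPERIMENTAL_LAYER:
            layer.setdefault("layout", {})["visibility"] = "visible" if on else "none"
        elif layer["id"] in OPERATIONAL_HRU_LAYERS:
            layer.setdefault("layout", {})["visibility"] = "none" if on else "visible"
    return style
```

Doing the inclusion in code, as suggested above, means a tier outside `TIERS_WITH_EXPERIMENTAL` never sees the experimental source at all, rather than receiving it and hiding it.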

From the Operational NHM channel: I am creating a development set of pipeline transformation jobs that will use the latest available Docker images to perform the transformation (unlike the current pipeline, which uses the latest tagged versions of the Docker images). This pipeline will write its output data to s3://owi-common-resources/resources/application/nhm/dev/data (the current pipeline writes its output data to s3://owi-common-resources/resources/application/nhm/data).
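
The difference between the two pipelines can be sketched as a small config lookup. The two S3 locations come from the message above. The `PipelineConfig` class, the `flavor` names, and the use of a Docker tag literally named `latest` for "latest available" are assumptions made for illustration.

```python
from dataclasses import dataclass

BASE = "s3://owi-common-resources/resources/application/nhm"


@dataclass(frozen=True)
class PipelineConfig:
    image_tag: str   # Docker image tag the transformation jobs pull
    output_uri: str  # where the pipeline writes its output data


def pipeline_config(flavor, release_tag=None):
    """Pick the image tag and S3 output location for a pipeline flavor."""
    if flavor == "dev":
        # Assumption: "latest available" maps to the tag named "latest".
        return PipelineConfig(image_tag="latest", output_uri=f"{BASE}/dev/data")
    if flavor == "release":
        if not release_tag:
            raise ValueError("the release pipeline needs an explicit tagged version")
        return PipelineConfig(image_tag=release_tag, output_uri=f"{BASE}/data")
    raise ValueError(f"unknown pipeline flavor: {flavor!r}")
```

Keeping the output prefix tied to the flavor makes it harder for a dev run to overwrite the data that the operational tiles are built from.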

Testing out the dev pipeline now. Hopefully will have it available by end of day.

https://jenkins.wma.chs.usgs.gov/job/NHM/view/Development/
<https://teams.microsoft.com/l/message/19:c504076d1ccb4f9c877308fdaca60893@thread.skype/1575908825884?tenantId=0693b5ba-4b18-4d7b-9341-f32f400a5494&groupId=39751d35-3a2c-4e8a-b2ad-be14bb4dbd1b&parentMessageId=1575556804895&teamName=WBEEP&channelName=Operational NHM&createdTime=1575908825884>

Lindsay suggested we may also want to consider setting up a different tier to display this, or perhaps we use QA?

I am wondering if we want another tier for when we do it .... I think the idea is that it is a place for the model to be tested. So we need a data-test tier. Perhaps it has the QA version of our tier so that features aren't changing out from under them? Worth a discussion

abriggs-usgs commented 4 years ago

I just want to reiterate that the Vue application of the WBEEP project is completely decoupled from the data generation and the resulting tiles. Again, we are hitting the problem of intermingled project concerns because they just happen to be stored in the same bucket.

If the desire is to have the ability to view other sets of tiles from the 'test' (or other) tier (without rebuilding the application, as is currently possible) this can be done by adding a source toggle switch to, I would suggest, the 'qa' version of the application, that will allow users to switch the tile source on the fly. I am guessing that it would also be possible to swap in and out map style sheets in a similar manner.
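
As a rough illustration of the source toggle, here is a sketch in Python (the real switch would live in the Vue application). It assumes a Mapbox-GL-style object in which a vector source lists its tile URL templates under `tiles`. The tile set names and URLs are placeholders, not the project's real endpoints.

```python
# Placeholder tile sets: the real tile URLs are not given in this thread.
TILE_SETS = {
    "operational": "https://example.invalid/tiles/{z}/{x}/{y}.pbf",
    "test": "https://example.invalid/test/tiles/{z}/{x}/{y}.pbf",
}


def switch_tile_source(style, source_id, tile_set):
    """Return a new style in which source_id points at the chosen tile set.
    The input style is left unmodified."""
    if tile_set not in TILE_SETS:
        raise KeyError(f"unknown tile set: {tile_set!r}")
    sources = dict(style["sources"])
    sources[source_id] = {**sources[source_id], "tiles": [TILE_SETS[tile_set]]}
    return {**style, "sources": sources}
```

Because only the URL template changes, the layers that reference `source_id` stay as they are, which is what lets the switch happen without a rebuild.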

wdwatkins commented 4 years ago

I think we should either create a new tier for the pipeline, or else be able to toggle an existing tier between them. We don't want to permanently switch an existing tier to the dev model pipeline, since that would potentially introduce additional variables to our existing setup, besides changes to our code base.

lindsayplatt commented 4 years ago

This is resurfacing because they are working on switching the pipeline to using the new geospatial fabric version and we will need to get our tiles built using the new geospatial fabric (see this SB object) before moving things out to prod. I believe @wdwatkins has had recent experience with the new GF version.

My suggestion is to create a new viz tier called dev-model or something similar to do this kind of work on. I wouldn't mind a quick conversation on this topic early next week once David is back (perhaps we can stay on the standup line for an extra few minutes Monday).

wdwatkins commented 4 years ago

I have only used the flowlines, not the actual HRUs, but having seen the differences between the flowline versions, I don't think many changes to our codebase should be required to use the new version. Hopefully it is just a matter of changing the field names.

mhines-usgs commented 4 years ago

The only thing that rings in my ears is that we eliminated some HRUs in the tile-join process because they didn't have data. I'm not sure if we will need to revisit that. Here is the line that does the exclusion: https://github.com/usgs-makerspace/wbeep-processing/blob/master/jenkins/tippecanoe_tile_join_Jenkinsfile#L48

wdwatkins commented 4 years ago

Yeah, even if those were due to lack of input data rather than the fabric, maybe the HRU ids will have changed?

mhines-usgs commented 4 years ago

Tried to grab the GF*. file to compare and it doesn't appear to work anymore. I end up with an error page.

lindsayplatt commented 4 years ago

That's normally not the error you get if it is private though

lindsayplatt commented 4 years ago

Can you try again @mhines-usgs? I am not logged in and can see it:

https://www.sciencebase.gov/catalog/item/5e29d1a0e4b0a79317cf7f63

mhines-usgs commented 4 years ago

Yeah, I could see that item just fine. It was the 932 MB zip file below it that I was trying to download that was triggering that error. Just now it seems to be working, though!

lindsayplatt commented 4 years ago

Oh gotcha. I wonder if they were updating or something? Glad it is working now.

mhines-usgs commented 4 years ago

I think what we'll be doing is replacing the geodatabase that we start our tile generation pipeline with, and then providing a completely separate tier setup where they can look at their dataset on the other end. In order to support that, I think we need to...

for wbeep processing:

[ ] upload the new geodatabase into s3 location where jenkinsfile can grab it
[ ] create a new version or parameterized version of this jenkins file that directly handles the new geodatabase or allows a user to pick which geodatabase version to use
[ ] add a new tier, perhaps called test-data, to all the downstream jenkinsfiles
[ ] modify the s3 target domain logic in final tile join jenkinsfile to allow for new test-data s3 output location (assuming we aren't creating a new s3 wbeep bucket like this) - can we use new water-visualizations-test-website space?
[ ] create new s3 spaces for test-data (many locations for tiles and intermediate outputs from pipeline)
[ ] verify whether the logic used to exclude HRUs from the final output is still accurate, or whether any IDs have changed in a way that could alter or break the exclusion of PR, HI, and the other HRUs we excluded
[ ] run whole pipeline and push final tiles out to water-visualizations-test s3 bucket (if that's the accurate place)
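
The exclusion check in the list above could start with something like this sketch. It assumes the excluded HRU ids and the ids in the new geospatial fabric can each be read out as a plain list. The function name and the sample ids are hypothetical.

```python
def check_exclusions(excluded_ids, fabric_ids):
    """Compare the HRU ids the pipeline excludes against the ids that
    exist in the new geospatial fabric."""
    excluded = set(excluded_ids)
    fabric = set(fabric_ids)
    return {
        # Excluded ids that still exist: the exclusion will still match something.
        "still_present": sorted(excluded & fabric),
        # Excluded ids that no longer exist: a hint that ids may have changed.
        "missing": sorted(excluded - fabric),
    }


# Hypothetical ids, for illustration only.
report = check_exclusions(excluded_ids=[101, 102, 103], fabric_ids=[1, 2, 101, 103])
print(report)
```

An id showing up under `still_present` does not prove it still refers to the same HRU in the new fabric, so a spot check against the geometry would still be needed.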

for wbeep viz:

[ ] set up new env file for wbeep viz for this new tier and set it to a space in the water-visualizations-test s3 bucket
[ ] run a build for that tier

amrhoades commented 4 years ago

Update on this issue: I spoke with Jen Rapp about this and we decided it is a nice-to-have for this FY. It is still important to deliver, but does not need to be constrained by the FY end.

jenniferRapp commented 4 years ago

Let's review this issue and determine if it should be on the current task list or if the one I started works instead.

usgs-makerspace / makerspace-sandbox

Set up a tier to pick up the 'dev' dataset from the onhm pipeline #303