internetofwater / ref_dams

Reference Dams
Creative Commons Zero v1.0 Universal

Plan incorporation of https://github.com/mikejohnson51/ref_dams #16

Open dblodgett-usgs opened 6 months ago

dblodgett-usgs commented 6 months ago

@mikejohnson51 -- let's walk through how to get your work incorporated here. It may be a complete rewrite, but I want to keep the registry intact and some aspects of the overall workflow. I'll do #15 and tidy things up before we get started.

dblodgett-usgs commented 6 months ago

Currently, the workflow looks like this:

There is a TODO to use the NHD flowlines for linear referencing -- that's why the graph has a disconnected data load.

graph LR
  style Graph fill:#FFFFFF00,stroke:#000000;
  subgraph Graph
    direction LR
    x45079be5e9dad71e(["nat_db"]):::none --> x0d37c0787182d513(["nhdpv2_fline"]):::none
    x26c28590358ed1f6(["nid_gpkg"]):::none --> xde13d5897adf176a(["nid"]):::none
    xa0ffbbb184c91736(["nid_meta"]):::none --> xde13d5897adf176a(["nid"]):::none
    x5a093a06e6e82fb6(["dams"]):::none --> x367057a71281c312(["dam_locations"]):::none
    xde13d5897adf176a(["nid"]):::none --> x367057a71281c312(["dam_locations"]):::none
    x7b8f98c0b818a18a(["registry"]):::none --> x164a1fd6fa5b800d(["registry_out"]):::none
    x367057a71281c312(["dam_locations"]):::none --> x2dd2f719129b7fe6(["reference_out"]):::none
    x2ed57357c0c06777(["providers"]):::none --> x2dd2f719129b7fe6(["reference_out"]):::none
    x7b8f98c0b818a18a(["registry"]):::none --> x2dd2f719129b7fe6(["reference_out"]):::none
    x0d37c0787182d513(["nhdpv2_fline"]):::none --> x6c59ac2b7db5bcf8(["nhdpv2_fline_proc"]):::none
    x367057a71281c312(["dam_locations"]):::none --> x7b8f98c0b818a18a(["registry"]):::none
    x2ed57357c0c06777(["providers"]):::none --> x7b8f98c0b818a18a(["registry"]):::none
    x60fc93676537b647(["providers_csv"]):::none --> x2ed57357c0c06777(["providers"]):::none
  end
  classDef none stroke:#000000,color:#ffffff,fill:#7500D1;
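
To make the dependencies in the graph easier to check, here is a minimal Python sketch that transcribes its edges into a dictionary, derives one valid build order, and confirms that the flowline branch feeds neither output. It is only an illustration: the target names come from the graph, but the helper names (`build_order`, `upstream`) are hypothetical and are not part of the actual workflow code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each target maps to the set of targets it depends on,
# transcribed by hand from the edges of the graph above.
deps = {
    "nhdpv2_fline": {"nat_db"},
    "nhdpv2_fline_proc": {"nhdpv2_fline"},
    "nid": {"nid_gpkg", "nid_meta"},
    "dam_locations": {"dams", "nid"},
    "providers": {"providers_csv"},
    "registry": {"dam_locations", "providers"},
    "registry_out": {"registry"},
    "reference_out": {"dam_locations", "providers", "registry"},
}


def build_order(graph):
    """Return one valid order in which every target follows its inputs."""
    return list(TopologicalSorter(graph).static_order())


def upstream(graph, target):
    """Return every target that `target` depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


order = build_order(deps)

# The flowline branch and everything it needs.
flowline_branch = upstream(deps, "nhdpv2_fline_proc") | {"nhdpv2_fline_proc"}

# Everything the two output targets need.
output_inputs = upstream(deps, "reference_out") | upstream(deps, "registry_out")

# True when no flowline target feeds either output.
disconnected = flowline_branch.isdisjoint(output_inputs)

print(order)
print("flowline branch disconnected from outputs:", disconnected)
```

Once linear referencing is wired in, adding the flowline target to the dependencies of `dam_locations` (or wherever it ends up being consumed) in a sketch like this should make the `disconnected` check come out false.
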

mikejohnson51 commented 6 months ago

Can you use the reference_flowlines? My plan was to use the outputs of this workflow, add any updates from the latest NID release (assuming the nid_gpkg stays 2019?), then run and refine the full conflation algorithm we have on it, using targets and updated input sources. Does that sound ideal to you?

dblodgett-usgs commented 6 months ago

Yeah -- switching to reference_flowlines shouldn't be a problem.

I think this does sound like the right plan. Can we open up a stepwise set of issues that you can close as you work through it?

Thinking:

  1. update data access to incorporate new data sources
  2. incorporate core conflation algorithm and get running end to end
  3. refactor / document code and workflow files
  4. finalize and output release artifacts
  5. update readme and other documentation
  6. cut release and publish final artifacts

Does that make sense as the basic flow of issues to tick off? Would it make sense to split 2 into a few pieces? It would be good to get a small review between steps.

mikejohnson51 commented 6 months ago

I think it would be good to break (2) into a series of steps, one for each data source, using Geoconnex inputs and URIs where possible (we may not need a review on each step), and then add a further step for ranking and qualifying/quantifying the agreement and final location. Otherwise this looks great. Do you want me to take a stab at writing out a more granular process?

dblodgett-usgs commented 6 months ago

Whenever you are ready to start digging into the work, let's break this all out into standalone issues.