ropensci / unconf17

Website for 2017 rOpenSci Unconf
http://unconf17.ropensci.org

Minnesota Lakefinder #42

Open hrbrmstr opened 7 years ago

hrbrmstr commented 7 years ago

http://www.dnr.state.mn.us/lakefind/index.html

There have been a few Stack Overflow questions (btw: I don't think that search result is comprehensive, but it's indicative) from people who need to get at the underlying, heavily nested JSON result.

Might be worth a pkg attempt. I'm not smart enough in the underlying data to know what to do on my own with it (I'd be making too many assumptions and not making the right connections/labels).

jsta commented 7 years ago

Sounds great! I could contribute some domain knowledge on this (albeit a little light on the fisheries-related issues). I wonder if a major outcome of this effort could be a detailed description of the development process, so that people could write packages for the many similar database interfaces in other areas (I maintain a list with some at: https://jsta.github.io/limnology_models_data/). I am thinking less of "use this package" and more of "here's how we found the API endpoint + parameters" and "here's how you know that Selenium is required".

hrbrmstr commented 7 years ago

IMO that would be a superb resource for folks (that's a nice list of other databases, too).


jhollist commented 7 years ago

While I won't be there, I was planning on blocking off the 25th and 26th so that I can follow along remotely! I'd be very interested in what you come up with here and am happy to contribute.

And nice list, @jsta! One thing that is becoming apparent (at least to me) is that a harmonized lakes database (at least for the US, but also Canada) would be great. There are many folks working in similar directions (EPA, USGS, you and Patricia and others...). A lot of really cool things could happen if a national (North American) lakes database came to pass. But I digress...

karthik commented 7 years ago

@jhollist We'd love to have you there remotely. I'm guessing you're already on our Slack, and hopefully the team that you join can also have you connected by voice/video for at least part of it. You might ping Nick Tierney/Miles McBain to see how they pulled it off last year as part of Bob's team.

jhollist commented 7 years ago

Thanks @karthik! I will ping them and work with @hrbrmstr, @jsta, or others (I'm interested in a lot of the issues, e.g. #5) on the best way to get looped in. One of these years I'll throw my hat in the ring to hopefully attend in person!

karthik commented 7 years ago

> One of these years I'll throw my hat in the ring to hopefully attend in person!

You should and we'd be delighted to have you in person!

stefaniebutland commented 7 years ago

@jhollist Let me know if there's anything I can do to help you work remotely. As rOpenSci's community manager my unconf role will be 100% facilitation.

Nick Tierney said his main barrier was just the Australian time zone. The group had meetings as needed via https://appear.in.

jhollist commented 7 years ago

@stefaniebutland Thanks! My plan at this point is to follow along via Slack (although I need to track down my 2FA codes, b/c my authenticator isn't working ...) and GitHub. Thanks for the link to appear.in. That will be useful. If I have any other issues, I will let you know.

bhive01 commented 7 years ago

I will also not be there, but I am most curious about this particular issue. Not because of the JSON, but because the dataset is interesting and I'm trying to learn new things. Whether or not this one goes forward, I'd like to try to participate in it as well. @stefaniebutland Is there a runconf17 Slack channel I need to join? I'm in General and Random thanks to @sckott

jhollist commented 7 years ago

@hrbrmstr and @jsta If this gets any traction on Thursday, do hit me up on Slack or Twitter. I'll be following along 11-4:30 EDT and can hop on appear.in if a chat makes sense. Like @bhive01, I am interested in helping, especially with anything lake related!

hrbrmstr commented 7 years ago

Will do. I'm super interested to see how Thu will go :-)


jsta commented 7 years ago

I took a look at the structure of the query results. You were not kidding about the nestedness. It makes sense to me to return results for a single lake as a single list of data frames. For example, a query like lakefinder_get(lake = "56011602") would return a list object with the following structure:

|__characteristics
    |__name
    |__id
    |__max_depth
    |__...
|__surveys
    |__id
    |__date
    |__quartile
    |__cpue
    |__species
    |__length
    |__...
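
To make the proposed shape concrete, here is a minimal sketch of the reshaping step, written in Python with only the standard library (an R version would return a list of data frames instead). The payload is hypothetical: its field names, nesting, species codes, and values are invented for illustration and are not the real LakeFinder schema. `reshape_lake` is likewise an assumed helper name.

```python
import json

# Hypothetical payload: field names, nesting, and values are invented for
# illustration and are NOT the real LakeFinder schema or real lake data.
SAMPLE = json.loads("""
{
  "result": {
    "name": "Example Lake",
    "id": "56011602",
    "max_depth": 21.0,
    "surveys": [
      {"id": "s1", "date": "2010-07-12",
       "catches": [
         {"species": "WAE", "cpue": 4.5},
         {"species": "NOP", "cpue": 1.2}
       ]},
      {"id": "s2", "date": "2015-07-20",
       "catches": [
         {"species": "WAE", "cpue": 3.0}
       ]}
    ]
  }
}
""")


def reshape_lake(payload):
    """Split one nested lake record into two flat tables (lists of row dicts)."""
    lake = payload["result"]
    # One row of lake-level attributes: everything except the nested surveys.
    characteristics = [{k: v for k, v in lake.items() if k != "surveys"}]
    # One row per survey/species pair, copying the survey keys onto each row.
    surveys = [
        {"id": s["id"], "date": s["date"], **catch}
        for s in lake["surveys"]
        for catch in s["catches"]
    ]
    return {"characteristics": characteristics, "surveys": surveys}


lake_tables = reshape_lake(SAMPLE)
```

Repeating the survey `id` and `date` on every row keeps the `surveys` table flat, at the cost of some redundancy; each table can then be filtered or joined without walking the original nesting.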

It is not clear to me without further digging which columns in the survey object represent derived quantities versus unique data. For example, it seems that maximum_length and minimum_length are derived from fishCount. Is quartileCount also derived from fishCount? It seems like quartileWeight is unique (not derived) as there is no fishWeight column.
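
One way to test a hunch like "maximum_length and minimum_length are derived from fishCount" is to recompute the candidate columns and compare them with the stored values. The sketch below (in Python, standard library only) assumes that fishCount holds [length, count] pairs; that layout is a guess made for this example, not a documented format, and the numbers are made up. `is_derived` is a hypothetical helper.

```python
# Hypothetical survey fragment: the [length, count] layout of fishCount is an
# assumption made for this sketch, and the values are invented.
lengths = {
    "fishCount": [[5, 2], [7, 4], [11, 1]],
    "minimum_length": 5,
    "maximum_length": 11,
}


def is_derived(record):
    """Return True if the stored min/max lengths can be recomputed from fishCount."""
    # Keep only the length classes in which at least one fish was counted.
    observed = [length for length, count in record["fishCount"] if count > 0]
    if not observed:
        # Nothing to recompute from, so the check cannot confirm anything.
        return False
    return (
        min(observed) == record["minimum_length"]
        and max(observed) == record["maximum_length"]
    )


derived = is_derived(lengths)
```

Running the same check across many surveys would show whether the relationship always holds; a single mismatch would suggest the columns carry independent information. The same approach could be tried for quartileCount once its relationship to fishCount is better understood.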