kunstler / laselva

Fetch Forest Inventory data from many countries
https://docs.ropensci.org/laselva

feedback? #3

Closed sckott closed 4 years ago

sckott commented 8 years ago

@taddallas did you try it yet? let me know any thoughts

sckott commented 8 years ago

also possibly of interest to:

@dlebauer

dlebauer commented 8 years ago

Thanks. Brings back fond memories! What is the scope of the package? It looks like it imports data from FIA, but it's unclear why the name is 'laselva'. Is it to compile a local instance of the FIA database?

@mdietze, @robkooper, @colinaverill, @rykelly are all actively working with FIA and a few others within @PecanProject/pecan-developers : do any of you have feedback on an R package that makes using FIA easier?

sckott commented 8 years ago

I know, I started learning R in '06 working with FIA data!

What is the scope of the package

@taddallas wanted to more easily get FIA data files. The scope, I think, is finding out what files are available and fetching them, plus making it easy to combine them. Perhaps also doing some transformations on variables and adding new ones. To be honest, I don't know the data that well, so let me know if any other features make sense to add.

Is it to compile a local instance of the FIA database?

Could maybe add that, but for that use case the easiest way seems to be to just dump the entire thing into SQLite (or similar) and then query it with dplyr, so it's not necessarily something to include here.
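A minimal sketch of that dump-then-query workflow, assuming DBI/RSQLite/dplyr; the `tree` table and its columns are invented for illustration, not the real FIA schema:

```r
# Sketch: load CSVs into SQLite once, then query lazily with dplyr.
# The "tree" table and its columns are made up, not the actual FIA schema.
library(DBI)
library(RSQLite)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")  # use a file path to persist

# In practice this would be a loop over the downloaded FIA CSV files
dbWriteTable(con, "tree",
             data.frame(plot_id = c(1, 1, 2), dbh = c(10.2, 31.5, 22.0)))

tree <- tbl(con, "tree")                 # lazy table; nothing loaded into R yet
per_plot <- tree %>%
  group_by(plot_id) %>%
  summarise(n_trees = n(), mean_dbh = mean(dbh, na.rm = TRUE)) %>%
  collect()                              # run the SQL and pull results into R

dbDisconnect(con)
```

dplyr translates the whole pipeline to a single SQL query, so even a multi-gigabyte dump stays on disk until `collect()`.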

sckott commented 8 years ago

oh, and the package name is just to have something unique and fun - and makes it easier to find on the web compared to "FIA"

taddallas commented 8 years ago

I like the idea of dumping the whole thing into SQLite. I also would like to see a big lookup table of variable names/units, and potentially some geospatial mix-ins. I've hardly looked at the data, but it seems like a great resource for population and community studies at larger spatial scales. The ability to aggregate counts based on a gridded mask would be cool, but maybe that's just my use case.

sckott commented 8 years ago

for sql, we can do something like in https://github.com/ropenscilabs/taxizedb where we provide light wrappers around dplyr stuff
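Something like the following sketch; `fia_src()` and `fia_tbl()` are hypothetical names for illustration, not functions that exist in laselva or taxizedb:

```r
# Light wrappers in the taxizedb style: open the local FIA dump once,
# then hand back lazy dplyr tables. Both function names are hypothetical.
fia_src <- function(path = ":memory:") {
  DBI::dbConnect(RSQLite::SQLite(), path)  # connect to (or create) the dump
}

fia_tbl <- function(src, table) {
  dplyr::tbl(src, table)                   # lazy table; queries run in SQLite
}
```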

I also would like to see a big lookup table of variable names/units

https://github.com/ropenscilabs/laselva/blob/master/R/fia_datasets.R gets datasets - any idea where to get variables?

and potentially some geospatial mix-ins.

examples?

The ability to aggregate counts based on a gridded mask would be cool, but maybe that's just my use case.

hmmm, perhaps

taddallas commented 8 years ago

Using the taxizedb approach sounds promising. I don't yet have a good grasp on how large the resulting database would be (e.g., 30+ csv files per state).

The only place I've seen variable definitions currently is in the metadata pdfs for phase 2 (500+ pages) and phase 3 (150+ pages), but hopefully they are somewhere else.

The ability to quickly visualize occurrence points as SpatialPoints or SpatialPolygons objects would be nice, but perhaps not strictly necessary.

robkooper commented 8 years ago

I have loaded it into a PostgreSQL database. The final dump is around 10 GB of SQL statements, or 3 GB compressed. The hardest part is always taking the Access database description they provide and converting it to PostgreSQL, followed by the very slow download.

This looks very cool. Maybe we can (if allowed) mirror the data in other places to speed up the download process. Maybe as an SQLite database.

rykelly commented 8 years ago

Last year I made some updates to @robkooper's code and wrote up a bit of a tutorial to go with it. It's here. I kind of threw it on GH as an afterthought, so it's not necessarily polished, but I think the tutorial is pretty complete. It walks through the whole process of converting FIA to psql, including the pain-in-the-ass MS Access step Rob mentioned. You're welcome to use any of that code (as long as he's ok with it).

That said, since there is a bit of a process and a large quantity of data / slow download involved, I think it's a great idea to have functions (like in this package) for grabbing specific bits direct from FIA.

It's also a great idea to have some built-in converters from raw FIA to a spatial format (like sp::SpatialPointsDataFrame). From there a lot of GIS stuff is simple (e.g. raster can automatically aggregate counts like @taddallas mentioned). Of course people can do this on their own but sp has a steep learning curve so it would be helpful to have some basics wrapped in easy-to-remember laselva functions.
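A rough sketch of both ideas, assuming sp and raster; the coordinates and column names below are invented, not FIA fields:

```r
# Sketch: promote plot coordinates to a SpatialPointsDataFrame, then count
# points per grid cell with raster::rasterize(). All values are made up.
library(sp)
library(raster)

plots <- data.frame(lon = c(-89.2, -89.1, -88.9),
                    lat = c(40.1, 40.15, 40.4),
                    plot_id = 1:3)
coordinates(plots) <- ~ lon + lat          # now a SpatialPointsDataFrame
proj4string(plots) <- CRS("+proj=longlat +datum=WGS84")

# A 0.25-degree grid covering the points
grid <- raster(xmn = -89.5, xmx = -88.5, ymn = 40, ymx = 40.5,
               resolution = 0.25, crs = "+proj=longlat +datum=WGS84")
counts <- rasterize(plots, grid, field = "plot_id", fun = "count")
```

From there, `plot(counts)` gives a quick gridded occurrence map.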

colinaverill commented 8 years ago

As a data user, Rob's PostgreSQL product has been great, as it's easy enough to write psql queries from R. The biggest hurdle is getting the updates in real time. Anything that can dump that into a psql or R format in real time (or within a few weeks) of when they update would be incredibly valuable. Also if it accessed phase 2 and phase 3 data.

In terms of variable lookup I don't know if you can fix it. Even if I had the names, they are so vague. At that point you just have to hit the documentation hard.

Glad someone is thinking about this! I love the FIA!

sckott commented 8 years ago

Maybe we can (if allowed) mirror the data in other places to speed up the download process. Maybe as an SQLite database.

Could run a script on Heroku however often FIA updates, then dump the output (e.g., a SQLite db) onto Amazon S3. Not sure if Heroku would work for this, though; not sure how much disk space is allowed for free.

@rykelly thanks for sharing the script. Looking at the scripts, looks like it's not a completely automated process, or is it?

It's also a great idea to have some built-in converters from raw FIA to a spatial format

yeah, easy enough

@colinaverill

Also if it accessed phase 2 and phase 3 data.

where is that data?

Glad someone is thinking about this! I love the FIA!

thanks for the feedback!

rykelly commented 8 years ago

Nope, not completely automated. Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.

colinaverill commented 8 years ago

Phase 3 should be available on the DataMart. Documentation link here http://www.fia.fs.fed.us/library/database-documentation/current/ver60/FIADB%20User%20Guide%20P3_6-0-1_final.pdf.

sckott commented 8 years ago

@rykelly

Nope, not completely automated. Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.

Okay, I guess it's doable, but not easily doable, to provide a SQL version of the FIA.

sckott commented 8 years ago

@colinaverill thanks very much!

dlebauer commented 8 years ago

If it's useful, here is an SQLite dump of the FIA5 database described by @robkooper (3GB) http://file-server.igb.illinois.edu/~dlebauer/fia/

colinaverill commented 8 years ago

Any word on how difficult it would be to get a SQLite dump of the newest version of the FIA (FIA6)?

dlebauer commented 8 years ago

Easy once Rob creates the database

rykelly commented 8 years ago

Creating the db isn't too hard either if you follow my tutorial, in case Rob has other things going on...

sckott commented 8 years ago

any clue how often that dump is updated? still looking for a way to automate this

dlebauer commented 8 years ago

Start with @rykelly's documentation from https://github.com/rykelly/fia_psql

I think the instructions there are pretty good; I've attached it Converting FIA to PSQL v1.0.docx

sckott commented 8 years ago

Right, I've seen and tried that. I guess this is in our way: https://github.com/ropenscilabs/laselva/issues/3#issuecomment-239588071

dlebauer commented 8 years ago

@rykelly which step can't be automated?

robkooper commented 8 years ago

@sckott working on FIA6. Once I have that, and maybe the code from @dlebauer, I can automate it to run, say, once a month or so.

sckott commented 8 years ago

@dlebauer

@rykelly said in that comment I linked to

Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.

but if @robkooper can do it as said in above comment, awesome

dlebauer commented 8 years ago

@sckott I can't tell from that comment what can't be automated, at least once you have the schema, unless the schema changes. But it also seems like the schema can be dumped from Access to psql.

@robkooper if by code you mean psql to sqlite, I used Navicat (thanks to a free license for open source projects!!). I can automate Navicat, but Google says non-proprietary scripts exist too.

rykelly commented 8 years ago

IIRC the only part that we couldn't automate was the Access --> psql schema conversion. It requires a Windows-specific tool plus some editing by hand. The rest should be smooth sailing though, so you're good in between (rare, I think) schema updates.

sckott commented 7 years ago

@robkooper any update on automating the FIA database stuff?