Closed sckott closed 4 years ago
also possibly of interest to:
@dlebauer
Thanks. Brings back fond memories! What is the scope of the package? It looks like it imports data from FIA, but it's unclear why the name is 'laselva'. Is it to compile a local instance of the FIA database?
@mdietze, @robkooper, @colinaverill, @rykelly are all actively working with FIA and a few others within @PecanProject/pecan-developers : do any of you have feedback on an R package that makes using FIA easier?
I know, I started learning R in '06 working with FIA data!
What is the scope of the package
@taddallas wanted to more easily get FIA data files. The scope, I think, is finding out what files are available and fetching data files, with room for making it easy to combine them as well, and perhaps doing some transformations on variables or adding new variables. To be honest, I don't know the data that well, so are there any features that would make sense to add?
Is it to compile a local instance of the FIA database?
could maybe add that, but for that use case the easiest way seems to be to just dump the entire thing into sqlite or similar and then query with dplyr, so not necessarily something to include here
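A minimal sketch of the "dump it into sqlite, then query" idea. It is written in Python with only the standard library rather than in R with dplyr, and the table name, column names, and sample rows are made up for illustration; they are not the real FIA layout.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert every row (values stay text)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()
    return len(header)


# Tiny made-up stand-in for a per-state file; real files are far wider.
sample = io.StringIO("STATECD,SPCD,DIA\n1,131,10.2\n1,131,12.5\n1,802,8.0\n")

conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
load_csv(conn, "tree", sample)

# Count rows per species code, the kind of query dplyr would generate.
counts = conn.execute(
    'SELECT "SPCD", COUNT(*) FROM "tree" GROUP BY "SPCD" ORDER BY "SPCD"'
).fetchall()
print(counts)
```

Calling `load_csv` once per downloaded file would build up one table per file; typed columns and indexes are left out to keep the sketch short.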
oh, and the package name is just to have something unique and fun - and makes it easier to find on the web compared to "FIA"
I like the idea of dumping the whole thing into sqlite. I also would like to see a big lookup table of variable names/units, and potentially some geospatial mix-ins. I have hardly looked at the data, but they seem like a great resource for population and community studies at larger spatial scales. The ability to aggregate counts based on a gridded mask would be cool, but maybe that's just my use case.
for sql, we can do something like in https://github.com/ropenscilabs/taxizedb where we provide light wrappers around dplyr stuff
I also would like to see a big lookup table of variable names/units
https://github.com/ropenscilabs/laselva/blob/master/R/fia_datasets.R gets datasets - any idea where to get variables?
and potentially some geospatial mix-ins.
examples?
The ability to aggregate counts based on a gridded mask would be cool, but maybe that's just my use case.
hmmm, perhaps
Using the taxizedb approach sounds promising. I don't yet have a good grasp on how large the resulting database would be (e.g., 30+ csv files per state).
The only place I've seen variable definitions currently is in the metadata pdfs for phase 2 (500+ pages) and phase 3 (150+ pages), but hopefully they are somewhere else.
The ability to quickly visualize occurrence points as SpatialPoints or SpatialPolygons objects would be nice, but perhaps not strictly necessary.
I have loaded it into a PostgreSQL database. The final dump is around 10 GB of SQL statements, or 3 GB as a compressed file. The hardest part is always taking the AccessDB database description they have and converting it to PostgreSQL, followed by the very slow download.
This looks very cool. Maybe we can (if allowed) mirror the data in other places to speed up the process of downloading it. Maybe as a SQLite database.
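As a rough illustration of shipping a database as compressed SQL statements, here is a sketch using Python's standard `sqlite3` and `gzip` modules. It assumes a SQLite database rather than the PostgreSQL one described above, and the `plot` table and its rows are invented for the example.

```python
import gzip
import os
import sqlite3
import tempfile


def dump_compressed(conn, path):
    """Write the database as gzip-compressed SQL text; return the file size in bytes."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for statement in conn.iterdump():
            out.write(statement + "\n")
    return os.path.getsize(path)


def restore(path):
    """Rebuild an in-memory database from a compressed SQL dump."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        script = src.read()
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# Made-up table standing in for the real data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE plot (cn INTEGER PRIMARY KEY, statecd INTEGER)")
source.executemany("INSERT INTO plot VALUES (?, ?)", [(1, 17), (2, 17), (3, 18)])
source.commit()

dump_path = os.path.join(tempfile.mkdtemp(), "fia_dump.sql.gz")
size = dump_compressed(source, dump_path)
copy = restore(dump_path)
restored_rows = copy.execute("SELECT COUNT(*) FROM plot").fetchone()[0]
print(size, restored_rows)
```

`restore` reads the whole script into memory, which is fine for a toy table but would not suit a multi-gigabyte dump; for a mirror of that size, copying the SQLite file itself is likely simpler.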
Last year I made some updates to @robkooper's code and wrote up a bit of a tutorial to go with it. It's here. I kind of threw it on GH as an afterthought so it's not necessarily polished but I think the tutorial is pretty complete. It walks through the whole process of converting FIA to psql, including the pain in the ass MS Access step Rob mentioned. You're welcome to use any of that code (long as he's ok with it).
That said, since there is a bit of a process and a large quantity of data / slow download involved, I think it's a great idea to have functions (like in this package) for grabbing specific bits direct from FIA.
It's also a great idea to have some built-in converters from raw FIA to a spatial format (like sp::SpatialPointsDataFrame). From there a lot of GIS stuff is simple (e.g. raster can automatically aggregate counts like @taddallas mentioned). Of course people can do this on their own but sp has a steep learning curve so it would be helpful to have some basics wrapped in easy-to-remember laselva functions.
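To make the "aggregate counts on a grid" idea concrete, here is a small sketch in plain Python. It does not use sp or raster; it simply bins longitude/latitude pairs into square cells of a chosen size. The coordinates are made up and the function name `grid_counts` is hypothetical.

```python
import math
from collections import Counter


def grid_counts(points, cell_size):
    """Count (lon, lat) points per square grid cell of `cell_size` degrees.

    Each cell is keyed by the (column, row) index of its lower-left corner.
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
    return counts


# Made-up coordinates, not real plot locations.
points = [(-88.2, 40.1), (-88.4, 40.3), (-88.9, 40.7), (-87.5, 41.2)]
counts = grid_counts(points, cell_size=1.0)
print(dict(counts))
```

A real gridded mask would also carry a projection and an extent, which is where a spatial library earns its keep; this only shows the counting step.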
As a data user, Rob's PostgreSQL product has been great, as it's easy enough to write psql in R. Biggest hurdle is getting the updates in real time. Anything that can dump that into a psql or R format in real time (or within a few weeks) of when they update would be incredibly valuable. Also if it accessed phase 2 and phase 3 data.
In terms of variable lookup I don't know if you can fix it. Even if I had the names, they are so vague. At that point you just have to hit the documentation hard.
Glad someone is thinking about this! I love the FIA!
On Thursday, August 11, 2016, Rob Kooper wrote:
I have loaded it into a postgresql database. The final dump is around 10GB of sql statement, or 3GB of compressed file. The hardest part is always taking the AccessDB database description they have and converting it to PostgresSQL, followed by the very slow download.
This looks very cool. Maybe we can (if allowed) mirror the data in other places to speed up the process of downloading it. Maybe as a sql-lite database.
Maybe we can (if allowed) mirror the data in other places to speed up the process of downloading it. Maybe as a sql-lite database.
could run a script on Heroku however often FIA updates, then dump the output (e.g., a sqlite db) onto amazon s3. not sure if heroku would be possible, not sure how much disk space is allowed for free.
@rykelly thanks for sharing the script. Looking at the scripts, looks like it's not a completely automated process, or is it?
It's also a great idea to have some built-in converters from raw FIA to a spatial format
yeah, easy enough
@colinaverill
Also if it accessed phase 2 and phase 3 data.
where is that data?
Glad someone is thinking about this! I love the FIA!
thanks for the feedback!
Nope, not completely automated. Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.
Phase 3 should be available on the DataMart. Documentation link here http://www.fia.fs.fed.us/library/database-documentation/current/ver60/FIADB%20User%20Guide%20P3_6-0-1_final.pdf.
On Fri, Aug 12, 2016 at 8:12 PM, Ryan Kelly wrote:
Nope, not completely automated. Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.
@rykelly
Nope, not completely automated. Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.
Okay, I guess it's doable, but not easily, to provide a SQL version of the FIA
@colinaverill thanks very much!
If it's useful, here is an SQLite dump of the FIA5 database described by @robkooper (3GB) http://file-server.igb.illinois.edu/~dlebauer/fia/
Any word on how difficult it would be to get a SQLite dump of the newest version of the FIA (FIA6)?
On Thu, Sep 1, 2016 at 1:28 PM, David LeBauer wrote:
If it's useful, here is an SQLite dump of the FIA5 database described by @robkooper (3GB) http://file-server.igb.illinois.edu/~dlebauer/fia/
Easy once Rob creates the database
Creating the db isn't too hard either if you follow my tutorial, in case Rob has other things going on...
On Sep 1, 2016, at 1:33 PM, David LeBauer wrote:
Easy once Rob creates the database
any clue how often that dump is updated? still looking for a way to automate this
Start with @rykelly's documentation from https://github.com/rykelly/fia_psql
I think the instructions there are pretty good; I've attached them: Converting FIA to PSQL v1.0.docx
right, i've seen and tried that, I guess this is in our way https://github.com/ropenscilabs/laselva/issues/3#issuecomment-239588071
@rykelly which step can't be automated?
@sckott working on FIA6. Once I have that and maybe the code from @dlebauer I can automate it to do it say once a month or so.
@dlebauer
@rykelly said in that comment I linked to
Far as I could tell, until FIA supplies the schema in a more sensible way it's not going to be, either.
but if @robkooper can do it as said in above comment, awesome
@sckott I can't tell from that comment what can't be automated, at least once you have the schema, unless the schema changes. But it also seems like the schema can be dumped from access to psql.
@robkooper if by code you mean psql to sqlite, I used Navicat (thanks to free license for open source projects!!). I can automate Navicat but Google says non-proprietary scripts exist.
IIRC the only part that we couldn't automate was the Access --> psql schema. It requires a Windows-specific tool + some editing by hand. The rest should be smooth sailing though, so you're good in between (rare, I think) schema updates.
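Since the manual step only matters when the schema changes, an automated job could at least detect that case and stop for review. The following is a sketch of such a check in Python, not something from the existing scripts: it compares the header of a freshly downloaded CSV with a stored column list. The column names and the function name `schema_drift` are hypothetical.

```python
import csv
import io


def schema_drift(expected_columns, csv_file):
    """Compare a CSV header with the expected column list.

    Returns (missing, unexpected): columns that disappeared and columns that are new.
    """
    header = next(csv.reader(csv_file))
    missing = [c for c in expected_columns if c not in header]
    unexpected = [c for c in header if c not in expected_columns]
    return missing, unexpected


# Hypothetical stored schema and a made-up new download whose header gained a column.
expected = ["CN", "STATECD", "SPCD", "DIA"]
new_file = io.StringIO("CN,STATECD,SPCD,DIA,HT\n1,17,131,10.2,55\n")

missing, unexpected = schema_drift(expected, new_file)
needs_manual_review = bool(missing or unexpected)
print(missing, unexpected, needs_manual_review)
```

If `needs_manual_review` is false, the scheduled job can load the data as usual; otherwise it can skip the update and notify someone to redo the Access to psql schema step by hand.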
On Sep 1, 2016, at 9:05 PM, David LeBauer wrote:
@sckott I can't tell from that comment what can't be automated, at least once you have the schema, unless the schema changes. But it also seems like the schema can be dumped from access to psql.
@robkooper if by code you mean psql to sqlite, I used Navicat (thanks to free license for open source projects!!). This can be automated. Google says scripts exist.
@robkooper any update on automating the FIA database stuff?
@taddallas did you try it yet? let me know any thoughts