jakegross808 / pacn-veg-package

all pacn veg code
Creative Commons Zero v1.0 Universal
1 stars 1 forks source link

Settle on data table structure #7

Closed wright13 closed 2 years ago

wright13 commented 3 years ago

How do we want the data to be formatted for use in this package? E.g. columns present (should each data table have Unit_Code, other columns from Visit?), column name format (I think this is already pretty consistent). Another way to think of this is how should a csv export of the raw data look?

@wright13 will work on getting more familiar with the data @jakegross808 will let Sarah know which views/queries (view = saved SQL query, basically) to look at

jakegross808 commented 3 years ago

uploaded old .R script for connecting to SQL database (connection path removed) incase script or table structure is helpful. old_SQL_FTPC_database_script.r

wright13 commented 3 years ago

That script is helpful! So looking at the data export you sent me, my understanding is that...

Correct me if I'm wrong on any of that. Are there any other exports in the old script that we want to include in the package? Table structure wouldn't necessarily be identical, just thinking about compiling a list of data tables that we want to have.

wright13 commented 3 years ago

Gathering my thoughts ahead of our call - no need to respond to this before we chat :)

wright13 commented 3 years ago

To do:

jakegross808 commented 3 years ago

FTPC database documentation with relationship diagram

focal_terr_plants_db_documentation_20110131.pdf

jakegross808 commented 3 years ago

Fig.1 in that doc shows the relationship. These are the Tables needed:

tbl_Lg_Woody_Individual tbl_Tree_Canopy_Height tbl_Presence tbl_Sm_Woody_Tally tbl_Understory_Cover tbl_Woody_Debris

Would it be easiest to join/relate all associated tables and then pare down columns?

wright13 commented 3 years ago

I think I would pare down the columns first, just to reduce the amount of data that we have to retrieve from the database. You'll probably have to keep the ID columns until after joining though. Either way will work fine though!

jakegross808 commented 3 years ago

ok I'm going through each table in relationship diagram and using select() to choose columns like this: tbl_Lg_Woody_Individual <- tbl(DB, "tbl_Lg_Woody_Individual") %>% select(Large_Woody_ID, Event_ID, Species_ID, Life_Form, Quad, Status, Height, Height_Dead, Boles, DBH, DBH_Basal, Vigor, Fruit_Flower, Rooting, Foliar, Caudex_Length, Shrublike_Growth, Resprouts, Measurement_Type)

jakegross808 commented 3 years ago

Or is it better to just show the columns being dropped maybe??

tbl_Lg_Woody_Individual <- tbl(DB, "tbl_Lg_Woody_Individual") %>% select(-Sort_Order, -SSMA_TimeStamp)

jakegross808 commented 3 years ago

I guess that could screw things up though if a column we don't want suddenly got added to table.

jakegross808 commented 3 years ago

added "table_structure.R" to repository. It contains select() columns for each data table (and sub-data table, if needed)

Still need to do tbl_Events, tbl_Plots, tbl_Locations, tbl_Site, tlu_Species, and xref_Park_Species_Nativity.