jakegross808 / pacn-veg-package

all pacn veg code
Creative Commons Zero v1.0 Universal

Figure out how to handle moderately large data #15

Closed wright13 closed 3 years ago

wright13 commented 3 years ago

Species coverage is a large enough dataset that reading it into memory from the database is fairly time-consuming. Sarah will come up with some options to run by Jake.

wright13 commented 3 years ago

I figured out what I was doing wrong that prevented ReadFTPC from returning database table connections (i.e., without calling dplyr::collect() first)! This means we should be able to do our filtering and summaries on the database side for the large data tables without any big changes to the existing code. Certain things will be unavoidably slow, like exporting the full dataset to CSV, but it looks like most of the summary statistics listed in the background doc can be done using dbplyr.
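As a minimal sketch of the database-side approach: a lazy `tbl()` reference lets dplyr verbs be translated to SQL and executed in the database, so only the summarised rows are pulled into memory by `collect()`. The connection setup, table name (`SpeciesCoverage`), and column names below are hypothetical placeholders, not the actual pacn-veg-package schema.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical connection; in the package this would come from ReadFTPC
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Lazy reference to the table -- no rows are read into memory yet
cover_tbl <- tbl(con, "SpeciesCoverage")

# filter()/group_by()/summarise() are translated to SQL and run in the
# database; only the small summary result crosses into R on collect()
cover_summary <- cover_tbl %>%
  filter(Unit_Code == "HALE") %>%           # hypothetical column names
  group_by(Scientific_Name) %>%
  summarise(Mean_Cover = mean(Cover, na.rm = TRUE)) %>%
  collect()
```

`show_query()` can be called on the pipeline before `collect()` to inspect the generated SQL, which is a handy check that a given summary really is being pushed to the database rather than computed in R.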