ropensci / popler

The R package to browse and query the popler database
https://docs.ropensci.org/popler
MIT License

How to load the main table? #27

Closed AldoCompagnoni closed 7 years ago

AldoCompagnoni commented 7 years ago

We need a way to load popler's main table into each computer as we install, or load, popler.
We need this because CRAN does not accept files larger than 5MB. Two potential ideas are:

bochocki commented 7 years ago

This is a duplicate issue (https://github.com/AldoCompagnoni/popler/issues/7). I suggest closing one or the other.

bochocki commented 7 years ago

We decided to add a small 'example' table with the package to run tests.

We also decided that the main table will be downloaded the first time library(popler) is run, if it does not already exist. The check/create functionality will be implemented in browse's main_table() function (although that function should probably be renamed).
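A minimal sketch of the "download only if it does not exist" idea, under stated assumptions: get_cached_table(), cache_file and fetch_fun are hypothetical names, and fetch_fun stands in for whatever would actually retrieve the table from the database. This is not popler's code, just an illustration of the check/create pattern.

```r
# Hypothetical helper: return a cached table, fetching it only when the
# cache file is missing. `fetch_fun` is a placeholder for the real download.
get_cached_table <- function(cache_file, fetch_fun) {
  if (!file.exists(cache_file)) {
    dir.create(dirname(cache_file), recursive = TRUE, showWarnings = FALSE)
    saveRDS(fetch_fun(), cache_file)
  }
  readRDS(cache_file)
}

# Usage with a fake "download" that counts how often it is called.
cache_file  <- file.path(tempdir(), "popler_demo", "main_table.rds")
fetch_count <- 0
fake_fetch  <- function() {
  fetch_count <<- fetch_count + 1
  data.frame(proj_id = 1:3, species = c("a", "b", "c"))
}

first  <- get_cached_table(cache_file, fake_fetch)  # creates the cache
second <- get_cached_table(cache_file, fake_fetch)  # reads the cache only
```

In this sketch the second call never touches fake_fetch(), which is the behaviour described above for later runs of library(popler).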

bochocki commented 7 years ago

Also, we should make sure that the variable main_table or whatever we call it is protected so users cannot easily overwrite it.
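One possible way to protect such a variable is base R's lockBinding(). The sketch below uses a stand-in environment (pkg_env, a hypothetical name) rather than popler's real namespace, and is only meant to show what "protected" could look like.

```r
# Stand-in environment; in a package this role is played by the namespace.
pkg_env <- new.env()
assign("main_table", data.frame(proj_id = 1:3), envir = pkg_env)

# Lock the binding so that it cannot be reassigned.
lockBinding("main_table", pkg_env)

# An attempt to overwrite the locked binding raises an error.
result <- tryCatch({
  assign("main_table", NULL, envir = pkg_env)
  "overwritten"
}, error = function(e) "blocked")
```

Here result ends up as "blocked" and the original table is left untouched.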

AldoCompagnoni commented 7 years ago

One more thought on this: we should decide whether to

  1. store main_table on the user's machine ONCE, when the package is installed, or
  2. query the database and create main_table every time you load popler.

The former is preferable, but I don't know whether or how it can be done. I SUSPECT that this might be possible by using the setHook function, but I am not sure I understand the documentation. The latter is easy: it can be implemented using .onLoad, located in the file zzz.R (https://github.com/AldoCompagnoni/popler/commit/1ac795acfc6b7a5238585d80a2c43e9903c3d1db)
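For reference, a small sketch of how setHook() is used together with packageEvent() in base R. It registers a user-level function against a package's load event, so it fires at load time, not at install time. "poplerDemoPkg" is a made-up package name used only for illustration.

```r
# Build the hook name for the "onLoad" event of a (made-up) package.
hook_name <- packageEvent("poplerDemoPkg", "onLoad")

# Register a function to be called when that package's namespace is loaded.
setHook(hook_name, function(pkgname, pkgpath) {
  message("namespace of ", pkgname, " was loaded")
})

# Inspect what is registered under that hook name.
registered <- getHook(hook_name)
```

Because hooks registered this way only run on load, this sketch suggests that setHook() on its own would not give a one-time, install-time step.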

bochocki commented 7 years ago

Made some significant updates to main_table.

Brief summary: We will have a copy of this table in the package release of popler. Users will be able to update the table as often as they desire, and users will be prompted to update the table every 6 weeks.

Detailed summary:

  1. Changed the name from main_table to summary_table in browse.R and util.R. This name is more descriptive of the table and limits confusion with the actual "main table" in the database.
  2. Wrote a function called summary_table_check(), which checks whether the summary_table is on the user's machine. If it is not, another function, summary_table_update(), is called. summary_table_update() creates (or updates) the summary_table on the user's machine and loads the new table into popler's environment. If the user already has a copy of summary_table but that table is more than 6 weeks old, summary_table_check() prompts the user to run summary_table_update() manually to update the table.
  3. The summary table is currently 151 KB, which is small enough to be included in the CRAN package, and I suggest we do this.
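The check described above could look roughly like the sketch below. It is not the actual summary_table_check(): check_table_age(), table_file, max_age_weeks and the returned status strings are all hypothetical, and the file's modification time is assumed to be a good enough proxy for the table's age.

```r
# Hypothetical re-creation of the check logic; not popler's actual code.
check_table_age <- function(table_file, max_age_weeks = 6, now = Sys.time()) {
  if (!file.exists(table_file)) {
    return("missing")  # the caller would run the update function here
  }
  age_weeks <- as.numeric(
    difftime(now, file.mtime(table_file), units = "weeks")
  )
  if (age_weeks > max_age_weeks) {
    message("The summary table is more than ", max_age_weeks,
            " weeks old; consider running summary_table_update().")
    return("stale")
  }
  "current"
}

# Usage: no file yet, then a fresh file, then the same file seen 7 weeks later.
demo_file      <- tempfile(fileext = ".rds")
status_missing <- check_table_age(demo_file)
saveRDS(data.frame(proj_id = 1:3), demo_file)
status_current <- check_table_age(demo_file)
status_stale   <- check_table_age(
  demo_file, now = Sys.time() + as.difftime(7, units = "weeks")
)
```

Passing now as an argument is a design choice made only so that the stale case can be tried without waiting six weeks.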

We still need to get summary_table_check() to run when popler loads, but this is more of a minor issue; I think .onLoad is the way to go, but haven't had a chance to sit down and actually figure it out.

bochocki commented 7 years ago

Also, I recommend deleting ./data_raw/generate_sysdata.R since it is now redundant with (and has mostly the same code as) summary_table_update()

bochocki commented 7 years ago

Added a zzz.R file that includes summary_table_check() in .onLoad(), so the summary table is checked (and loaded into the namespace) when popler is loaded.
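A sketch of what such a zzz.R might contain. The stub summary_table_check() below only stands in for the real function so that the snippet runs on its own; the actual file in the repository may differ.

```r
# zzz.R (sketch). The stub stands in for the real summary_table_check().
summary_table_check <- function() {
  message("checking summary table ...")
  invisible(TRUE)
}

# R calls .onLoad(libname, pkgname) when the package namespace is loaded.
.onLoad <- function(libname, pkgname) {
  summary_table_check()
}

# Outside a package nothing calls the hook, so invoke it by hand to try it.
load_result <- .onLoad(libname = NULL, pkgname = "popler")
```

Inside a real package, the manual call on the last line would be dropped, since R invokes .onLoad() itself.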

Forgot to mention that the summary table can be accessed using popler::summary_table.

I deleted ./data_raw/generate_sysdata.R, since it is now redundant with summary_table_update().

If the current setup is okay, I vote to close this issue.