ropensci / popler

The R package to browse and query the popler database
https://docs.ropensci.org/popler
MIT License

Change default query in `get_data` #33

Open AldoCompagnoni opened 7 years ago

AldoCompagnoni commented 7 years ago

The default query of get_data sometimes returns sparse data frames. I suggest that, as default, we should:

  1. Always include sppcode (at least until we translate sppcode to genus/species)
  2. Always include lat_study_site/lng_study_site
  3. Convert -99999 values to NA
  4. Do not return treatment/structure/spatial replication columns if they contain only NAs
  5. Place covariates in the last column instead of the penultimate one?
  6. Include columns with the spat_rep LABEL in the downloaded data set (this idea came up over a month ago, but I never implemented it).

AldoCompagnoni commented 7 years ago

Progress made so far:

  1. Included sppcode, lat_study_site, and lng_study_site in default queries.
  2. The function substitutes -99999 with NA, but only in numeric columns.
  3. The function removes columns that contain only "NA".
  4. The output of get_data now includes the label of spatial replicates (e.g. spatial_replication_level_1_label).
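
For reference, step 3 above could be sketched roughly as follows. This is only an illustration, not the code in the package: the helper name drop_all_na_cols and the toy column names are hypothetical, and it assumes the empty columns hold true NA values rather than the string "NA".

```r
# Hypothetical helper, not part of popler's API: drop every column
# whose values are all NA (assumes real NA values, not the string "NA").
drop_all_na_cols <- function(df) {
  # TRUE for columns that have at least one non-missing value
  keep <- !vapply(df, function(x) all(is.na(x)), logical(1))
  # drop = FALSE keeps a data frame even if only one column survives
  df[, keep, drop = FALSE]
}

# Toy data; column names are made up for the example
example <- data.frame(
  sppcode           = c("ABC", "DEF"),
  treatment_type_1  = c(NA, NA),
  count_observation = c(3, NA)
)

cleaned <- drop_all_na_cols(example)
names(cleaned)
# "sppcode" "count_observation"
```

Columns that are only partly missing (count_observation here) are kept; only the entirely empty treatment column is removed.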

AldoCompagnoni commented 7 years ago

New tasks before we close this issue:

  1. Is the substitution of -99999 with NAs as fast as it could be? My concern is that the code I use works with only a subset of the output data frame (e.g., the code is: `output_data[,col_repl] <- as.data.frame(lapply(output_data[,col_repl], function(x){replace(x, x == -99999,NA)}))`).
  2. Find a new column name for the label of spatial replicates. I fear that labels such as spatial_replication_level_1_label will look annoyingly long for the average user.
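
On task 1, one possible alternative is to loop over the numeric columns and assign NA in place, which avoids rebuilding the subset through as.data.frame(lapply(...)). The sketch below is an assumption on my part, not the current implementation: replace_sentinel, its sentinel argument, and the toy data are hypothetical, and whether it is actually faster would have to be checked with a benchmark (for example with system.time) on a real query result.

```r
# Hypothetical helper: replace a sentinel value with NA in numeric columns only.
replace_sentinel <- function(df, sentinel = -99999) {
  # Names of the numeric columns; other column types are left untouched
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  for (col in num_cols) {
    x <- df[[col]]
    # which() ignores NA comparisons, so existing NAs stay as they are
    x[which(x == sentinel)] <- NA
    df[[col]] <- x
  }
  df
}

# Toy data; column names are made up for the example
toy <- data.frame(
  sppcode           = c("ABC", "DEF", "GHI"),
  count_observation = c(5, -99999, 2),
  lat_study_site    = c(-99999, 34.1, NA)
)

res <- replace_sentinel(toy)
res$count_observation
# 5 NA 2
```

Selecting columns with is.numeric inside the helper also removes the need to build col_repl separately.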

Moreover, I have moved author and authors_contact to the first two lines of the data frames returned by get_data. The rationale is that, in so doing, author information is prominent but not "in the way" of the actual data.