Closed elinw closed 4 years ago
@elinw thanks so much for opening this issue! I meant to reach out, but then life got crazy with work and covid homeschool, and I totally forgot. Sorry!
Your intuition is totally right. Here's a screenshot from the modelsummary
website:
Although I did not copy your code and I figured out the unicode strategy on my own, duplicating some of the skimr
functionality really came from fandom. From that perspective, I want to state at the outset that I am more than happy to make changes if you feel uncomfortable with my current approach (e.g., change the function name).
Using skimr
directly sounds like a very interesting idea! Honestly, I didn't think through many of the complications, and I don't have a Windows box to test on. Right now, I expect the spark histogram to fail when:
Would using skimr
get me any of those for "free"? Will it prevent other problems I haven't thought about?
For me, the main cost of using skimr
would be to add yet another dependency (on Github, I moved ggplot2
to suggests, and am planning to further reduce my reliance on external packages). This is mainly a problem if the skimr
API changes. How stable do you think the API is likely to be?
Aside from that, it doesn't look like skimr
would add that many upstream dependencies:
library(miniCRAN)
mdep <- c('broom', 'checkmate', 'dplyr', 'kableExtra', 'knitr', 'magrittr', 'rmarkdown', 'tables', 'tibble', 'tidyr')
mdep <- pkgDep(mdep, suggests=FALSE)
sdep <- pkgDep("skimr", suggests=FALSE)
setdiff(sdep, mdep)
#> [1] "skimr" "repr" "withr"
(Incidentally, when I grep the skimr
repo, I don't see any call to repr
, and just a single call to withr
. Does skimr
need those?)
Looooongwinded way of saying: "Yes, interested! What do you think are the benefits for different output formats?"
For reference, the internal function modelsummary::factory
accepts a dataframe or tibble, does some minor tranforms, and dispatches it to one of factory_*
functions for each of the table-making packages. The only thing that datasummary_skim
needs to do is prepare a dataframe and feed it to factory
.
FYI, I have some prototype code using kableExtra
's new inline plot functionality. Please don't invest time in a PR until I decide if I want to move away from unicode to inline svg.
In any case, I plan to (a) change the output somewhat so it differs from skim
, and (2) add an explicit mention to skimr
in the detailed description of the function so that people know the intellectual origin when they type datasummary_skim
.
I'll keep you updated on whether I move to skimr
or kableExtra
.
Thanks again,.
Okay, let me know. Doing a custom skimmer is really easy with our function factory, and now that we are at skimr 2 we don't expect that API to change at all. Yes in skimr the histograms don't currently render on Windows (we test for windows and then don't do it). Skimr 2 skim() has a pretty different API in that it returns a tibble/data frame but you can also return just the numerics or a list by class. We've found that there is a lot of complexity in doing this (many edge cases and so on) and you might want to put that on us.
Sounds great! Thanks again for your suggestions! I will definitely consider switching at some point if I run into too many data-handling issues. For now, I have:
skimr
in the docskableExtra
's spec_hist
, which gives me automatic support for Windows, but no support for markdown histograms (but support for HTML and Rmarkdown PDF+HTML).
Hi,
Congratulations on the beautiful package!
We ( @michaelquinn32 and I) noticed your datasummary_skim() function the output of which looks a lot like that for skimr::skim(). Was it inspired by our package? It's pretty complex to maintain the package because of things like Windows not supporting UTF-8 in data frames which makes the histograms impossible. It might be simpler to use skimr directly. Would you be interested in a PR to support a customized skimmer using skimr? Did you run into an issue trying to do that?