okiyuki99 / HowToR

R Tips

data.frame summary / visualization #1

Open okiyuki99 opened 6 years ago

okiyuki99 commented 6 years ago

summarytools

skimr

> skimr::skim(mtcars)
Skim summary statistics
 n obs: 32 
 n variables: 11 

─ Variable type:numeric ────────────────────────────────────
 variable missing complete  n   mean     sd    p0    p25    p50    p75   p100     hist
       am       0       32 32   0.41   0.5   0      0      0      1      1    ▇▁▁▁▁▁▁▆
     carb       0       32 32   2.81   1.62  1      2      2      4      8    ▆▇▂▇▁▁▁▁
      cyl       0       32 32   6.19   1.79  4      4      6      8      8    ▆▁▁▃▁▁▁▇
     disp       0       32 32 230.72 123.94 71.1  120.83 196.3  326    472    ▇▆▁▂▅▃▁▂
     drat       0       32 32   3.6    0.53  2.76   3.08   3.7    3.92   4.93 ▃▇▁▅▇▂▁▁
     gear       0       32 32   3.69   0.74  3      3      4      4      5    ▇▁▁▆▁▁▁▂
       hp       0       32 32 146.69  68.56 52     96.5  123    180    335    ▃▇▃▅▂▃▁▁
      mpg       0       32 32  20.09   6.03 10.4   15.43  19.2   22.8   33.9  ▃▇▇▇▃▂▂▂
     qsec       0       32 32  17.85   1.79 14.5   16.89  17.71  18.9   22.9  ▃▂▇▆▃▃▁▁
       vs       0       32 32   0.44   0.5   0      0      0      1      1    ▇▁▁▁▁▁▁▆
       wt       0       32 32   3.22   0.98  1.51   2.58   3.33   3.61   5.42 ▃▃▃▇▆▁▁▂
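
# Add the 1st and 99th percentiles to the default numeric summaries (skimr v1 API)
funs <- list(p1 = purrr::partial(quantile, probs = 0.01), p99 = purrr::partial(quantile, probs = 0.99))
skimr::skim_with(numeric = funs, append = TRUE)
skimr::skim(feature)  # feature: the data frame to summarize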
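
To make clear what the two extra statistics contain, here is a minimal base R sketch that computes the same 1st and 99th percentiles for every numeric column. It does not use skimr at all. `mtcars` stands in for `feature`, and the names `numeric_cols` and `extra_stats` are made up for illustration.

```r
# Base R sketch (no skimr): 1st and 99th percentiles per numeric column.
# mtcars is a stand-in for the data frame being summarized.
numeric_cols <- mtcars[vapply(mtcars, is.numeric, logical(1))]

extra_stats <- data.frame(
  variable = names(numeric_cols),
  # names = FALSE is passed through to quantile() to drop the "1%" label
  p1  = vapply(numeric_cols, quantile, numeric(1), probs = 0.01, names = FALSE),
  p99 = vapply(numeric_cols, quantile, numeric(1), probs = 0.99, names = FALSE),
  row.names = NULL
)

print(extra_stats)
```

These are the values that the `p1` and `p99` entries in `funs` would report alongside skimr's default `p0` to `p100` columns.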

skim_to_wide

knitr::kable(skimr::skim_to_wide(iris))

janitor

Tabplot

DataExplorer

SmartEDA

okiyuki99 commented 6 years ago

For Python, pandas_profiling is a similar library. http://nbviewer.jupyter.org/github/JosPolfliet/pandas-profiling/blob/master/examples/meteorites.ipynb

okiyuki99 commented 6 years ago

The following command produces the summary view shown in the screenshot below.

summarytools::view(summarytools::dfSummary(mtcars))

(screenshot of the resulting dfSummary view)

#### Embedding in Rmd (from kazutan on r-wakalang)

```
---
output: html_document
---

library(summarytools)
xxx <- view(dfSummary(iris), method = "render")
xxx
```

okiyuki99 commented 6 years ago

When checking a single variable, janitor::tabyl is convenient.

> janitor::tabyl(mtcars$cyl)
 mtcars$cyl  n percent
          4 11 0.34375
          6  7 0.21875
          8 14 0.43750
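
For comparison, the same counts and proportions can be reproduced without janitor. This is a minimal base R sketch, not how `tabyl` is implemented, and the name `cyl_summary` is made up for illustration.

```r
# Base R sketch of a one-way frequency table with counts and proportions.
counts <- table(mtcars$cyl)

cyl_summary <- data.frame(
  cyl     = as.numeric(names(counts)),       # the distinct values of cyl
  n       = as.vector(counts),               # how many rows have each value
  percent = as.vector(prop.table(counts))    # share of all rows (0 to 1)
)

print(cyl_summary)
```

The `n` and `percent` columns match the `tabyl` output above; `tabyl` is still the more convenient choice in a pipeline because it returns a data frame directly.
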
okiyuki99 commented 6 years ago

https://twitter.com/iiiaui/status/1036984893413052418

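Lately, when I am handed unfamiliar data (though, having just changed jobs, everything is unfamiliar), I run summarytools::dfSummary() %>% summarytools::view(), then pipe the variables I am curious about into janitor::tabyl() to deepen my understanding of the data, jotting notes in rmarkdown as I go → knit → a preliminary investigation report is done. This is an efficient workflow.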
okiyuki99 commented 5 years ago