spgarbet / tangram

Table Grammar package for R

Thanks and request for descriptive table (a.k.a. Table 1) #36

Closed sbalci closed 6 years ago

sbalci commented 6 years ago

Thank you very much for this invaluable package.

I am not sure whether this has already been answered; apologies in advance if it is a duplicate.

Could you tell me whether there is a way to produce a table with only descriptive statistics, that is, not as a cross table and without statistical tests?

An example table:

[Screenshot of an example table, 2018-06-13]

spgarbet commented 6 years ago

The basic Hmisc default transform gets close to this.

> library(tangram)
> tangram(1 ~ sex+drug+bili, pbc, test=FALSE)
==============================================
                          N         All       
                                    418       
----------------------------------------------
sex : female             418   0.895  374/418 
drug                     418                  
   D-penicillamine             0.368  154/418 
   placebo                     0.378  158/418 
   not randomized              0.254  106/418 
Serum Bilirubin (mg/dl)  418  0.80 *1.40* 3.40
==============================================

One can then output the table to a variety of formats. Reproducing the table above exactly would be difficult, because it is stylistically inconsistent, but it is possible. For two-level factors, gender in the table above shows a ratio, while the others show counts for both levels. The follow-up variable is also reported by counts of observations.

There are multiple possible solutions. One is to use the descriptive table from above. Another is to create your own transform to compute whichever statistics you want. I would encourage deciding on a treatment for each data type and applying it in the same manner to all variables.

If you'd like, I can make a pass at a transform that closely duplicates the statistics in the table above, with a few minor tweaks for consistency of presentation.

sbalci commented 6 years ago

Thank you very much for your quick reply. Your solution is very convenient 👍 :)

As far as I understand, in the formula one has to use 1 instead of y.

The display of categorical values could be improved by writing percentages: %25.4 instead of 0.254

Best wishes

spgarbet commented 6 years ago

What format are you rendering to? I can give a quick example that shows this with the command line summary. A similar thing can be done for HTML5, LaTeX or Rmd.

> library(tangram)
Loading required package: R6
Loading required package: magrittr
> tangram(1 ~ sex+drug+bili, pbc, test=FALSE)
==============================================
                          N         All       
                                    418       
----------------------------------------------
sex : female             418   0.895  374/418 
drug                     418                  
   D-penicillamine             0.368  154/418 
   placebo                     0.378  158/418 
   not randomized              0.254  106/418 
Serum Bilirubin (mg/dl)  418  0.80 *1.40* 3.40
==============================================
> summary.cell_fraction <- function(object, ...) { paste0('%', object['percentage']) }
> assignInNamespace("summary.cell_fraction", summary.cell_fraction, "tangram")
> tangram(1 ~ sex[1]+drug[1]+bili, pbc, test=FALSE)
==============================================
                          N         All       
                                    418       
----------------------------------------------
sex : female             418       %89.5      
drug                     418                  
   D-penicillamine                 %36.8      
   placebo                         %37.8      
   not randomized                  %25.4      
Serum Bilirubin (mg/dl)  418  0.80 *1.40* 3.40
==============================================

spgarbet commented 6 years ago

FWIW: We've had several debates between various clinical reporting teams about whether percentage or ratio is preferred in these reports. We settled on ratio as the recommendation for our clinical reports. The library is designed so that these choices can be overridden, because other institutions might not agree with our choices.

spgarbet commented 6 years ago

Has this given you what you need?

sbalci commented 6 years ago

Thank you very much. I replied late because of hardware issues. In my field we prefer the 'n (%)' format, e.g. '374 (%89)'. I think it is doable with your example, and it solves the problem. Thank you again.
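As a rough illustration of the 'n (%)' format mentioned here, the sketch below builds such a string in base R. It assumes the fraction cell is a named vector that carries a 'numerator' element alongside 'percentage'. Only 'percentage' is confirmed by the example earlier in this thread, so `format_n_pct` and `example_cell` are hypothetical names and the cell layout is an assumption.

```r
# Hypothetical helper: format a fraction as "n (%p)".
# Assumes `object` is a named vector with 'numerator' and 'percentage'
# elements; only 'percentage' is confirmed by the thread above.
format_n_pct <- function(object, ...) {
  paste0(object[["numerator"]], " (%", object[["percentage"]], ")")
}

# Stand-in for a fraction cell, so the sketch runs without tangram.
example_cell <- c(numerator = "374", denominator = "418", percentage = "89.5")
format_n_pct(example_cell)
```

If the assumption about the cell layout holds, a function like this could be registered in the same way as the percentage override shown earlier, through `assignInNamespace("summary.cell_fraction", ...)`.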

One other thing is that the results are '.' (dot) separated, not tab separated, so they are not easy to modify in Excel or Word. Should I open a new issue on that?

Best wishes.

spgarbet commented 6 years ago

No, we can deal with this here. The results you see above are just the command line output; there are multiple choices downstream for rendering results. What is your goal format? HTML, Rmd, CSV, LaTeX? From "dot separated", it sounds like you're targeting a CSV file.

sbalci commented 6 years ago

A csv file would be good. I have many outputs with similar results and I want to combine them. I want to treat them as data.frame and combine with full_join.
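A minimal sketch of the combining step described here, using base R's `merge(..., all = TRUE)` as a stand-in for dplyr's `full_join`. The two data frames, their column names, and the `cohort_b` values are illustrative assumptions; in practice they would come from `read.csv()` on the exported files.

```r
# Two small stand-in tables; in practice these would be read with
# read.csv() from files exported from separate tables.
# All names and values here are illustrative only.
tab_a <- data.frame(variable = c("sex : female", "placebo"),
                    cohort_a = c("374/418", "158/418"),
                    stringsAsFactors = FALSE)
tab_b <- data.frame(variable = c("sex : female", "not randomized"),
                    cohort_b = c("10/20", "5/20"),
                    stringsAsFactors = FALSE)

# all = TRUE keeps rows found in either table (a full outer join);
# cells with no match are filled with NA.
combined <- merge(tab_a, tab_b, by = "variable", all = TRUE)
combined
```

The result has one row per distinct `variable` value, so rows present in only one table are kept rather than dropped.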

Thank you

spgarbet commented 6 years ago

Tangram supports combining tables (rbind and cbind), and can deal with data.frames as input. There is a csv call.

In general, however, the fewer translations and intermediate steps, the better. So, I assume you have data to produce a table from. You can call a summary formula over your data and pick your transform (Hmisc style is the default). This produces an abstract tangram table. Right now, the outputs provided are: html5, latex, rtf, and csv.

sbalci commented 6 years ago

Thank you very much. Best wishes.

spgarbet commented 6 years ago

The new version I'm preparing has overrides for everything you've requested. See the main README, about halfway down.

sbalci commented 6 years ago

I really liked it :) Thank you very much.