tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License
159 stars, 38 forks

Histograms in Table 1 #132

Open tompollard opened 1 year ago

tompollard commented 1 year ago

Not sure how easy this would be to implement, but making a note of this idea: https://twitter.com/DocEd/status/1572603047590916097

why don’t we just have a little histogram for each variable in table 1, in addition to the standard descriptors?

jraffa commented 1 year ago

You mean like this?

[image]

tompollard commented 1 year ago

Yes! I assume this is https://cran.r-project.org/web/packages/skimr/vignettes/skimr.html?

Looks like it might be possible to do this with pandas DataFrames, e.g.:

jraffa commented 1 year ago

I think the histograms are represented as text: ▇▅▃▅▇

Seems to be unicode characters: https://www.i2symbol.com/symbols/blocks/x2587-lower-seven-eighths-block-symbol-blocks-symbol-smiley-face

tompollard commented 1 year ago

Yeah, it looks like this would require each column (bar) of the histogram to be mapped to a block character in 1/8th-height steps: https://www.compart.com/en/unicode/block/U+2580. Seems doable...
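A minimal sketch of that mapping, assuming equal-width bins between the minimum and maximum and scaling each bin's count against the tallest bin. `text_histogram` and `BLOCKS` are hypothetical names used for illustration only; they are not part of tableone or skimpy, and skimpy's actual implementation may differ.

```python
import math

# The eight "lower block" characters, U+2581 (1/8 height) to U+2588 (full block).
BLOCKS = "".join(chr(cp) for cp in range(0x2581, 0x2589))


def text_histogram(values, bins=8):
    """Hypothetical helper: render `values` as a one-line text histogram.

    Assumes equal-width bins spanning min..max; the tallest bin is drawn as a
    full block and every other non-empty bin is rounded up to the next 1/8th.
    """
    data = [v for v in values if v is not None and v == v]  # drop None and NaN (NaN != NaN)
    if not data:
        return ""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        # Constant data goes in the first bin; the maximum goes in the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    # 0 eighths -> blank, otherwise 1..8 eighths -> U+2581..U+2588.
    levels = [math.ceil(8 * c / peak) for c in counts]
    return "".join(" " if n == 0 else BLOCKS[n - 1] for n in levels)


print(text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], bins=4))  # ▂▄▆█
```

Rounding up keeps a bin with even a single observation visible. Drawing empty bins as blanks rather than as the lowest block is a design choice here; other tools may render them differently.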

MartinBernstorff commented 1 year ago

Just want to let you guys know that skimpy has this implemented, so it should be real easy to learn from 👍

https://github.com/aeturrell/skimpy/blob/9a96f252fca01780425bbbc6d6c62165ce9ccf20/src/skimpy/__init__.py#L273

tompollard commented 1 year ago

Very cool, thanks Martin, we'll check this out! I haven't come across Skimpy before.

MartinBernstorff commented 1 year ago

Glad to hear it! I know this isn't on topic, but just want to say I spent 1.5 days implementing something like tableone, found your solution, and deleted all my code. Super happy to find a good, mature solution 👍

tompollard commented 1 year ago

@MartinBernstorff thanks for your kind words and I'm glad to hear it has been helpful! We've neglected the package recently so I'm hoping to spend a little time soon making some much needed updates, fixes, etc.

tompollard commented 1 year ago

I'm not clear what to do in the case where a groupby variable is specified. Do we: (1) plot a distribution for each separate group, i.e. if there are 4 groups, add 4 columns each with a separate histogram; or (2) plot a single histogram of the overall distribution of the variable? Any preference @DocEd?

DocEd commented 1 year ago

Tom, I'm delighted that you saw this and picked it up! I imagine an extra histogram column per group and, if there is space, an overall histogram column at the end, a bit like how histospark works in R (I think it's a Hadley Wickham package).

Or, if space were at a premium, the summary numbers could be given as normal, with a soft background image of the histogram behind the text in each table cell.
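One way to sketch the first layout DocEd describes (a histogram column per group, then an overall column at the end) with pandas. It assumes bin edges are computed once per variable from all rows, so every group's histogram shares the same x-axis, while each group's bars are scaled to that group's own tallest bin. `spark`, `histogram_columns`, and the `"Overall"` column name are hypothetical and are not part of the tableone API.

```python
import numpy as np
import pandas as pd

BLOCKS = "▁▂▃▄▅▆▇█"  # U+2581..U+2588, 1/8 to 8/8 height


def spark(values, edges):
    """Hypothetical helper: text histogram of a Series over fixed bin edges."""
    counts, _ = np.histogram(values.dropna(), bins=edges)
    peak = counts.max()
    if peak == 0:
        return " " * len(counts)
    levels = np.ceil(8 * counts / peak).astype(int).tolist()  # 0..8 eighths
    return "".join(" " if n == 0 else BLOCKS[n - 1] for n in levels)


def histogram_columns(df, columns, groupby, bins=8):
    """Hypothetical helper: one histogram column per group, then 'Overall'."""
    rows = {}
    for col in columns:
        # Shared edges per variable, so group histograms use the same x-axis.
        edges = np.histogram_bin_edges(df[col].dropna(), bins=bins)
        row = {str(name): spark(group[col], edges)
               for name, group in df.groupby(groupby)}
        row["Overall"] = spark(df[col], edges)
        rows[col] = row
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame({"arm": ["A"] * 4 + ["B"] * 4,
                   "x": [1, 1, 1, 2, 2, 3, 4, 4]})
print(histogram_columns(df, ["x"], groupby="arm", bins=4))
```

Sharing the edges trades some resolution within small groups for bars that can be compared side by side; per-group edges would show each group's shape in more detail but would make the columns harder to compare.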

DocEd commented 1 year ago

Just seen the rest of this thread. Yep, exactly like the above.