nschloe / termplotlib

:chart_with_upwards_trend: Plotting on the command line
GNU General Public License v3.0

Handle strings in horizontal histogram bins #39

Closed thenger closed 4 years ago

thenger commented 4 years ago

Hello, thank you for making this.

Long story:

In my use case I had a pandas DataFrame (DF) containing one pandas Series. This series had been created by resampling a two-series DF with dates and occurrences.

The resampling works like histogram binning: it adds up all occurrences within each frequency interval, e.g. (series with dates as index):

12:00 -> 17
12:30 -> 21
13:00 -> 11

(^this is greatly simplified)
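
For readers without pandas at hand, here is a minimal standard-library sketch of what that kind of resampling does. The helper name resample_sum and the sample events are made up for illustration; this is a stand-in for a pandas resample followed by a sum, not the code from my project.

```python
from collections import Counter
from datetime import datetime, timedelta


def resample_sum(events, width=timedelta(minutes=30)):
    """Sum occurrence counts into fixed-width time bins.

    Hypothetical helper standing in for a pandas resample + sum.
    """
    totals = Counter()
    for stamp, count in events:
        # Floor each timestamp to the start of the bin it falls into.
        offset = (stamp - datetime.min) % width
        totals[stamp - offset] += count
    return sorted(totals.items())


# Invented sample data: (timestamp, number of occurrences).
events = [
    (datetime(2020, 4, 11, 12, 5), 7),
    (datetime(2020, 4, 11, 12, 20), 10),
    (datetime(2020, 4, 11, 12, 40), 21),
    (datetime(2020, 4, 11, 13, 10), 11),
]

binned = resample_sum(events)
for start, total in binned:
    print(start.strftime("%H:%M"), "->", total)
```

The bin starts play the role of the bin edges and the totals play the role of the counts.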

TL;DR

I want to have something like this:

2020-04-11 23:00:00  [  28]  ▎
2020-04-11 23:30:00  [  29]  ▎
2020-04-12 00:00:00  [1299]  █████████████▌
2020-04-12 00:30:00  [2637]  ███████████████████████████▍
2020-04-12 01:00:00  [ 996]  ██████████▍
2020-04-12 01:30:00  [ 404]  ████▎
2020-04-12 02:00:00  [ 557]  █████▊

or something like that. And maybe be able to format the dates.

What I changed

In my code I needed to change this line https://github.com/nschloe/termplotlib/blob/9827634d1a7049ca430506532bb048241788ad38/termplotlib/hist.py#L48

to this:

    if show_bin_edges:
        if type(bin_edges[0]) != int:
            labels = [str(d) for d in bin_edges]
        else:
            labels = [
                "{:+.2e} - {:+.2e}".format(bin_edges[k], bin_edges[k + 1])
                for k in range(len(bin_edges) - 1)
            ]

(I find it more readable to have just one side of the interval btw)
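
One caveat with the type(...) != int check above: float edges would also land in the str() branch. A slightly more defensive sketch of the same idea could look like this. The helper name make_labels is hypothetical and not part of termplotlib; the numeric format string is the one from the snippet above.

```python
from datetime import datetime
from numbers import Real


def make_labels(bin_edges):
    """Hypothetical helper: one label per bin for numeric edges,
    one label per edge for everything else."""
    if len(bin_edges) > 0 and isinstance(bin_edges[0], Real):
        # Numeric edges: "left - right" for each bin.
        return [
            "{:+.2e} - {:+.2e}".format(bin_edges[k], bin_edges[k + 1])
            for k in range(len(bin_edges) - 1)
        ]
    # Non-numeric edges (dates, strings, ...): fall back to str().
    return [str(edge) for edge in bin_edges]


numeric = make_labels([0.0, 0.5, 1.0])
dates = make_labels([datetime(2020, 4, 11, 23, 0), datetime(2020, 4, 11, 23, 30)])
print(numeric)
print(dates)
```

Note that the two branches return a different number of labels (one per bin versus one per edge), which is why the non-numeric case needs counts and edges of equal length.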

Advantage:

I could provide termplotlib.figure.hist() these equal length "lists":

counts = DF.series # ≡ list of occurrences (or int list)
bin_edges = DF.series.index # ≡ list of dates

Feature request

Handle any type of date bins (e.g. Python datetime, NumPy datetime64, ...) and their formats?

As this might be too painful, I think str() could do most of it.

Thank you for reading. Regards

nschloe commented 4 years ago

You can already do that, check the main readme.

thenger commented 4 years ago

ah my bad /:

So I also tried with barh() and got some errors due to the if labels: here https://github.com/nschloe/termplotlib/blob/9827634d1a7049ca430506532bb048241788ad38/termplotlib/barh.py#L24 and here https://github.com/nschloe/termplotlib/blob/9827634d1a7049ca430506532bb048241788ad38/termplotlib/barh.py#L42

Error:

  File "/home/beeep/.virtualenvs/py3env/lib/python3.7/site-packages/termplotlib/figure.py", line 60, in barh
    self._content.append(barh(*args, **kwargs))
  File "/home/beeep/.virtualenvs/py3env/lib/python3.7/site-packages/termplotlib/barh.py", line 24, in barh
    if labels:
  File "/home/beeep/.virtualenvs/py3env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2150, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a DatetimeIndex is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I changed it to if labels is not None: instead. IMO you should almost never use if var:, as its truth value can mean a large number of things depending on the type.
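
To illustrate the difference without needing pandas, here is a small sketch. AmbiguousIndex is a made-up stand-in that refuses truth-value testing the way the DatetimeIndex in the traceback does; the two has_labels_* functions are hypothetical names for the two styles of check.

```python
class AmbiguousIndex:
    """Hypothetical stand-in for an array-like that refuses bool()."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, k):
        return self._items[k]

    def __bool__(self):
        raise ValueError("The truth value of a AmbiguousIndex is ambiguous.")


def has_labels_truthy(labels):
    # Equivalent to what `if labels:` evaluates.
    return bool(labels)


def has_labels_explicit(labels):
    # Only asks whether labels were passed at all.
    return labels is not None


labels = AmbiguousIndex(["23:00", "23:30"])

try:
    has_labels_truthy(labels)
    truthy_result = "ok"
except ValueError:
    truthy_result = "ValueError"

explicit_result = has_labels_explicit(labels)
print(truthy_result, explicit_result)
```

The explicit check also treats an empty list as "labels were given", which may or may not be what you want; that is a design choice to make deliberately.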


Unfortunately, when using a DatetimeIndex the result is not what one would expect:

19s  [ 159]  █▋
19s  [1722]  █████████████████▉
19s  [3852]  ████████████████████████████████████████
19s  [1144]  ███████████▉
19s  [ 424]  ████▍
19s  [ 523]  █████▍
19s  [  96]  █

I think you forgot to call str() on line https://github.com/nschloe/termplotlib/blob/9827634d1a7049ca430506532bb048241788ad38/termplotlib/barh.py#L43 as you did in https://github.com/nschloe/termplotlib/blob/9827634d1a7049ca430506532bb048241788ad38/termplotlib/barh.py#L25 to create the formatter string

What I also changed:

        if labels is not None:
            data.append(str(labels[k]))
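
The odd 19s labels can be reproduced with plain datetime objects. The sketch below assumes the column format is built from the longest str(label), roughly as on the linked line; the variable names are mine. A datetime handed a non-empty format spec treats it as a strftime pattern, so the spec itself is what gets printed. I would expect pandas timestamps to behave the same way, but that part is an assumption.

```python
from datetime import datetime

labels = [datetime(2020, 4, 11, 23, 0), datetime(2020, 4, 11, 23, 30)]

# Column width taken from the longest stringified label (assumed logic).
width = max(len(str(label)) for label in labels)
cfmt = "{{:{}s}}".format(width)

# Passing the datetime itself: "19s" is interpreted as a strftime pattern.
broken = cfmt.format(labels[0])

# Passing str(label): "19s" is an ordinary string width specifier.
fixed = cfmt.format(str(labels[0]))

print(repr(cfmt), repr(broken), repr(fixed))
```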

hth

nschloe commented 4 years ago

All good suggestions. Instead of pasting the code here, it'd make it much easier for me if you submitted a PR.

thenger commented 4 years ago

will do