joweich / chat-miner

Parsers and visualizations for chats
MIT License

`calendar_heatmap()` has a ValueError #48

Closed victormihalache closed 1 year ago

victormihalache commented 1 year ago

I have exported my Signal chat according to the README and moved the index.md file generated by the export to the folder with the Python script.

This is the folder structure:

.
├── index.md
└── main.py

0 directories, 2 files

And this is main.py:

from chatminer.chatparsers import SignalParser

import chatminer.visualizations as vis
import matplotlib.pyplot as plt

parser = SignalParser("./index.md")
parser.parse_file_into_df()

fig, ax = plt.subplots(2, 1, figsize=(9, 3))
ax[0] = vis.calendar_heatmap(parser.df, year=2020, cmap='Oranges', ax=ax[0])
ax[1] = vis.calendar_heatmap(parser.df, year=2021, linewidth=0, monthly_border=True, ax=ax[1])

But I get this when running the script:

13.12.2022 10:10:18 INFO     
            Depending on the platform, the message format in chat logs might not be
            standardized accross devices/versions/localization and might change over
            time. Please report issues including your message format via GitHub.

13.12.2022 10:10:18 INFO     Initialized parser.
13.12.2022 10:10:18 INFO     Starting reading raw messages into memory...
13.12.2022 10:10:18 INFO     Finished reading 99018 raw messages into memory.
13.12.2022 10:10:18 INFO     Starting parsing raw messages into dataframe...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 99018/99018 [00:03<00:00, 31160.51it/s]
13.12.2022 10:10:22 INFO     Finished parsing raw messages into dataframe.
Traceback (most recent call last):
  File "/Users/victormihalache/Desktop/chatdata/main.py", line 11, in <module>
    ax[0] = vis.calendar_heatmap(parser.df, year=2020, cmap='Oranges', ax=ax[0])
  File "/opt/homebrew/lib/python3.10/site-packages/chatminer/visualizations.py", line 175, in calendar_heatmap
    pc, cax=cax, ticks=[min(vmin), int((min(vmin) + max(vmax)) / 2), max(vmax)]
ValueError: cannot convert float NaN to integer
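For reference, the last line of the traceback is the error Python raises whenever `int()` receives a NaN, so at least one of the values fed into the tick computation must have been NaN. The sketch below reproduces the message in isolation. The `midpoint_tick` helper is hypothetical and only mirrors the `int((min + max) / 2)` expression visible in the traceback; it is not chatminer's code.

```python
import math


def midpoint_tick(vmin, vmax):
    """Hypothetical helper: midpoint of a value range, or None if it is NaN."""
    mid = (vmin + vmax) / 2
    if math.isnan(mid):
        return None  # no usable data range, so there is no midpoint tick
    return int(mid)


# Reproduce the error from the traceback with plain floats.
try:
    int((float("nan") + float("nan")) / 2)
except ValueError as err:
    error_message = str(err)

print(error_message)
print(midpoint_tick(0, 10))
print(midpoint_tick(float("nan"), 10))
```

Guarding with `math.isnan` before the `int()` call is one way to avoid the exception; whether that is appropriate inside the library is a separate question.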

Example of the index.md file:

[2022-12-12 10:48] Person2:
>
> The replied-to message
>
The message
[2022-12-12 10:49] Person2: Another message
[2022-12-12 10:49] Person2: And another one
[2022-12-12 10:49] Me: The answer
[2022-12-12 10:50] Person2: A response to the answer

(I have only changed the name of "Person2", not "Me", and the contents of the messages, but the format and dates are the same as in the original)

I have no clue why it is not working.

EDIT:

I just noticed that we sometimes sent code snippets written in Python, and VSCode interprets a Python comment as a header. Here is an example of me replying to a snippet (notice how the > is not present on every line):

[2022-02-17 10:59] Me:
>
> ```
# the comment
while True:
  print("Some python code")
\`\`\`
>
my eyes, my poor eyes

(the [\`\`\`] is just so formatting is not broken here, but is [```] in the original message)

EDIT 2:

However, running print(parser.df) doesn't seem to show any anomalies:

                 datetime author                                            message  weekday  hour  words  letters
0     2022-12-13 09:17:00    Pr2                                                msg  Tuesday     9      1        5
1     2022-12-13 09:17:00     Me  A seemingly long message so you see the dots h...  Tuesday     9      9      113
2     2022-12-13 09:17:00    Pr2                                            another  Tuesday     9      1        4
3     2022-12-13 09:16:00     Me                                         a response  Tuesday     9      4       25
4     2022-12-13 09:16:00    Pr2                              The words and letters  Tuesday     9      5       22
...                   ...    ...                                                ...      ...   ...    ...      ...
99013 2021-11-14 17:58:00    Pr2                                                ...   Sunday    17      1        3
99014 2021-11-14 17:58:00    Pr2            Are wrong because i changed the message   Sunday    17      1        4
99015 2021-11-14 17:58:00    Pr2                                                ...   Sunday    17      1        3
99016 2021-11-14 17:58:00     Me                                              hello   Sunday    17      1        6
99017 2021-11-14 13:05:00    Pr2                                                      Sunday    13      1        0

[99018 rows x 7 columns]
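The rows visible in this printout run from 2021-11-14 to 2022-12-13, and the elided middle rows could contain anything, so it may be worth checking how many messages fall in each year before plotting a specific one. This is only a diagnostic sketch and not a confirmed cause of the error. The `messages_per_year` helper is a hypothetical name, and the sample datetimes are copied from the printout above.

```python
from collections import Counter
from datetime import datetime


def messages_per_year(datetimes):
    """Count messages per calendar year (hypothetical diagnostic helper)."""
    counts = Counter(dt.year for dt in datetimes)
    return dict(sorted(counts.items()))


# A few timestamps taken from the printed dataframe above.
sample = [
    datetime(2022, 12, 13, 9, 17),
    datetime(2022, 12, 13, 9, 16),
    datetime(2021, 11, 14, 17, 58),
    datetime(2021, 11, 14, 13, 5),
]

counts = messages_per_year(sample)
print(counts)
print(2020 in counts)
```

With the real data, `messages_per_year(parser.df["datetime"])` should work as well, assuming the `datetime` column holds pandas timestamps as the printout suggests, since iterating such a column yields objects with a `.year` attribute.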

EDIT 3:

The sunburst works fine:

from chatminer.chatparsers import SignalParser

import chatminer.visualizations as vis
import matplotlib.pyplot as plt

parser = SignalParser("./index.md")
parser.parse_file_into_df()

fig, ax = plt.subplots(1, 2, figsize=(
    7, 3), subplot_kw={'projection': 'polar'})
ax[0] = vis.sunburst(parser.df, highlight_max=True, isolines=[
                     2500, 5000], isolines_relative=False, ax=ax[0])
ax[1] = vis.sunburst(parser.df, highlight_max=False,
                     isolines=[0.5, 1], color='C1', ax=ax[1])

plt.show()

[Screenshot 2022-12-13 at 10 36 54 AM]

victormihalache commented 1 year ago

I tried using the module as a local module, then went back to using the one from pip, and now it works.