joweich / chat-miner

Parsers and visualizations for chats
MIT License
567 stars, 56 forks

ValueError: not enough values to unpack (expected 2, got 1) #92

Closed: ahmedbatty closed this issue 1 year ago

ahmedbatty commented 1 year ago

I'm seeing the following error when using the WhatsApp parser:

22.04.2023 12:04:32 INFO     
            Depending on the platform, the message format in chat logs might not be
            standardized accross devices/versions/localization and might change over
            time. Please report issues including your message format via GitHub.

22.04.2023 12:04:32 INFO     Initialized parser.
22.04.2023 12:04:32 INFO     Starting reading raw messages...
22.04.2023 12:04:33 INFO     Inferred date format: month/day/year
22.04.2023 12:04:33 INFO     Finished reading 39999 raw messages.
22.04.2023 12:04:33 INFO     Starting parsing raw messages...
  0%|                                                                                        | 1/39999 [00:00<?, ?it/s]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 4
      1 from chatminer.chatparsers import WhatsAppParser
      3 parser = WhatsAppParser("WhatsApp-Chat-Sarmad.txt")
----> 4 parser.parse_file()
      5 df = parser.parsed_messages.get_df()

File E:\Projects\whatsapp-chat-miner\whatsapp-analysis\lib\site-packages\chatminer\chatparsers.py:74, in Parser.parse_file(self)
     71 self._logger.info("Finished reading %i raw messages.", len(self._raw_messages))
     73 self._logger.info("Starting parsing raw messages...")
---> 74 self._parse_raw_messages()
     75 self._logger.info("Finished parsing raw messages.")

File E:\Projects\whatsapp-chat-miner\whatsapp-analysis\lib\site-packages\chatminer\chatparsers.py:84, in Parser._parse_raw_messages(self)
     82 with logging_redirect_tqdm():
     83     for raw_mess in tqdm(self._raw_messages):
---> 84         parsed_mess = self._parse_message(raw_mess)
     85         if parsed_mess:
     86             self.parsed_messages.append(parsed_mess)

File E:\Projects\whatsapp-chat-miner\whatsapp-analysis\lib\site-packages\chatminer\chatparsers.py:168, in WhatsAppParser._parse_message(self, mess)
    163 time = datetimeparser.parse(
    164     datestr, dayfirst=self._datefmt.is_dayfirst, fuzzy=True
    165 )
    167 if ":" in author_and_body:
--> 168     author, body = [x.strip() for x in author_and_body.split(": ", 1)]
    169 else:
    170     author = "System"

ValueError: not enough values to unpack (expected 2, got 1)

This might be caused by a message format that the parser doesn't cover yet.
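The failing line only guards on `":" in author_and_body`, but it then splits on the two-character separator `": "` (colon plus space). If the text contains a colon but no `": "`, `str.split(": ", 1)` returns a one-element list and the two-name unpacking raises exactly this `ValueError`. A minimal reproduction follows. The strings `ok` and `bad` are made-up examples, not lines from the reporter's chat log.

```python
# The parser checks for ":" but splits on ": " (colon followed by a space).
ok = "Ahmed: hello there"
bad = "Ahmed:"  # colon present, but no ": " separator (e.g. an empty message body)

print(ok.split(": ", 1))   # ['Ahmed', 'hello there'] -> unpacks into author, body
print(bad.split(": ", 1))  # ['Ahmed:'] -> only one value to unpack

try:
    author, body = [x.strip() for x in bad.split(": ", 1)]
except ValueError as exc:
    error = str(exc)

print(error)  # not enough values to unpack (expected 2, got 1)
```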

joweich commented 1 year ago

Hi @ahmedbatty, many thanks for reporting this! Would you mind sharing the format of the messages in your logfile?

ahmedbatty commented 1 year ago

Hi @joweich, see the following message format:

11/16/22, 12:18 AM - Ahmed: <message>
11/16/22, 12:18 AM - Ahmed: <Media omitted>
11/16/22, 12:19 AM - Ahmed: <message>
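
To illustrate the layout of these lines, here is a sketch that splits one line into its timestamp and the `Author: body` remainder. That remainder corresponds to the `author_and_body` string in the traceback. The regex `LINE_RE` and the function `split_line` are hypothetical and only cover this exact layout (month/day/two-digit year, 12-hour clock). They are not chatminer's implementation, which infers the date format and uses a fuzzy date parser, as the log and traceback above show.

```python
import re
from datetime import datetime

# Hypothetical pattern for "M/D/YY, H:MM AM - Author: body" lines (illustration only,
# not the regex chatminer uses).
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [AP]M) - (?P<rest>.*)$"
)


def split_line(line):
    """Return (timestamp, author_and_body), or None if the line has no timestamp prefix."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # e.g. a continuation line of a multi-line message
    timestamp = datetime.strptime(m["date"], "%m/%d/%y, %I:%M %p")
    return timestamp, m["rest"]


print(split_line("11/16/22, 12:18 AM - Ahmed: <Media omitted>"))
# (datetime.datetime(2022, 11, 16, 0, 18), 'Ahmed: <Media omitted>')
```

Note that the remainder is returned as-is. Whether it can be split into author and body is a separate step, and that is where the error above occurs.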

joweich commented 1 year ago

@ahmedbatty this format is covered by our test cases and I can't reproduce the issue: all three example messages parse fine for me. Are you running the latest version of chatminer (0.3.0)? You can check via:

import chatminer
print(chatminer.__version__)

If you are already running 0.3.0, your chat log contains some formatting that we don't handle yet. I would then need your help to identify the lines that cause the issue.

ahmedbatty commented 1 year ago

@joweich I'm running the latest version: [Screenshot 2023-04-26 235525]

Let me know how I can help you out.

joweich commented 1 year ago

@ahmedbatty I temporarily added debugging output in #93. This should help us identify the lines that break the parser. Please use the code from that PR to parse your logfile; the console output will show which lines we don't handle yet. Thank you!
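The general idea behind this kind of debugging output can be sketched as follows: catch the error per message, log the raw line, and continue instead of aborting the whole run. This is a minimal sketch that assumes a per-message parse function raising `ValueError`. The names `parse_all` and `naive_parse` are hypothetical, and this is not the code from #93. The warning text only imitates the format of the log output quoted in this thread.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("parse_debug")  # hypothetical logger name


def parse_all(raw_messages, parse_message):
    """Parse every raw message; log and skip the ones that raise ValueError.

    Returns (parsed, failed) so the failing raw lines can be inspected afterwards.
    """
    parsed, failed = [], []
    for raw in raw_messages:
        try:
            result = parse_message(raw)
        except ValueError:
            # Report the offending line instead of aborting the whole run.
            logger.warning("Failed to parse message: %s. Skipped.", raw)
            failed.append(raw)
            continue
        if result:
            parsed.append(result)
    return parsed, failed


def naive_parse(raw):
    """Hypothetical stand-in for a real per-message parser (same split as line 168)."""
    author, body = [x.strip() for x in raw.split(": ", 1)]
    return author, body


parsed, failed = parse_all(["Ahmed: <message>", "Ahmed:"], naive_parse)
print(parsed)  # [('Ahmed', '<message>')]
print(failed)  # ['Ahmed:']
```

Catching only `ValueError` keeps unrelated bugs visible. A broader `except Exception` would also hide problems that have nothing to do with the message format.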

ahmedbatty commented 1 year ago

@joweich I used the code from https://github.com/joweich/chat-miner/pull/93 and was able to parse my chat log successfully. See the following output:

27.04.2023 23:16:13 INFO     
            Depending on the platform, the message format in chat logs might not be
            standardized accross devices/versions/localization and might change over
            time. Please report issues including your message format via GitHub.

27.04.2023 23:16:13 INFO     Initialized parser.
27.04.2023 23:16:13 INFO     Starting reading raw messages...
27.04.2023 23:16:13 INFO     Inferred date format: month/day/year
27.04.2023 23:16:13 INFO     Finished reading 39999 raw messages.
27.04.2023 23:16:13 INFO     Starting parsing raw messages...
27.04.2023 23:16:13 WARNING  Failed to parse message: 4/22/23, 11:15 AM - Ahmed:. Skipped.                      
100%|█████████████████████████████████████████████████████████████████████████| 39999/39999 [00:02<00:00, 14467.43it/s]
27.04.2023 23:16:16 INFO     Finished parsing raw messages.

So I tracked down the message that failed to parse and found out:

joweich commented 1 year ago

Thanks for narrowing this down! I will provide a fix for this 👍🏼
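The skipped line `4/22/23, 11:15 AM - Ahmed:` matches the mechanism in the traceback: its `author_and_body` part is `Ahmed:`, which contains a colon but no `": "`. One possible shape for a more tolerant split is sketched below under stated assumptions. The helper name `split_author_body` is hypothetical, and this is not the fix that was merged into chatminer. The `"System"` fallback mirrors the `author = "System"` branch visible in the traceback, but what that branch does with the body is assumed here.

```python
def split_author_body(author_and_body: str) -> tuple[str, str]:
    """Hypothetical, more tolerant version of the split that raised ValueError."""
    author, sep, body = author_and_body.partition(": ")
    if sep:
        # Normal case: split only at the first ": ", so the body may contain colons.
        return author.strip(), body.strip()
    stripped = author_and_body.rstrip()
    if stripped.endswith(":"):
        # "Ahmed:" -> a message from Ahmed with an empty body.
        return stripped[:-1].strip(), ""
    # No author separator at all: assumed to be a system message (body kept as-is).
    return "System", author_and_body.strip()


print(split_author_body("Ahmed: hello: world"))  # ('Ahmed', 'hello: world')
print(split_author_body("Ahmed:"))               # ('Ahmed', '')
```

Using `str.partition` avoids the unpacking problem altogether, because it always returns a 3-tuple. An empty separator signals that `": "` was not found.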