lucasrodes / whatstk

WhatsApp chats as dataframes. Python toolkit to analyse and parse WhatsApp chats.
https://whatstk.lcsrg.me/
GNU General Public License v3.0
121 stars, 23 forks

Header automatic extraction failed: '[%m/%d/%y, %H:%M:%S] %name:' IOS #139

Closed pumalife closed 6 months ago

pumalife commented 9 months ago

Exported from the WhatsApp iOS app (business and personal). I'm an American living internationally (not sure if that matters).

Manually matched pattern here: '[%m/%d/%y, %H:%M:%S] %name:'

[12/27/20, 18:51:14] My Group 2023: ‎Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them.
[7/18/21, 18:34:07] ~ Nev: That's awesome. Well said.
[7/18/21, 20:36:20] Sandrine - French: What is this?
[7/20/21, 11:21:27] ‪+1 (777) 777‑7777‬: ‎‪+1 (777) 777‑7777‬ joined using this group's invite link

JoshEe00 commented 6 months ago

I'm still facing this error. Is there an update on how to fix it?

lucasrodes commented 6 months ago

Hi @JoshEe00, I'll try to address this bug in the next few days. Can you share a slice of your chat to help me debug (feel free to replace the message content with whatever you like)?

lucasrodes commented 6 months ago

Could you, @JoshEe00 or @pumalife, share a demo text exported from your chat as a TXT file? Feel free to replace its messages if needed. Also, please share the code you used to read the chat and the error that you got. Otherwise it is hard for me to debug this.

I copied the chat content from @pumalife into a TXT file and then ran:

from whatstk import df_from_txt_whatsapp
df = df_from_txt_whatsapp("chat-139.txt", hformat="[%m/%d/%y, %H:%M:%S] %name:")

And it seems to work fine. Could you please confirm that this is still not working for you, and tell me what error you are getting?

Incidentally, if I open the TXT file with the chat, I can see some characters I am not familiar with.
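A quick way to identify such characters is to list every code point in the Unicode "format" category (Cf), which covers invisible directional marks. This is only a sketch: `find_format_chars` is a hypothetical helper, not part of whatstk, and the sample string is a shortened copy of the chat line above with the characters I assume are present written as escapes.

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point, Unicode name) for each invisible format character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"  # Cf = "Other, format" (invisible controls)
    ]


# Assumed reconstruction of one exported line, with invisible characters escaped.
sample = (
    "[7/20/21, 11:21:27] \u202a+1 (777) 777\u20117777\u202c: "
    "\u200e\u202a+1 (777) 777\u20117777\u202c joined using this group's invite link"
)

for hit in find_format_chars(sample):
    print(hit)
```

The non-breaking hyphen (U+2011) inside the phone number is not reported, since it belongs to a punctuation category rather than Cf.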

JoshEe00 commented 6 months ago

WhatsappTestChat.txt

Here's the text I used. I had to manually configure hformat to get it to work.

For my case, the chat was exported on Android.

lucasrodes commented 6 months ago

Hi @JoshEe00, the automatic header extraction is not working for you because of line 3 of your file, which reads:

04/03/2024, 22:29 - +1-374-8523 added you to a group in the community: Community 1

This is wrongly interpreted as a user message and breaks the processing. Current workarounds would be to:

  1. Define the format manually:
    df = df_from_whatsapp("WhatsappTestChat.txt", hformat="%d/%m/%y, %H:%M - %name:")
  2. Remove line 3 from the TXT file.
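To illustrate why that line is mistaken for a user message, here is a minimal sketch using a hand-written regular expression that approximates the format `%d/%m/%y, %H:%M - %name:`. The pattern `HEADER_RE` is an assumption made for this example and is not whatstk's internal implementation.

```python
import re

# Hand-written approximation of "%d/%m/%y, %H:%M - %name:".
# Assumption: illustrative only, NOT the pattern whatstk builds internally.
HEADER_RE = re.compile(
    r"^(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{2,4}), "
    r"(?P<hour>\d{1,2}):(?P<minute>\d{2}) - (?P<name>[^:]+): "
)

system_line = (
    "04/03/2024, 22:29 - +1-374-8523 added you to a group "
    "in the community: Community 1"
)

match = HEADER_RE.match(system_line)

# The colon after "community" makes the notice look like "<name>: <message>",
# so everything before it is captured as if it were a user name.
print(match.group("name"))
```

Under this approximation, the whole notice up to the colon is captured as the sender name, which is how a system line can pass for a regular message.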

This file may have been exported from a community (not a group), which would explain the different structure of the exported chat.

I will close this issue for now. I have opened a follow-up issue: https://github.com/lucasrodes/whatstk/issues/147. Once that issue is fixed, the library should work out of the box for you.

Thanks for reporting!