Closed ruravi closed 1 year ago
It also doesn't work with the Ukrainian date format, e.g.:
[05.05.23, 15:45:46] User: text
I used the following input formats:
[05.05.23, 15:48:11] James: Hi here
[11/8/21, 9:41:32 AM] User name: Message 123
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:22_AM - User 1: And let me know if anything changes
New regex that seems to work with all three:
message_line_regex = r"""
\[? # Optional opening square bracket
( # Start of group 1
\d{1,2} # Match 1-2 digits for the day (or the month in US-style dates)
[\/.] # Match a forward slash or period as the date separator
\d{1,2} # Match 1-2 digits for the month (or the day in US-style dates)
[\/.] # Match a forward slash or period as the date separator
\d{2,4} # Match 2-4 digits for the year
,\s # Match a comma and a space
\d{1,2} # Match 1-2 digits for the hour
:\d{2} # Match 2 digits for the minutes
(?: # Optional group for seconds
:\d{2} # Match 2 digits for the seconds
)? # Make seconds group optional
(?:[ _](?:AM|PM))? # Optional space or underscore and AM/PM suffix for 12-hour format
) # End of group 1
\]? # Optional closing square bracket
[\s-]* # Match any number of spaces or hyphens
([\w\s]+) # Match and capture one or more word characters or spaces as group 2 (the sender)
[:]+ # Match one or more colons
\s # Match a single space
(.+) # Match and capture one or more of any character as group 3 (the message content)
"""
I can make a PR, but should I test any other formats first?
System Info
langchain 0.0.158, macOS (M1), Python 3.11
Who can help?
@ey
Reproduction
The regular expression used by WhatsAppChatLoader doesn't parse this format successfully
Expected behavior
Parsing should succeed for this format; currently it fails.