mowolf / ChatAnalyzer

JavaScript web app that analyzes your WhatsApp chat history locally on your machine.
https://chatanalyzer.moritzwolf.com
Other
208 stars, 41 forks

Needs to be more flexible, maybe using Regex #3

Closed. Thanzex closed this 6 years ago.

Thanzex commented 6 years ago

The pattern needed to match a message differs in many cases, as pointed out in the Reddit thread. In my case it's: MM:DD:YY, HH:MM PM/AM - Name: Message

I tried modifying the script to match my specific pattern and found that it extracts the fields with hardcoded substring offsets, like:

date[a] = lineArray[j].substring(1,9);
time[a] = lineArray[j].substring(11, 19);
message[a] = lineArray[j].substring(21 + uniqueNames[i].length);

This is not flexible; indexOf or a regex could be used instead.
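As a rough sketch of the indexOf idea, the hypothetical parseLine function below finds the separators (", ", " - ", ": ") instead of relying on fixed offsets. It assumes lines of the form MM/DD/YY, H:MM AM/PM - Name: Message (slashes, as in the regex further down in this comment); a name that itself contains ": " would still be split in the wrong place.

```javascript
// Hypothetical helper: splits one export line into its fields by searching
// for separators instead of using hardcoded substring offsets.
// Assumed format: "MM/DD/YY, H:MM AM/PM - Name: Message"
function parseLine(line) {
  const comma = line.indexOf(", ");        // ends the date
  const dash = line.indexOf(" - ", comma); // ends the time
  const colon = line.indexOf(": ", dash);  // ends the sender name
  if (comma < 0 || dash < 0 || colon < 0) {
    return null; // not a message header, e.g. a continuation line
  }
  return {
    date: line.substring(0, comma),
    time: line.substring(comma + 2, dash),
    name: line.substring(dash + 3, colon),
    message: line.substring(colon + 2),
  };
}

// date "12/31/17", time "9:05 PM", name "Alice", message "Happy new year!"
console.log(parseLine("12/31/17, 9:05 PM - Alice: Happy new year!"));
```

The offsets now follow the actual length of each field, so a one-digit hour or a longer name no longer shifts everything else.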

In my message dump I found that many messages can also span multiple lines, like:

{Line} MM:DD:YY, HH:MM PM/AM - Name1: Lorem ipsum dolor sit amet, consectetur adipiscing elit
{Line} MM:DD:YY, HH:MM PM/AM - Name2: sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
{Line}
{Line} Ut enim ad minim veniam,
{Line} MM:DD:YY, HH:MM PM/AM - Name1: quis nostrud exercitation

As far as I can tell, the script treats each single line as a message, which is not the case here.
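One way to handle this, sketched below, is to start a new message only when a line begins with a date/time header and to append every other line (including empty ones) to the previous message. The HEADER pattern assumes the MM/DD/YY, H:MM AM/PM - format; both it and groupMessages are illustrative names, not code from the project.

```javascript
// Assumed header at the start of a line: "MM/DD/YY, H:MM AM/PM - "
const HEADER = /^\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{2} [AP]M - /;

// Hypothetical helper: joins continuation lines onto the message they belong to.
function groupMessages(text) {
  const messages = [];
  for (const line of text.split(/\r?\n/)) {
    if (HEADER.test(line) || messages.length === 0) {
      messages.push(line); // a header line starts a new message
    } else {
      messages[messages.length - 1] += "\n" + line; // continuation line
    }
  }
  return messages;
}

const sample = [
  "01/02/18, 3:04 PM - Name1: Lorem ipsum",
  "01/02/18, 3:05 PM - Name2: sed do eiusmod",
  "",
  "Ut enim ad minim veniam,",
  "01/02/18, 3:06 PM - Name1: quis nostrud",
].join("\n");

console.log(groupMessages(sample).length); // 3
```

The second message keeps its blank line and the "Ut enim..." line, so the five input lines become three messages.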

Ultimately I think using a regex would be far better for extracting all the elements. This expression captures every message, multiline or not, and divides the result into Date, Time, Name and Message: (\d\d\/\d\d\/\d\d)(?:\, )(\d+\:\d+ (?:AM|PM))(?: - )(.*)(?::)((?:.*)(?:[\r\n]*)(?:.*))


This way many functions could be greatly simplified and made more reliable, and changing the regex pattern to accommodate different formats would be trivial. If I have some time I'll try to implement it!
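A sketch of the regex approach, adapted rather than copied from the expression above: the name group stops at the first colon ([^:\n]+) so colons inside a message stay in the message, and the message group runs lazily until a lookahead sees the next header line or the end of the input, so multiline messages are captured whole. The m flag lets ^ match at every line start. The format is the same assumed MM/DD/YY, H:MM AM/PM - Name: Message, and MESSAGE_RE and parseChat are hypothetical names.

```javascript
// Groups: 1 = date, 2 = time, 3 = name, 4 = message (may span several lines).
// Assumed format: "MM/DD/YY, H:MM AM/PM - Name: Message"
// The lookahead ends a message right before the next header line, or at the
// end of the input (ignoring trailing whitespace).
const MESSAGE_RE =
  /^(\d{2}\/\d{2}\/\d{2}), (\d{1,2}:\d{2} [AP]M) - ([^:\n]+): ([\s\S]*?)(?=\r?\n\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{2} [AP]M - |\s*(?![\s\S]))/gm;

// Hypothetical helper: returns one object per message.
function parseChat(text) {
  return [...text.matchAll(MESSAGE_RE)].map((m) => ({
    date: m[1],
    time: m[2],
    name: m[3],
    message: m[4],
  }));
}

const chat = [
  "01/02/18, 3:04 PM - Name1: Lorem ipsum",
  "01/02/18, 3:05 PM - Name2: sed do eiusmod",
  "",
  "Ut enim ad minim veniam,",
  "01/02/18, 3:06 PM - Name1: note: quis nostrud",
].join("\n");

for (const msg of parseChat(chat)) {
  console.log(msg.name, JSON.stringify(msg.message));
}
```

Header lines that have no colon after the name (some system notices, for example) are not matched and are skipped, but the preceding message still ends before them.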

mowolf commented 6 years ago

As far as I can tell, the script treats each single line as a message, which is not the case here.

No, it does not. It filters for the start of a line.

Ultimately I think using a regex would be far better for extracting all the elements. This expression captures every message, multiline or not, and divides the result into Date, Time, Name and Message:

Yeah, that looks very elegant! Thanks for posting that. I need to get to understand regex (I really underestimated it). Will try to implement that!

mowolf commented 6 years ago

I tried it. Can anyone help me make the regex detect when a new message starts?

Debuggex Demo

Here are all the formats that I want to add: https://docs.google.com/spreadsheets/d/1mZCE_tFelvqmLh0vIt7vMjU1OYB0etuhwXRl3Fzv6k8/edit?usp=sharing

mtuit commented 6 years ago

@mowolf I think this covers everything you want to support:

https://www.debuggex.com/r/4A_OgK9IYoAqQgVX

At least it is tested on all the matches you provided in your Debuggex Demo. By no means does it deserve a beauty prize, but it works.

mowolf commented 6 years ago

@mtuit wow! Thanks a lot.

mowolf commented 6 years ago

@mtuit I think I got something that works even for exports that have no \n. Do you have any idea how to capture the whole message as well? That will probably be all I need. Thanks again.

https://www.debuggex.com/r/l7dr5nw82useuW3B

Okay, now I'm having trouble integrating it into the code. This line:

var regex = \(\[?)((\d{1,4}(\-|\/|\.){1}){2}\d{2,4})((\sum\s|\s)|\,\s|\.\s){1}((\d{1,2}\:)\d{2}(:\d{2})?)(\s(A|P)?M|\s(a|p)?\.\s\m\.)?(\]\s|\s\-\s|\:)(.)([^:]*)(: )\;

throws a syntax error. Do you know how to fix that? I forgot to escape the backslashes.

mtuit commented 6 years ago

@mowolf This is something that captures the text as well:

https://www.debuggex.com/r/jCSsefyH62aox2qj

I'm not really sure how to handle exports that don't have \n, but I think every export has it anyway, so this might not be relevant. If you wanted to support it nevertheless, it could perhaps be done with a negative lookahead (which I don't have much experience with, so I couldn't get it to work).
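For exports without \n between messages, one option is to split the text at every position that is immediately followed by a header. The sketch below uses a zero-width positive lookahead for that (a different technique from the negative lookahead mentioned above). It assumes the MM/DD/YY, H:MM AM/PM - header format, and splitMessages is a hypothetical name; a message body that happens to contain something shaped like a header would be split there as well.

```javascript
// Assumed header: "MM/DD/YY, H:MM AM/PM - ". \s* eats the whitespace before a
// header, and the lookahead only checks (does not consume) the header itself,
// so every piece keeps its own header.
const BEFORE_HEADER = /\s*(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{2} [AP]M - )/;

function splitMessages(text) {
  // filter() drops the empty string that split() returns for empty input
  return text.split(BEFORE_HEADER).filter((part) => part.length > 0);
}

const flat =
  "01/02/18, 3:04 PM - Name1: Lorem ipsum 01/02/18, 3:05 PM - Name2: sed do";

// Two messages, each starting with its own header
console.log(splitMessages(flat));
```

Because the split points are positions rather than characters, the same function also works on normal exports, where the \s* simply consumes the newline before each header.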

As for your syntax error, I think it is because you are using '\' instead of '/' to delimit your regex. See if this works: var regex = /(\[?)((\d{1,4}(\-|\/|\.){1}){2}\d{2,4})((\sum\s|\s)|\,\s|\.\s){1}((\d{1,2}\:)\d{2}(:\d{2})?)(\s(A|P)?M|\s(a|p)?\.\s\m\.)?(\]\s|\s\-\s|\:)(.)([^:]*)(: )/;
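To illustrate the two ways of writing a regex in JavaScript: a regex literal is delimited by slashes and its backslashes are used as written, while a pattern passed to the RegExp constructor as an ordinary string needs every backslash doubled (String.raw avoids the doubling). The short time pattern below is a simplified stand-in, not the full expression from this thread.

```javascript
// Literal: delimited by /.../, backslashes need no extra escaping
// (a "/" inside the pattern has to be written as "\/").
const literal = /(\d{1,2}):(\d{2})/;

// Constructor with a normal string: "\\d" in the source becomes \d in the
// pattern. Writing "\d" here would silently become just "d".
const fromString = new RegExp("(\\d{1,2}):(\\d{2})");

// String.raw keeps backslashes as written, so no doubling is needed.
const fromRaw = new RegExp(String.raw`(\d{1,2}):(\d{2})`);

const line = "12/31/17, 9:05 PM - Alice: hi";
for (const re of [literal, fromString, fromRaw]) {
  console.log(re.source, re.exec(line)[0]); // each finds "9:05"
}
```

All three produce the same pattern; the literal form is usually the simplest when the pattern is fixed, and the constructor is only needed when the pattern is built from strings at runtime.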

mowolf commented 6 years ago

Thanks @mtuit! Got it working.