Closed Thanzex closed 6 years ago
As far as I can understand, the script treats single lines as messages, which is not the case here.
No it does not. It filters for the start of a line.
Ultimately I think using regex would be far better to get all the elements; this expression captures every message, multiline or not, and divides the result into Date, Time, Name and Message:
Yeah, that looks very elegant! Thanks for posting that. I need to get to understand regex... (I really underestimated it.) Will try to implement that!
I tried it. Can anyone help me make the regex know when a new message starts...
Here are all the formats that I want to add: https://docs.google.com/spreadsheets/d/1mZCE_tFelvqmLh0vIt7vMjU1OYB0etuhwXRl3Fzv6k8/edit?usp=sharing
@mowolf I think this covers everything you want to support:
https://www.debuggex.com/r/4A_OgK9IYoAqQgVX
At least this is tested on all the matches you provided in your Debuggex Demo. By no means does it deserve a beauty prize, but it works.
@mtuit wow! Thanks a lot.
@mtuit I think I got something that works even for exports that have no \n. Do you have any idea how to capture the whole message as well? That will probably be all I need. Thanks again.
https://www.debuggex.com/r/l7dr5nw82useuW3B
Okay, now I am having trouble integrating it into the code.
var regex = \(\[?)((\d{1,4}(\-|\/|\.){1}){2}\d{2,4})((\sum\s|\s)|\,\s|\.\s){1}((\d{1,2}\:)\d{2}(:\d{2})?)(\s(A|P)?M|\s(a|p)?\.\s\m\.)?(\]\s|\s\-\s|\:)(.)([^:]*)(: )\;
throws a syntax error. Do you know how to fix that?
Forgot to escape the backslashes
@mowolf This is something that captures the text as well:
https://www.debuggex.com/r/jCSsefyH62aox2qj
I'm not really sure how to check if exports don't have \n, however I think every export does have that anyway, so might not be relevant. However if you wanted to add it nevertheless it could perhaps be done with a negative lookahead (which I don't have a lot of experience with so I couldn't get it to work).
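A lookahead along these lines might look like the sketch below. It is only an illustration under assumptions: it targets the single "MM/DD/YY, HH:MM AM/PM - Name: Message" layout rather than all the formats in the spreadsheet, it uses a lazy quantifier with a positive lookahead instead of a negative one, and the names MESSAGE_RE and splitMessages are made up for this example.

```javascript
// Assumed layout: "MM/DD/YY, HH:MM AM/PM - Name: Message".
// The message body is matched lazily and stops right before the next
// header (optionally preceded by whitespace) or at the end of the input,
// so it works whether or not the export contains "\n".
const MESSAGE_RE =
  /(\d\d\/\d\d\/\d\d), (\d+:\d+ (?:AM|PM)) - ([^:]+): ([\s\S]*?)(?=\s*\d\d\/\d\d\/\d\d, \d+:\d+ (?:AM|PM) - |$)/g;

// Hypothetical helper: returns one object per message found in the text.
function splitMessages(text) {
  return Array.from(text.matchAll(MESSAGE_RE), (m) => ({
    date: m[1],
    time: m[2],
    name: m[3],
    message: m[4],
  }));
}

// Usage: one export with line breaks, one without any.
const withNewlines =
  "01/02/19, 9:15 PM - Alice: Hello\nsecond line\n01/02/19, 9:16 PM - Bob: Hi";
const withoutNewlines =
  "01/02/19, 9:15 PM - Alice: Hello01/02/19, 9:16 PM - Bob: Hi";
console.log(splitMessages(withNewlines));
console.log(splitMessages(withoutNewlines));
```

The trade-off is that the header pattern has to be repeated inside the lookahead, so supporting more formats means extending it in two places.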
About your syntax error, I think this is because you are using '\' instead of '/' to delimit your regex. Check whether this works:
var regex = /(\[?)((\d{1,4}(\-|\/|\.){1}){2}\d{2,4})((\sum\s|\s)|\,\s|\.\s){1}((\d{1,2}\:)\d{2}(:\d{2})?)(\s(A|P)?M|\s(a|p)?\.\s\m\.)?(\]\s|\s\-\s|\:)(.)([^:]*)(: )/;
Thanks @mtuit! Got it working.
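For anyone hitting the same syntax error, here is a small sketch of the two ways to write a pattern in JavaScript. The short time pattern is just a stand-in for the long one above, and the variable names are made up for the example.

```javascript
// Regex literal: delimited by forward slashes, each backslash written once.
const timeLiteral = /(\d{1,2}):(\d{2})/;

// RegExp constructor: the pattern is an ordinary string, so every
// backslash has to be escaped (doubled) to survive string parsing.
const timeFromString = new RegExp("(\\d{1,2}):(\\d{2})");

// Both forms compile to the same pattern and match the same input.
const literalMatch = timeLiteral.exec("12:34");
const stringMatch = timeFromString.exec("12:34");
console.log(timeLiteral.source === timeFromString.source);
console.log(literalMatch[1], literalMatch[2]);
console.log(stringMatch[1], stringMatch[2]);
```

So "escape the backslashes" only applies when the pattern is passed as a string; with the slash-delimited literal it can be pasted as it appears in Debuggex.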
The pattern used to match a message differs in many cases, as pointed out in the reddit thread; in my case it's:
MM/DD/YY, HH:MM PM/AM - Name: Message
I tried modifying the script to match my specific pattern and found that the hardcoded numbers used for the substrings are not flexible; indexOf or a regex could be used instead.
In my message dump I found that many messages can also be multiline.
As far as I can understand, the script treats single lines as messages, which is not the case here.
Ultimately I think using regex would be far better to get all the elements; this expression captures every message, multiline or not, and divides the result into Date, Time, Name and Message:
(\d\d\/\d\d\/\d\d)(?:\, )(\d+\:\d+ (?:AM|PM))(?: - )(.*)(?::)((?:.*)(?:[\r\n]*)(?:.*))
This way many functions can be greatly simplified and made more reliable, and changing the regex pattern to accommodate different formats is trivial. If I have some time I'll try to implement it!
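A rough sketch of how this could be wired into a parser is below. It is not the script's actual code: parseChat and HEADER_RE are hypothetical names, and instead of one regex over the whole dump it tests each line against a header pattern and appends non-matching lines to the previous message, so a message can span any number of lines. The name group stops at the first colon so that colons inside the message text are kept.

```javascript
// Header of a new message, assuming "MM/DD/YY, HH:MM AM/PM - Name: Message".
const HEADER_RE = /^(\d\d\/\d\d\/\d\d), (\d+:\d+ (?:AM|PM)) - ([^:]+): (.*)$/;

// Hypothetical helper: turns an exported chat into a list of messages.
function parseChat(text) {
  const messages = [];
  for (const line of text.split(/\r?\n/)) {
    const m = HEADER_RE.exec(line);
    if (m) {
      // A header line starts a new message.
      messages.push({ date: m[1], time: m[2], name: m[3], message: m[4] });
    } else if (messages.length > 0) {
      // Anything else is a continuation of the previous message.
      messages[messages.length - 1].message += "\n" + line;
    }
  }
  return messages;
}

// Usage with a made-up three-line dump.
const sampleDump = [
  "01/02/19, 9:15 PM - Alice: Hello there",
  "this is a second line",
  "01/02/19, 9:16 PM - Bob: Hi: how are you?",
].join("\n");
console.log(parseChat(sampleDump));
```

Supporting another export format would then mostly come down to swapping HEADER_RE for a different pattern with the same four groups.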