nmoya / whatsapp-parser

Parser to the What's App log file.
MIT License
47 stars 23 forks

Weird output #4

Closed ankitduseja closed 5 years ago

ankitduseja commented 9 years ago

Not working: it gives weird output, showing full sentences instead of word counts.

and then dies with the following output:

--SHIFTS
evening -> 0
afternoon -> 4
latenight -> 11
morning -> 15

--WEEKDAY
Traceback (most recent call last):
  File "main.py", line 225, in <module>
    main()
  File "main.py", line 199, in main
    output["weekdays"] = c.count_messages_per_weekday()
  File "main.py", line 59, in count_messages_per_weekday
    weekday = date.date_to_weekday(parsed_date)
NameError: global name 'date' is not defined

nmoya commented 9 years ago

@ankitduseja Try running: python Chat.py instead of main.py

shaanen commented 8 years ago

@nmoya That one also doesn't seem to work. (Python 2.7.6)

Command with output:

sjors@mothership:~/Downloads/whatsapp-parser-master$ python Chat.py -f pivochat.txt -n root -p WhatsApp
Traceback (most recent call last):
  File "Chat.py", line 217, in <module>
    c.all_features()
  File "Chat.py", line 102, in all_features
    self.features.compute_response_time_and_burst(self.messages, self.root, self.senders, initiation_thrs, burst_thrs, response_thrs)
  File "/home/sjors/Downloads/whatsapp-parser-master/ChatFeatures.py", line 27, in compute_response_time_and_burst
    t0 = list_of_messages[0].datetime_obj
IndexError: list index out of range

nmoya commented 8 years ago

@sjorsng I've been meaning to review this code entirely but haven't had the time yet. Is the problem still happening, or did you manage to solve it? If so, we could open a Pull Request and merge it into the repository. What do you think?

shaanen commented 8 years ago

@nmoya I did not manage to solve it yet (I'm very new to Python and Github). But sure, I would be glad to help in any way.

nmoya commented 8 years ago

@sjorsng I just got the same problem with a chat log of mine. I will work on a fix when I find time.

nmoya commented 8 years ago

@sjorsng I just did some commits, could you try running it again with the latest version in master?

shaanen commented 8 years ago

@nmoya

sjors@mothership:~/Downloads/whatsapp-parser-master$ python Chat.py -f pivochat.txt -p WhatsApp -n "Sjors Haanen"
Traceback (most recent call last):
  File "Chat.py", line 225, in <module>
    c.all_features()
  File "Chat.py", line 105, in all_features
    self.features.compute_response_time_and_burst(self.messages, self.root, self.senders, initiation_thrs, burst_thrs, response_thrs)
  File "/home/sjors/Downloads/whatsapp-parser-master/ChatFeatures.py", line 27, in compute_response_time_and_burst
    t0 = list_of_messages[0].datetime_obj
IndexError: list index out of range

Same problem. Maybe my WhatsApp logs use another date format which the script can't handle? Example: 12/07/2015, 01:46 - Sjors Haanen: Pyramide

nmoya commented 8 years ago

@sjorsng Hey.

First thing, did you pull the latest version before running the code? I renamed Chat.py to chat.py yesterday. I also made the root (-n) optional; if it is not provided, there is an interactive input.

Yes, the problem is the format. So you have two options:

1- Contribute to an open source repository. You need to add support for this new format. You can do that by modifying the parse function inside parsers/whatsapp.py. There are some comments in this function on how to do that. If this is your first contribution, you can fork this repository and then open a Pull Request to merge the latest commit of your fork with mine. (It's very simple.)

2- Modify your chat file from 12/07/2015, 01:46 - Sjors Haanen: Pyramide to 12/07/2015, 01:46:00: Sjors Haanen: Pyramide. I used this solution yesterday and it worked. It's ugly, I know, but unfortunately this new format is not yet supported.

I'm hoping that you chose option 1! I can help you during the process.
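For anyone taking option 2, here is a minimal sketch of the rewrite as a script rather than a manual find and replace. The names NEW_HEADER and to_old_format are hypothetical and not part of the repository; the pattern only assumes the two sample lines quoted above.

```python
import re

# Matches the newer header "<date>, HH:MM - " at the start of a line.
# (Assumption: the date uses "/" separators, as in the sample above.)
NEW_HEADER = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ")


def to_old_format(line):
    """Rewrite a new-style header to the "<date>, HH:MM:00: " style.

    Lines without a new-style header (e.g. continuation lines of a
    multi-line message) are returned unchanged.
    """
    return NEW_HEADER.sub(r"\1, \2:00: ", line, count=1)


print(to_old_format("12/07/2015, 01:46 - Sjors Haanen: Pyramide"))
```

Applying to_old_format to every line of the exported file and writing the result to a new file should give a log in the older layout.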

manu-chroma commented 7 years ago

hey @nmoya, I chose option 2 to test your program against my recently archived chat. I found the project to be pretty cool, and here's how I'm planning to contribute:

Let me know if you're interested!

nmoya commented 7 years ago

hey @manu-chroma

yes, definitely! Thanks for the help! We can work together if the second bullet point turns out to be bigger than expected.

manu-chroma commented 7 years ago

thanks for your prompt reply.

I think the 2nd point shouldn't be much work, because a simple find and replace to convert the dataset to the old format was pretty easy (i.e. option 2), and I think this can be easily reflected in the actual code.

A few changes in the following code def parse(self) in whatsapp.py should do the trick IMHO.

Also, to reduce the complexity of the same, I thought that removing support for all previous formats and only supporting the current format would be great.

Support for the previous format can be made available through previous releases on pip or a different git branch.

some edge cases for the current format:

I'll reply on this thread if I encounter any other edge cases.

let me know your thoughts on the above ideas!

nmoya commented 7 years ago

I don't think we can remove all the other formats. We should expand (probably add a beefy date library) to support all date formats. I just exported a chat from my app and the date is still in %m/%d/%Y. I'm assuming it changes depending on the phone locale or the country code of the phone number registered in the app. Strangely, I'm not in the US, so I don't understand why the month comes first in my case; it should be the same as yours.

Instead of having separate git branches, we should detect whether it's %m/%d/%Y or %d/%m/%Y and call the appropriate date parse function.
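One way this detection could work, as a rough sketch only: scan the log for a date whose first or second field is greater than 12, since such a field cannot be a month. The names DATE_RE and guess_date_order are hypothetical, and the approach assumes the whole file uses one consistent order.

```python
import re

# Leading "<a>/<b>/<year>" of a message header line.
DATE_RE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2,4})")


def guess_date_order(lines):
    """Return "dmy", "mdy", or None if every date in the log is ambiguous."""
    for line in lines:
        match = DATE_RE.match(line)
        if not match:
            continue  # continuation line or unknown layout
        first, second = int(match.group(1)), int(match.group(2))
        if first > 12:
            return "dmy"  # first field cannot be a month
        if second > 12:
            return "mdy"  # second field cannot be a month
    return None


print(guess_date_order(["7/22/16, 14:25 - Nikolas Moya: ?"]))
print(guess_date_order(["12/07/2015, 01:46 - Sjors Haanen: Pyramide"]))
```

A None result means no line settled the question (for example a short log where every day is 12 or lower), so the caller would still need a default or a user-supplied hint.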

Yeah, we should remove the auto message of end-to-end encryption. I think we can detect this automated message because the name of the sender will not exist.
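A sketch of that idea, assuming the newer "<date>, <time> - <sender>: <text>" layout: a header line with no "sender: " part after the dash is treated as an automated notice. HEADER_RE and split_sender are hypothetical names, and the notice text in the example is only an illustrative stand-in.

```python
import re

# "<date>, <time> - <rest>", with an optional AM/PM marker after the time.
HEADER_RE = re.compile(
    r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?: [AP]M)? - (.*)$"
)


def split_sender(line):
    """Return (sender, text) for a user message, or None otherwise.

    None covers both lines without a header and header lines that have
    no "sender: " prefix, which are assumed to be automated notices.
    """
    match = HEADER_RE.match(line)
    if not match:
        return None
    sender, separator, text = match.group(1).partition(": ")
    if not separator:
        return None  # no sender name present
    return sender, text


print(split_sender("12/07/2015, 01:46 - Sjors Haanen: Pyramide"))
print(split_sender("12/07/2015, 01:46 - Some automated notice"))
```

The heuristic would misclassify an automated notice that happens to contain ": ", so it is a starting point rather than a complete filter.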

manu-chroma commented 7 years ago

totally agree on using a date library after parsing the message line.

by formats I meant previous WhatsApp export formats, not removing support for other date formats

older formatting:

09/12/2012 17:03:48: Sender Name: Message
3/24/14, 1:59:59 PM: Sender Name: Message
24/3/14, 13:59:59: Sender Name: Message

newer formatting:

12/07/2015, 01:46 - Sjors Haanen: Pyramide

supporting all existing date formats is indeed required and should work that way.

I leave the decision of supporting legacy WhatsApp export formats up to you. Removing them would really reduce the complexity of the parser, I believe. Also, let me know if you know of any date library which might be best suited for this task!

nmoya commented 7 years ago

Ohh, I see. I agree, let's keep only the newer formatting. Both of us just exported a chat history and the format was the same, not worth keeping this legacy code.

WhatsApp does not officially support app versions prior to voice/video call. This code was written before the voice/video call feature, so I think it is safe to say that these history formats are very unlikely to appear.

nmoya commented 7 years ago

Looking at the code again, I think using datetime and passing the date/time string with a format mask is the best we can get. I don't think using a third-party date library would improve that piece of code.
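For illustration, a minimal sketch of the mask approach with the standard library's datetime.strptime, trying a few masks in turn. The MASKS list and parse_timestamp are hypothetical and are not the code at the linked line; the masks are derived from timestamps quoted in this thread, and %p assumes an English locale.

```python
from datetime import datetime

# Candidate masks; the first one that parses wins, so order matters
# for ambiguous dates such as 12/07/2015.
MASKS = [
    "%d/%m/%Y, %H:%M",     # 12/07/2015, 01:46
    "%d/%m/%y, %I:%M %p",  # 28/08/16, 11:07 PM
    "%m/%d/%y, %H:%M",     # 7/22/16, 14:25
]


def parse_timestamp(stamp, masks=MASKS):
    """Return a datetime for the first mask that fits, else None."""
    for mask in masks:
        try:
            return datetime.strptime(stamp, mask)
        except ValueError:
            continue  # this mask does not fit, try the next one
    return None


print(parse_timestamp("12/07/2015, 01:46"))
print(parse_timestamp("7/22/16, 14:25"))
```

Because the first fitting mask wins, a day-first mask listed ahead of a month-first one silently decides how ambiguous dates are read.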

https://github.com/nmoya/whatsapp-parser/blob/master/parsers/whatsapp.py#L40

manu-chroma commented 7 years ago

this is the formatting for me:
28/08/16, 11:07 PM - Manvendra: bro

yeah the strptime in https://github.com/nmoya/whatsapp-parser/blob/master/parsers/whatsapp.py#L40 works well because of fixed formatting of the dataset. I think I can work to extend this code. I'll keep you updated on that.

for the first phase, can i start with adding pip support and releasing the current version on pip and then working together on the next version based on the changes we discussed?

nmoya commented 7 years ago

Yes, sounds like a plan. This is the format for me:

Single line messages: 7/22/16, 14:25 - Nikolas Moya: ?

Multiple line messages:

8/10/16, 13:53 - Nikolas Moya: ???? ??????
??/?? - ??/??: ??????

????????????????????
???????????????????

??????

I guess the PM/AM string still happens sometimes.
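A sketch of how multi-line messages like the one above could be regrouped, assuming every new message starts with a "<date>, <time> - " header (with an optional AM/PM marker) and anything else continues the previous message. HEADER_RE and group_messages are hypothetical names, not functions from the repository.

```python
import re

# Start of a new message: "<date>, <time> - ", AM/PM optional.
HEADER_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?: [AP]M)? - ")


def group_messages(lines):
    """Join continuation lines onto the message they belong to."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        if HEADER_RE.match(line) or not messages:
            messages.append(line)  # header line starts a new message
        else:
            messages[-1] += "\n" + line  # blank or text line continues it
    return messages


sample = [
    "8/10/16, 13:53 - Nikolas Moya: first line",
    "second line",
    "",
    "third line",
    "7/22/16, 14:25 - Nikolas Moya: ?",
]
print(len(group_messages(sample)))
```

Grouping before parsing means the timestamp and sender only need to be extracted once per message, from its first line.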

manu-chroma commented 7 years ago

yeah, I think it's dependent on the locale/country.