Closed ankitduseja closed 5 years ago
@ankitduseja Try running python Chat.py instead of main.py.
@nmoya That one also doesn't seem to work. (Python 2.7.6)
Command with output:
sjors@mothership:~/Downloads/whatsapp-parser-master$ python Chat.py -f pivochat.txt -n root -p WhatsApp
Traceback (most recent call last):
File "Chat.py", line 217, in
@sjorsng I've been meaning to review this code in full but haven't had the time yet. Is the problem still happening, or did you manage to solve it? If you did, we could open a pull request and merge it into the repository. What do you think?
@nmoya I did not manage to solve it yet (I'm very new to Python and Github). But sure, I would be glad to help in any way.
@sjorsng I just got the same problem with a chat log of mine. I will work on a fix when I find time.
@sjorsng I just did some commits, could you try running it again with the latest version in master?
@nmoya
sjors@mothership:~/Downloads/whatsapp-parser-master$ python Chat.py -f pivochat.txt -p WhatsApp -n "Sjors Haanen"
Traceback (most recent call last):
File "Chat.py", line 225, in
Same problem. Maybe my WhatsApp logs use another date format which the script can't handle? Example: 12/07/2015, 01:46 - Sjors Haanen: Pyramide
@sjorsng Hey.
First thing, did you pull the latest version before running the code? I renamed Chat.py to chat.py yesterday. I also made the root (-n) optional; if it is not provided, there is an interactive prompt.
Yes, the problem is the format.
So you have two options:
1- Contribute to an open source repository. You need to add support for this new format. You can do that by modifying the parse function inside parsers/whatsapp.py. There are some comments in this function on how to do that. If this is your first contribution, you can fork this repository and then open a Pull Request to merge the latest commit of your fork with mine. (It's very simple.)
2- You can modify your chat file from 12/07/2015, 01:46 - Sjors Haanen: Pyramide to 12/07/2015, 01:46:00: Sjors Haanen: Pyramide. I used this solution yesterday and it worked. It's ugly, I know, but unfortunately this new format is not yet supported.
I'm hoping that you chose option 1! I can help you during the process.
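For anyone who goes with option 2 in the meantime, the find and replace can be scripted. This is only a rough sketch, not code from the repository: the regex and the to_old_format name are made up for illustration, and it only covers 24-hour timestamps like the one in the example above (lines with AM/PM are left untouched).

```python
import re

# Hypothetical pattern for the newer export format:
# "12/07/2015, 01:46 - Sender: Message"
NEW_FORMAT = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - (.*)$")


def to_old_format(line):
    """Rewrite one line from the newer format to the older one.

    Lines that do not match (continuation lines, AM/PM timestamps)
    are returned unchanged.
    """
    match = NEW_FORMAT.match(line)
    if match is None:
        return line
    date, time, rest = match.groups()
    # The older format has seconds and uses ": " instead of " - ".
    return "{}, {}:00: {}".format(date, time, rest)
```

Running every line of the chat file through to_old_format and writing the result to a new file should give something the current parser accepts, assuming the rest of the line layout matches.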
hey @nmoya I chose option 2 to test your program against my recently archived chat. I found the project to be pretty cool, and here's how I'm planning to contribute:
12/07/2015, 01:46 - Sjors Haanen: Pyramide
Let me know if you're interested!
hey @manu-chroma
yes, definitely! Thanks for the help! We can work together if the second bullet point turns out to be bigger than expected.
thanks for your prompt reply.
I think the 2nd point shouldn't be much work, because a simple find and replace to modify the dataset according to the old format was pretty easy (i.e. option 2), and I think this can be easily reflected in the actual code.
A few changes to def parse(self) in whatsapp.py should do the trick IMHO.
Also, to reduce the complexity of the same, I thought that removing support for all previous formats and only supporting the current format would be great.
Support for the previous formats can be made available through previous releases on pip or a different git branch.
some edge cases for the current format:
your code supports dates formatted as %m/%d/%Y, while in India the dataset had dates formatted as %d/%m/%Y
also, since the introduction of encryption on WhatsApp there is this line:
04/08/16, 7:01 PM - Messages you send to this chat and calls are now secured with end-to-end encryption. Tap for more info.
I'll reply on this thread if I encounter any other edge cases.
let me know your thoughts on the above ideas!
I don't think we can remove all the other formats. We should expand (probably add a beefy date library) to support all date formats. I just exported a chat from my app and the date is still in %m/%d/%Y. I'm assuming it changes depending on the phone locale or the country code of the phone number registered in the app. Strangely, I'm not in the US, so I don't understand why the month comes first in my case; it should be the same as yours.
Instead of having separate git branches, we should detect whether it's %m/%d/%Y or %d/%m/%Y and call the appropriate date parse function.
Yeah, we should remove the auto message of end-to-end encryption. I think we can detect this automated message because the name of the sender will not exist.
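Both detections could look roughly like the sketch below. It is a guess at an approach, not what the repository does: guess_date_order and is_system_message are hypothetical names, the date order can only be decided when some date has a field above 12, and the system-message check assumes an automated line has no "Sender: " part after the " - " separator.

```python
import re

# Hypothetical pattern for the date at the start of a message line.
LINE_START = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2,4}), ")


def guess_date_order(lines):
    """Return "dmy", "mdy", or None when every date is ambiguous."""
    for line in lines:
        match = LINE_START.match(line)
        if match is None:
            continue
        first, second = int(match.group(1)), int(match.group(2))
        if first > 12:   # cannot be a month, so the day comes first
            return "dmy"
        if second > 12:  # cannot be a month, so the month comes first
            return "mdy"
    return None


def is_system_message(line):
    """Heuristic: a dated line with no "Sender: " after the separator."""
    if LINE_START.match(line) is None:
        return False
    _, sep, body = line.partition(" - ")
    return sep != "" and ": " not in body
```

A chat where no day is ever above 12 stays ambiguous (None), so a fallback such as asking the user would still be needed.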
totally agree on using a date library after parsing the message line.
by formats I meant previous WhatsApp export formats, not removing support for other date formats
older formatting:
09/12/2012 17:03:48: Sender Name: Message
3/24/14, 1:59:59 PM: Sender Name: Message
24/3/14, 13:59:59: Sender Name: Message
newer formatting:
12/07/2015, 01:46 - Sjors Haanen: Pyramide
supporting all existing date formats is indeed required and should work that way.
I leave the decision of supporting the legacy WhatsApp export formats up to you. removing them would really ease the complexity of the parser, I believe. also, let me know if you know of any date library which might be best suited for this task!
Ohh, I see. I agree, let's keep only the newer formatting. Both of us just exported a chat history and the format was the same, not worth keeping this legacy code.
WhatsApp does not officially support app versions prior to voice/video call. This code was written before the voice/video call feature, so I think it is safe to say that these history formats are very unlikely to appear.
Looking at the code again, I think using datetime and passing the date/time with a mask is the best we can get. I don't think using a third-party date library would improve that piece of code.
https://github.com/nmoya/whatsapp-parser/blob/master/parsers/whatsapp.py#L40
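As a rough illustration of the datetime-with-mask idea (not the code at the link above), something like this tries a short list of masks in order. The mask list and the parse_timestamp name are assumptions; the two masks cover the day-first samples quoted earlier in this thread, and the order matters because a date like 04/08 is ambiguous.

```python
from datetime import datetime

# Assumed masks, day-first, matching the samples quoted in this thread.
CANDIDATE_MASKS = [
    "%d/%m/%Y, %H:%M",     # 12/07/2015, 01:46
    "%d/%m/%y, %I:%M %p",  # 04/08/16, 7:01 PM
]


def parse_timestamp(text, masks=CANDIDATE_MASKS):
    """Return a datetime for the first mask that fits, or None."""
    for mask in masks:
        try:
            return datetime.strptime(text, mask)
        except ValueError:
            continue  # this mask does not fit, try the next one
    return None
```

Note that %p depends on the locale Python runs under, so the AM/PM mask is only expected to work with English-style "AM"/"PM" strings.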
this is the formatting for me:
28/08/16, 11:07 PM - Manvendra: bro
yeah the strptime in https://github.com/nmoya/whatsapp-parser/blob/master/parsers/whatsapp.py#L40 works well because of the fixed formatting of the dataset. I think I can work to extend this code. I'll keep you updated on that.
for the first phase, can I start with adding pip support and releasing the current version on pip, and then working together on the next version based on the changes we discussed?
Yes, sounds like a plan. This is the format for me:
Single line messages:
7/22/16, 14:25 - Nikolas Moya: ?
Multiple line messages:
8/10/16, 13:53 - Nikolas Moya: ???? ??????
??/?? - ??/??: ??????
????????????????????
???????????????????
??????
I guess the PM/AM string still happens sometimes.
yeah, I think it's dependent on the locale/country.
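For the multiple line messages, one possible approach is to treat any line that does not start with a date/time header as a continuation of the previous message. A minimal sketch, with a made-up MESSAGE_START pattern and group_messages name, assuming the " - " separator and an optional AM/PM as in the samples above:

```python
import re

# Hypothetical header pattern: date, time, optional AM/PM, then " - ".
MESSAGE_START = re.compile(
    r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?: [AP]M)? - "
)


def group_messages(lines):
    """Merge continuation lines into the message they belong to."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        if MESSAGE_START.match(line) or not messages:
            # A new header (or the very first line) starts a new message.
            messages.append(line)
        else:
            # Anything else continues the previous message.
            messages[-1] += "\n" + line
    return messages
```

The obvious weak spot is a continuation line that happens to look like a header; it would be split off as its own message.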
Not working: it gives weird output, showing full sentences instead of word counts, and then dies with the following output:
--SHIFTS
evening -> 0
afternoon -> 4
latenight -> 11
morning -> 15
--WEEKDAY
Traceback (most recent call last):
File "main.py", line 225, in
main()
File "main.py", line 199, in main
output["weekdays"] = c.count_messages_per_weekday()
File "main.py", line 59, in count_messages_per_weekday
weekday = date.date_to_weekday(parsed_date)
NameError: global name 'date' is not defined