joweich / chat-miner

Parsers and visualizations for chats
MIT License
566 stars 57 forks

Add support for Instagram Chats #58

Closed KianKhadempour closed 1 year ago

KianKhadempour commented 1 year ago

I am not 100% sure I covered all the edge cases, but it should work. Also, I couldn't find a way to use an f-string for the "{} sent {}'s story." part. If anyone understands why that wasn't working for me, I would appreciate a fix.
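One possible reason the f-string did not work, sketched under assumptions: an f-string is evaluated immediately and needs an expression inside each pair of braces, so empty `{}` placeholders are a syntax error, while the same text works as a `str.format` template. The names `sender` and `owner` below are hypothetical and not taken from the parser.

```python
# Hypothetical values for illustration only.
sender = "alice"
owner = "bob"

# A reusable template with empty braces works with str.format.
template = "{} sent {}'s story."
formatted = template.format(sender, owner)

# The f-string version needs a name inside each pair of braces, and double
# quotes around the string so the apostrophe does not terminate it.
fstring = f"{sender} sent {owner}'s story."

# Both approaches produce the same text.
assert formatted == fstring
print(fstring)
```

If the string is meant to be stored once and filled in later, or compared against message text, keeping it as a plain template is a reasonable choice rather than a workaround.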

KianKhadempour commented 1 year ago

If you want, I can squash the two Instagram Chat commits into one and the two Colab commits into one so that the commit history is less messy.

KianKhadempour commented 1 year ago

Unfortunately, I cannot make the Colab file fit perfectly because it doesn't work with Jupyter Notebooks.

KianKhadempour commented 1 year ago

Sorry for the delay. I am currently on vacation, but I can fix these issues after.

KianKhadempour commented 1 year ago

I did not close this... Maybe because I deleted all the commits? Shoot. Sorry about that.

KianKhadempour commented 1 year ago

@joweich Would you rather a photo/video message return the link, or just the word "photo"/"video"? I ask because the wordcloud gets diluted with the name of the chat, since the URI looks something like:

messages/inbox/[chat name]_[chat id]/photos/[random number]_[random number]_n_[random number].jpg

Switching it to a single word would let you look for "video" or "photo" in the wordcloud to see how often a video/photo was sent, instead of seeing a bunch of numbers and the name of the chat. This also applies to shares. Should I change those to just output "share"?

After a bit of testing, I have concluded that not using the URI is the right choice. Printing a warning to the console after each skip would fire far too often to be OK, so I am not going to add that.
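The single-word idea above could look roughly like the sketch below. It assumes the media type can be read from the folder name in the URI; the `photos` folder appears in the example path above, while the `videos` folder, the `PLACEHOLDERS` mapping, the `media_placeholder` helper, and the `"media"` fallback are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from export folder names to placeholder words.
# "videos" is an assumed folder name, by analogy with "photos".
PLACEHOLDERS = {"photos": "photo", "videos": "video"}


def media_placeholder(uri: str) -> str:
    """Return a one-word placeholder based on the folder containing the file."""
    folder = PurePosixPath(uri).parent.name
    # Fall back to a generic word for folders the mapping does not know.
    return PLACEHOLDERS.get(folder, "media")


# Made-up path that follows the shape of the URI shown above.
print(media_placeholder("messages/inbox/somechat_123/photos/1_2_n_3.jpg"))
```

Returning a fixed word keeps chat names and random numbers out of the wordcloud while still letting the word's size reflect how often media was sent.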

To add to this, one of the problems is that special characters (e.g. Ã) appear very often in the wordcloud. I worked around this by setting a min_word_length of 2 in the main.py file that I am using, but I think this should be the default. I tried hard-coding it, but that didn't work, so it would be great if someone could look into it.
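The effect of a minimum word length of 2 can be illustrated with a standalone filter. This is only a sketch of the idea, not the project's wordcloud code; `filter_short_words` is a hypothetical helper.

```python
def filter_short_words(words, min_word_length=2):
    """Keep only words with at least min_word_length characters."""
    return [word for word in words if len(word) >= min_word_length]


# Single stray characters such as "Ã" are dropped, longer words are kept.
print(filter_short_words(["Ã", "ok", "hello", "a"]))
```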

KianKhadempour commented 1 year ago

I have fixed the special characters bug in #64, but I still think that the minimum word length should be 2.
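For context, characters like Ã often show up when UTF-8 bytes have been read as Latin-1 code points. The sketch below shows one common repair under that assumption. It is not necessarily the approach taken in #64, and `repair_mojibake` is a hypothetical helper.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Latin-1, if possible."""
    try:
        # Map each character back to its original byte, then decode as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded in this way, so leave it unchanged.
        return text


print(repair_mojibake("cafÃ©"))
```

Falling back to the original string means correctly decoded text passes through untouched.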