mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports
https://chatanalytics.app
GNU Affero General Public License v3.0
642 stars 48 forks

[Suggestion] Instagram chat analysis #114

Open ShortTimeNoSee opened 1 month ago

ShortTimeNoSee commented 1 month ago

Considering you can export your Instagram data in JSON, I think it'd be great if there was a tool out there that could create some visual data of your chat analytics, and this service would be a great home for such a tool.

mlomb commented 1 month ago

Hey there!

> create some visual data of your chat analytics

Do you mean exporting the graphs?

hopperelec commented 1 month ago

I think when they said "chat analytics", they were referring to analytics of their chats, not this app

hopperelec commented 1 month ago

For the export instructions, we can reference this. For the structure of the JSON file, I've found a C# project which parses the JSON; we obviously can't use the C# itself but, assuming the format hasn't changed much within the past 5 years, it should give us an idea of what may be contained in the file.

mlomb commented 1 month ago

Sorry, I don't know why I read "Telegram" instead of "Instagram" lol

Yeah

ShortTimeNoSee commented 1 month ago

> Hey there!
>
> > create some visual data of your chat analytics
>
> Do you mean exporting the graphs?

I mean like what chatanalytics does for other messaging services in general, like Discord.

When you export your data from Instagram you can choose to export your DMs/GCs (messages) in JSON or HTML format. There doesn't exist anything right now that can analyze your chat data the same way your web app can analyze data (and I haven't found anything that analyzes Instagram data exports at all) and put it in a visual format.

The data exports for messages come out as one folder per person, and each folder contains message.json file(s). Example:

So I think, given the know-how, it's possible for someone to create a tool that can analyze the JSON exports from message data requests.

While typing this I saw hopperelec's addition to the issue; here are some snippets of the JSON format for messages (collapsed because it's a bunch):

```
{
  "participants": [
    {
      "name": "display name"
    },
    {
      "name": "my display name"
    }
  ],
  "messages": [
    {
      "sender_name": "display name",
      "timestamp_ms": 1692094898816,
      "content": "NOOOO",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "display name",
      "timestamp_ms": 1692094896923,
      "content": "UR JOKING",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "my display name",
      "timestamp_ms": 1692094844275,
      "content": "I used to be a YouTube Shorts guy",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "my display name",
      "timestamp_ms": 1692094828736,
      "content": "I upgraded though",
      "is_geoblocked_for_viewer": false
    },
```

Here's an instance of a post being shared:

```
    {
      "sender_name": "my display name",
      "timestamp_ms": 1692094162049,
      "content": "You sent an attachment.",
      "share": {
        "link": "https://www.instagram.com/reel/CtA55o8gd8v/?id=3116745591766507311_53334871481",
        "share_text": "The whole meet feeling the rizz \u00f0\u009f\u0098\u0085\n\n\u00f0\u009f\u0093\u00b9 @lucasnvota \n\n#trackandfield #rizz #crosscountry",
        "original_content_owner": "runnnsphere"
      },
      "is_geoblocked_for_viewer": false
    },
```

A photo being sent:

```
    {
      "sender_name": "my display name",
      "timestamp_ms": 1692094109242,
      "photos": [
        {
          "uri": "your_instagram_activity/messages/inbox/display name_uniqueID/photos/365768647_242613148165388_3593794921027562063_n_242613144832055.jpg",
          "creation_timestamp": 1692094108
        }
      ],
      "is_geoblocked_for_viewer": false
    },
```

A message that has a reaction:

```
    {
      "sender_name": "my display name",
      "timestamp_ms": 1692093826812,
      "content": "talk to him",
      "reactions": [
        {
          "reaction": "\u00f0\u009f\u0092\u00aa",
          "actor": "display name"
        }
      ],
      "is_geoblocked_for_viewer": false
    },
```

Sent video with reactions:

```
    {
      "sender_name": "my display name",
      "timestamp_ms": 1692087378497,
      "videos": [
        {
          "uri": "your_instagram_activity/messages/inbox/display name_uniqueID/videos/366765036_9733517726719643_8623716028334454998_n_800614114872039.mp4",
          "creation_timestamp": 1692087373
        }
      ],
      "reactions": [
        {
          "reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
          "actor": "display name"
        }
      ],
      "is_geoblocked_for_viewer": false
    },
```

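
The snippets above suggest a shape roughly like the following. This is only a sketch in TypeScript, built from the fields visible in the samples: the type names and the `toChronological` helper are made up for illustration, and real exports may well contain fields that aren't shown here. The samples are listed newest first, so the helper sorts by `timestamp_ms` ascending.

```typescript
// Field names are copied from the samples; all type names are hypothetical.
interface IgReaction {
    reaction: string;
    actor: string;
}

interface IgMedia {
    uri: string;
    creation_timestamp: number; // seconds in the samples, unlike timestamp_ms
}

interface IgShare {
    link: string;
    share_text?: string;
    original_content_owner?: string;
}

interface IgMessage {
    sender_name: string;
    timestamp_ms: number;
    content?: string; // absent on the photo-only and video-only samples
    photos?: IgMedia[];
    videos?: IgMedia[];
    share?: IgShare;
    reactions?: IgReaction[];
    is_geoblocked_for_viewer: boolean;
}

interface IgChat {
    participants: { name: string }[];
    messages: IgMessage[];
}

// Returns a copy of the messages ordered oldest first; the input is not mutated.
function toChronological(chat: IgChat): IgMessage[] {
    return chat.messages.slice().sort((a, b) => a.timestamp_ms - b.timestamp_ms);
}
```

Anything that only appears in some messages (`content`, `photos`, `videos`, `share`, `reactions`) is marked optional, since each sample carries a different subset.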
hopperelec commented 1 month ago

Thanks for the samples, those will be very useful!

mlomb commented 1 month ago

This is totally possible, in fact the structure looks very similar to Messenger exports (both from Meta)

Can you try to load them like they were Messenger exports?


Edit: I want to add that the project is designed so that adding new platforms is relatively easy, so we could eventually support lots more platforms. The problem is that companies may not be consistent with their export formats :(
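
Since the structure looks so close to Messenger's, a quick structural check could tell whether a file is worth handing to the same parser. The sketch below is an assumption-laden illustration, not chat-analytics code: `isRecord` and `looksLikeMetaChat` are hypothetical names, and the required fields are just the ones visible in the samples posted in this thread.

```typescript
// Narrows an unknown value to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
    return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when the value has "participants" and "messages" arrays and every
// message carries a string sender_name and a numeric timestamp_ms.
function looksLikeMetaChat(value: unknown): boolean {
    if (!isRecord(value)) return false;
    const participants = value["participants"];
    const messages = value["messages"];
    if (!Array.isArray(participants) || !Array.isArray(messages)) return false;
    return messages.every(
        (m: unknown) =>
            isRecord(m) &&
            typeof m["sender_name"] === "string" &&
            typeof m["timestamp_ms"] === "number"
    );
}

const parsedExample: unknown = JSON.parse(
    '{"participants":[{"name":"display name"}],' +
        '"messages":[{"sender_name":"display name","timestamp_ms":1692094898816,"content":"NOOOO"}]}'
);
console.log(looksLikeMetaChat(parsedExample)); // true
```

A check like this can't tell Instagram and Messenger apart, which is the point: if both pass, both can go through the same code path.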

ShortTimeNoSee commented 1 month ago

> This is totally possible, in fact the structure looks very similar to Messenger exports (both from Meta)
>
> Can you try to load them like they were Messenger exports?

That seems to work pretty well!

image

This part didn't register videos sent or edits (and there isn't anything built in for Reels sent etc.), but those are small aspects. Here's an edited message's JSON though, where I added emojis to the end of the message content:

    {
      "sender_name": "my display name",
      "timestamp_ms": 1720829614035,
      "content": "Last year \u00f0\u009f\u0098\u00ad (edited)",
      "is_geoblocked_for_viewer": false
    },
    {
      "sender_name": "my display name",
      "timestamp_ms": 1720829590324,
      "content": "Last year",
      "is_geoblocked_for_viewer": false
    },

Your web app likely sees this as two messages instead of one message that was edited.

It also doesn't read emojis correctly. It only picks up emojis that weren't turned into Unicode escape sequences, plus anything it mistakes for Discord's emoji format, as you can see here:

image

For some reason I was under the impression that there was something that also compared message lengths but I'm not seeing that so I think I'm imagining things (could make an issue for that though, might be a good additional feature).

hopperelec commented 1 month ago

> For some reason I was under the impression that there was something that also compared message lengths but I'm not seeing that

Under Language > Language Statistics there is "Average words per message". There is also an issue for this https://github.com/mlomb/chat-analytics/issues/67

ShortTimeNoSee commented 1 month ago

Demonstration of it seeing the edited message as separate messages

image

And I just noticed this as well:

image

> > For some reason I was under the impression that there was something that also compared message lengths but I'm not seeing that
>
> Under Language > Language Statistics there is "Average words per message". There is also an issue for this #67

Oh, I don't think I'd even noticed that. Whatever I was thinking of compared the message lengths between the users.
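
A per-user comparison like that could be sketched as an average word count per sender. This is only an illustration in TypeScript: `SimpleMessage` and `averageWordsBySender` are hypothetical names, the field names follow the Instagram samples earlier in the thread, and splitting on whitespace is a deliberately crude definition of a "word".

```typescript
// Only the two fields this sketch needs; names follow the export samples.
interface SimpleMessage {
    sender_name: string;
    content?: string;
}

// Average number of whitespace-separated words per text message, keyed by sender.
function averageWordsBySender(messages: SimpleMessage[]): Map<string, number> {
    const totals = new Map<string, { words: number; count: number }>();
    for (const msg of messages) {
        if (msg.content === undefined) continue; // photo/video-only messages have no text
        const words = msg.content.trim().split(/\s+/).filter((w) => w.length > 0).length;
        const entry = totals.get(msg.sender_name) || { words: 0, count: 0 };
        entry.words += words;
        entry.count += 1;
        totals.set(msg.sender_name, entry);
    }
    const averages = new Map<string, number>();
    totals.forEach((t, name) => averages.set(name, t.words / t.count));
    return averages;
}
```

One caveat from the samples: shared posts carry the placeholder text "You sent an attachment.", so a real version would probably want to leave those out of the count.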

mlomb commented 1 month ago

The edited messages would be really hard to do since (from the JSON you sent) we have no way to connect both messages (no IDs). Right now each message stores the time it was edited and its latest content. We'd either skip all "(edited)" messages, which breaks the edit stats, or create empty ghost messages, which skews message counts (maybe we could do some dirty trick to skip them, but idk). Now that I think of it, we'd need to know both the time sent and the time edited to compute the stats.


The emojis thing you are right, we should handle unicode escaped emojis (and symbols for that matter). Maybe in a new issue?
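
For reference, the escapes in the samples look like the UTF-8 bytes of each emoji written as one `\u00XX` escape per byte (`\u00f0\u009f\u0098\u0085` is `F0 9F 98 85`, the UTF-8 encoding of 😅). After `JSON.parse`, each character of the string is therefore really one byte. A minimal sketch of undoing that, assuming the whole string is encoded this way; `fixMetaEncoding` is a hypothetical name, not an existing function in the project:

```typescript
// Re-reads a string whose characters are really UTF-8 bytes (all code units <= 0xFF).
// If the string doesn't fit that pattern, it is returned unchanged.
function fixMetaEncoding(text: string): string {
    let percentEncoded = "";
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        if (code > 0xff) return text; // already contains real non-Latin-1 characters
        percentEncoded += "%" + (code < 0x10 ? "0" : "") + code.toString(16);
    }
    try {
        // decodeURIComponent interprets the %XX sequence as UTF-8 bytes
        return decodeURIComponent(percentEncoded);
    } catch {
        return text; // the bytes were not valid UTF-8, so leave the text alone
    }
}

console.log(fixMetaEncoding("Last year \u00f0\u009f\u0098\u00ad (edited)")); // Last year 😭 (edited)
```

The same treatment would apply to `sender_name`, `share_text` and the `reaction` field, since they all come out of the export the same way.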

ShortTimeNoSee commented 1 month ago

> we have no way to connect both messages (no IDs)

Yeah, that's what I was thinking. Messages can be edited within 15 minutes, but people send multiple messages in that span of time and there's no guarantee which message was edited. I think the one thing that can be done here, though, is excluding messages that end with " (edited)" from any message statistics. Oh, I got ahead of myself there; you already mentioned that.
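
The exclusion idea could look something like this. It's a sketch under the assumption that an edited copy is always marked by a trailing " (edited)", as in the sample above; `TextMessage`, `isEditedCopy` and `withoutEditedCopies` are hypothetical names.

```typescript
const EDITED_SUFFIX = " (edited)";

// Minimal message shape for this sketch; field names follow the export samples.
interface TextMessage {
    sender_name: string;
    timestamp_ms: number;
    content?: string;
}

// True when the message text ends with the " (edited)" marker.
function isEditedCopy(msg: TextMessage): boolean {
    return msg.content !== undefined && msg.content.endsWith(EDITED_SUFFIX);
}

// Drops the edited copies so each sent message is counted once.
function withoutEditedCopies<T extends TextMessage>(messages: T[]): T[] {
    return messages.filter((m) => !isEditedCopy(m));
}
```

The trade-off is that the stats then reflect the original text rather than the final one, and a message someone genuinely ended with " (edited)" gets dropped as well.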

And I just created an issue for the emoji handling 🙏

mlomb commented 1 month ago

Nice! I'll leave this open for now. We should make clear that Instagram exports are compatible with Messenger exports. Maybe have two buttons in the UI that use the same parser and have different instructions.
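
Roughly, the two-buttons idea amounts to two entries that point at one parser. The sketch below is purely illustrative: `PlatformOption`, `PLATFORM_OPTIONS`, `parserFor` and the key strings are invented for this example and say nothing about how the project actually registers platforms.

```typescript
// Hypothetical description of one button in the platform picker.
interface PlatformOption {
    id: string;
    label: string;
    parser: string; // key of the parser to run (made-up identifiers)
    instructionsKey: string; // which export instructions to show (made-up identifiers)
}

const PLATFORM_OPTIONS: PlatformOption[] = [
    { id: "messenger", label: "Messenger", parser: "messenger", instructionsKey: "messenger-export" },
    { id: "instagram", label: "Instagram", parser: "messenger", instructionsKey: "instagram-export" },
];

// Looks up which parser a given button should use, or undefined if the id is unknown.
function parserFor(platformId: string): string | undefined {
    const option = PLATFORM_OPTIONS.find((p) => p.id === platformId);
    return option ? option.parser : undefined;
}
```

Keeping the instructions separate from the parser key is what lets each button show its own export steps while the parsing code stays shared.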

And maybe rename "MessengerParser" to "MetaParser" (?)

hopperelec commented 1 month ago

MetaParser wouldn't really make sense because WhatsApp is also made by Meta but that has its own parser. Also, I would personally read "MetaParser" as meaning a parser for chat-analytics messages, whatever that means lol

mlomb commented 1 month ago

> WhatsApp is also made by Meta

jeez these Meta guys, true


I don't use IG/FB, but now that I recall you can message people between both platforms (interoperable). Maybe that's why it is so similar.