mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports
https://chatanalytics.app
GNU Affero General Public License v3.0
708 stars 51 forks source link

Negative `editedAfter` value causing serialization to break #85

Closed Laiteux closed 1 year ago

Laiteux commented 1 year ago

Exported 500K Telegram messages, Chat Anal worked fine. Tried with 1M from an other group so twice as much and now I'm getting some error:

https://github.com/mlomb/chat-analytics/assets/33204743/9d083506-60eb-4b75-a81d-3f893334060b

Messages -> Sentiment -> Timeline ![msedge_M6bW7i4ipd](https://github.com/mlomb/chat-analytics/assets/33204743/53a590f9-4839-410c-9c46-314938a69322) ![msedge_Nb3GeJH070](https://github.com/mlomb/chat-analytics/assets/33204743/8904016b-9288-4547-b6a2-f787d375e284)
Laiteux commented 1 year ago

Now that I'm thinking about it (not sure if it's related, haven't checked the code, just a random thought), couldn't it be because there was no message sent there since march 2022?

mlomb commented 1 year ago

Now that I'm thinking about it (not sure if it's related, haven't checked the code, just a random thought), couldn't it be because there was no message sent there since march 2022?

Hmm that would be strange, it should not cause any problem

The obfuscator is not ready yet (hopefully I will get to it soon), so I don't know how to go around this one. In the screenshot you posted, if you can place a breakpoint and print the variable e, t or r you will find the Database object, can you share the time part of it?

Laiteux commented 1 year ago

Sorry for the delay. Sure, here you go:

msedge_d4D4mkocy2

mlomb commented 1 year ago

It appears that everything is as it should. I made a PR (#87) to print when indexing dayIndex above 850:

https://github.com/mlomb/chat-analytics/blob/544d6ac8c02da7dbae6ba2b2f6b1df75cd3388ce/pipeline/aggregate/blocks/messages/MessagesPerPeriod.ts#L48-L50

Can you share a screenshot of the console when exporting the chat from the PR? https://pr-87.chat-analytics.pages.dev

Laiteux commented 1 year ago

Sure, thank you

msedge_Qzp5vHvbMu

mlomb commented 1 year ago

Interesting, the values from 851 to 860 are not printed and only one with 1023 is. Since the number of dayBits is 10, 1023 is the maximum, so all bits set.

I managed to track down the issue, it appears that the following indexOf is returning -1: https://github.com/mlomb/chat-analytics/blob/65b524de49422a788aea567219793fb5edf3dfc4/pipeline/process/DatabaseBuilder.ts#L370

Details I added `if (Math.random() < 0.1) msg.dayIndex = -1;` so it drops some messages and the same sections you show crash.

I added some logging for this too in the same PR, can you show me the console but now in the app (where you generate)? https://pr-87.chat-analytics.pages.dev

https://github.com/mlomb/chat-analytics/blob/a0df720971a02be294d5946e0c4450534625c408/pipeline/process/DatabaseBuilder.ts#L373-L382

Laiteux commented 1 year ago

Do I have to wait for it to be updated online?

Laiteux commented 1 year ago

Still doesn't work so I assume not

mlomb commented 1 year ago

Sorry for the delay I had to go!

It updates automatically on each commit, so its live 😄 Did you already tried and no log appeared?

Laiteux commented 1 year ago

Yep. Could you please try yourself? Or it is possible I might be doing this wrong as well.

mlomb commented 1 year ago

I checked the deploy and the log is there 🤔. Maybe dayIndex is never -1.

What do you mean try yourself, is the server public?

I'll keep investigating

Laiteux commented 1 year ago

Well, this is what I have:

sqqNGFvDO0

mlomb commented 1 year ago

Thats the report UI! I mean the log when you generate the report (the app):

Here! (after completion) image

Laiteux commented 1 year ago

Oh sorry, was sure I had checked there.

msedge_F5vSgQ0a2d

mlomb commented 1 year ago

Ok. So I added a check where dayIndex is created:

https://github.com/mlomb/chat-analytics/blob/e4e05cd30fc06db72ace433bc3fee86448417e30/pipeline/process/MessageProcessor.ts#L130-L153

Getting closer... Now the error shuold come up in the generation UI

Edit: wait a few minutes for the deploy

Laiteux commented 1 year ago

I don't see any error on the generation UI

Laiteux commented 1 year ago

msedge_p79QeW8L4X

mlomb commented 1 year ago

Ok so the error does not happens there 🤔

mlomb commented 1 year ago

Just to confirm, this is still happening in v1.1.0? https://chatanalytics.app

Laiteux commented 1 year ago

Hey! It does.

mlomb commented 1 year ago

Aight I think it is better to wait until #82 is done, since I can't fix it in the dark I'll get to it when I have the time :) Thanks for the patience

mlomb commented 1 year ago

Aaaaaaaaaaaaand we are back! #82 is ready The tool to obfuscate reports is ready at https://chatanalytics.app/obfuscator.html Can you obfuscate your report (please audit the code / check the output) and share it? You can upload it here or you can send it by email 😄

Laiteux commented 1 year ago

Hey! Sent via email!

mlomb commented 1 year ago

I managed to replicate the issue! I'll update when I have fixed it!

Thanks!

mlomb commented 1 year ago

I... don't want to scare you... but... it may seem you have a time traveller in your group. If you look at message #637699:

{
  "id": 637699,
  "type": "message",
  "date": "2021-06-24T17:12:33",
  "edited": "2021-06-24T17:12:26",
  "from": "Wilbur Klein",
  "from_id": "7613595408",
  "text": "Terga.",
  "media_type": "animation",
  "mime_type": "video/mp4",
  "file": "(File not included. Change data exporting settings to download.)",
  "thumbnail": "(File not included. Change data exporting settings to download.)"
}

Somehow the edited timestamp is before date, causing editedAfter to be a negative number. editerAfter was being serialized with writeVarInt which does not support negative numbers, thus breaking serialization.

Anyway, the fix was adding the following line: https://github.com/mlomb/chat-analytics/blob/33696f22ec24929008b08d4b285b619b600e54e0/pipeline/serialization/BitStream.ts#L129-L129

Thanks for taking the time to report this and waiting for the obfuscator! 🎊🎉 (and watch out for time distortion around your peers)

Laiteux commented 1 year ago

Amazing! Thank you for your time! Glad we were able to figure this out.

I'll keep an eye on anything strange I see happen in this group. Thanks for the warning.

hopperelec commented 1 year ago

My guess is that someone with a poor connection was sending a message with an attachment and edited it before the attachment had finished sending so the "edit" request had finished before the original message was sent? Still woulda thought the platform would normalize it, though, lol

Laiteux commented 1 year ago

Editing a message while it's sending is not possible on Telegram, afaik

hopperelec commented 1 year ago

Oh. Well better keep the world updated on the actions of this time traveller you have discovered