mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports
https://chatanalytics.app
GNU Affero General Public License v3.0
711 stars 51 forks source link

Support for DST and 'date' & 'unixtime' in Telegram exports #80

Closed hopperelec closed 1 year ago

hopperelec commented 1 year ago

Updated TelegramParser to use the unixtime provided by the export rather than parsing the date string

Fixes https://github.com/mlomb/chat-analytics/issues/79

Laiteux commented 1 year ago

👏

mlomb commented 1 year ago

Is date_unixtime always available? Current versions of Telegram Desktop (at least on Windows) contain date_unixtime. But older exports do not (as you can see in the test that fails) We can use date_unixtime by default but we have to support older exports too

hopperelec commented 1 year ago

Ah, that would explain why it's not listed in the comment. Though, strangely, the test that fails seems to be for "telegram/DM_2A_7M.json", which does have date_unixtime, although "BIG_20A_5475M.json" doesn't. I'm curious why we have to support older exports for Typescript? I seem to remember mentioning that I had some Discord exports from channels I no longer have access to and you said we wouldn't support old exports because it's a niche issue

mlomb commented 1 year ago

I don't remember the Discord thing, in this case ALL exports before date_unixtime was added will be broken if we switch to it.

The core problem lies here:

https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/ChannelMessages.ts#L24-L27

We call markEOF per file, so if a file goes back in time, it breaks:

https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/process/DatabaseBuilder.ts#L102

We could add a "markEOF" event to the Parser and call it manually (in the Telegram parser) if we detect DST:

https://github.com/mlomb/chat-analytics/blob/452037f2ada3cff2fc656dc75a2938d4e3324094/pipeline/parse/Parser.ts#L16-L19


There is another problem with this, I'm assuming that if we process messages from A to B, then we can't add more messages to that range, it is assumed to be duplicated, so we'll lose 1 hour of messages for DST. I think we can just live with this.


why we have to support older exports for Typescript

I don't understand what you mean, In this case we add date_unixtime?: string so it can be undefined.


I want to thank you once again for taking the time to comment and contribute to chat-analytics :)

hopperelec commented 1 year ago

I don't remember the Discord thing

Sorry, I think I've found what I was remembering and it was when I was asking about TXT and HTML export support, which I suppose is much more of a niche than just supporting older versions haha

We could add a "markEOF" event to the Parser and call it manually (in the Telegram parser) if we detect DST:

Would we just be assuming that any messages ordered incorrectly are due to DST? Because if we were actually detecting it, would it not just be better to modify the timestamp accordingly? In fact, Date.parse has a warning about it not being extensive and it recommends using other libraries to parse date strings; I'd imagine other libraries would take into account DST for us.

I don't understand what you mean, In this case we add date_unixtime?: string so it can be undefined.

I wasn't asking about how we would do it, just why and only because I didn't think the same logic was being applied to other platforms. However, I now realise I was misremembering that, and I am fully in support of supporting slightly outdated exports (of course, to an extent!)

Laiteux commented 1 year ago

Hey! What exactly is needed here? If you don't have the time, maybe I could work on it

hopperelec commented 1 year ago

The fix needs to support exports which only have date and not date_unixtime, which means using a different method to convert the date string to a Unix timestamp (one which takes into account DST, likely a library). You could also just skip messages sent while DST catches up, but if only consider this a temporary fix. I'm currently working on #81 so I'm happy for you to work on this one!

Laiteux commented 1 year ago

Sorry for not responding earlier. I can not work on that.

github-actions[bot] commented 1 year ago

⚡ Preview for this PR: https://pr-80.chat-analytics.pages.dev 📊 Demo

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 71.42% and project coverage change: +0.36 :tada:

Comparison is base (452037f) 74.08% compared to head (0037e4c) 74.44%.

:exclamation: Current head 0037e4c differs from pull request most recent head 5bbcaf5. Consider uploading reports for the commit 5bbcaf5 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #80 +/- ## ========================================== + Coverage 74.08% 74.44% +0.36% ========================================== Files 55 56 +1 Lines 2257 2297 +40 Branches 497 484 -13 ========================================== + Hits 1672 1710 +38 - Misses 537 538 +1 - Partials 48 49 +1 ``` | [Impacted Files](https://app.codecov.io/gh/mlomb/chat-analytics/pull/80?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Mart%C3%ADn+Lombardo) | Coverage Δ | | |---|---|---| | [pipeline/parse/parsers/TelegramParser.ts](https://app.codecov.io/gh/mlomb/chat-analytics/pull/80?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Mart%C3%ADn+Lombardo#diff-cGlwZWxpbmUvcGFyc2UvcGFyc2Vycy9UZWxlZ3JhbVBhcnNlci50cw==) | `83.33% <71.42%> (-2.16%)` | :arrow_down: | ... and [5 files with indirect coverage changes](https://app.codecov.io/gh/mlomb/chat-analytics/pull/80/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Mart%C3%ADn+Lombardo)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

Laiteux commented 1 year ago

Thank you!