Closed hopperelec closed 1 year ago
Is date_unixtime always available? Current versions of Telegram Desktop (at least on Windows) include date_unixtime, but older exports do not (as you can see in the failing test).
We can use date_unixtime by default, but we have to support older exports too.
Ah, that would explain why it's not listed in the comment. Though, strangely, the test that fails seems to be for "telegram/DM_2A_7M.json", which does have date_unixtime, although "BIG_20A_5475M.json" doesn't. I'm curious why we have to support older exports for Typescript? I seem to remember mentioning that I had some Discord exports from channels I no longer have access to, and you said we wouldn't support old exports because it's a niche issue.
I don't remember the Discord thing. In this case, ALL exports made before date_unixtime was added will break if we switch to it.
The core problem lies here:
We call markEOF per file, so if a file goes back in time, it breaks:
We could add a "markEOF" event to the Parser and call it manually (in the Telegram parser) if we detect DST:
There is another problem with this: I'm assuming that once we have processed messages from A to B, we can't add more messages to that range (anything falling inside it is assumed to be a duplicate), so we'll lose 1 hour of messages for DST. I think we can just live with this.
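To make the "file goes back in time" case concrete, here is a minimal sketch of how a backward jump could be spotted in a list of parsed timestamps. findBackwardJumps is a hypothetical helper, not something that exists in chat-analytics, and it assumes messages are otherwise in chronological order, so any decrease is treated as a possible DST rewind.

```typescript
// Hypothetical helper (not part of chat-analytics): returns the indices of
// messages whose timestamp is earlier than the message right before them.
function findBackwardJumps(timestamps: number[]): number[] {
    const jumps: number[] = [];
    for (let i = 1; i < timestamps.length; i++) {
        // A decrease means the "clock" went back between message i-1 and i.
        if (timestamps[i] < timestamps[i - 1]) {
            jumps.push(i);
        }
    }
    return jumps;
}
```

A parser could use such an index as the point where it signals the end of one time range, but note that this cannot tell a DST rewind apart from any other ordering glitch.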
why we have to support older exports for Typescript
I don't understand what you mean, In this case we add date_unixtime?: string so it can be undefined.
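A rough sketch of what that could look like, assuming date_unixtime holds seconds since the epoch as a string and date is a local-time string without a UTC offset. TelegramMessage and messageTimestamp are hypothetical names for illustration, not the actual chat-analytics types.

```typescript
// Hypothetical message shape; only the two date fields come from this discussion.
interface TelegramMessage {
    date: string; // assumed format: "2023-01-01T00:00:00" (local time, no offset)
    date_unixtime?: string; // assumed: seconds since epoch; missing in older exports
}

// Returns the message timestamp in milliseconds.
function messageTimestamp(msg: TelegramMessage): number {
    if (msg.date_unixtime !== undefined) {
        // Preferred path: unambiguous, not affected by DST.
        return parseInt(msg.date_unixtime, 10) * 1000;
    }
    // Fallback for older exports: parsed as local time, so DST can still bite.
    return Date.parse(msg.date);
}
```

Newer exports take the first branch and sidestep the DST problem entirely; only older exports go through Date.parse.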
I want to thank you once again for taking the time to comment and contribute to chat-analytics :)
I don't remember the Discord thing
Sorry, I think I've found what I was remembering and it was when I was asking about TXT and HTML export support, which I suppose is much more of a niche than just supporting older versions haha
We could add a "markEOF" event to the Parser and call it manually (in the Telegram parser) if we detect DST:
Would we just be assuming that any messages ordered incorrectly are due to DST? Because if we were actually detecting it, would it not just be better to modify the timestamp accordingly? In fact, Date.parse has a warning about it not being extensive and it recommends using other libraries to parse date strings; I'd imagine other libraries would take into account DST for us.
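For context on why a date string alone is not enough: when clocks go back, the same wall-clock time happens twice, so a string without a UTC offset maps to two different instants. A small sketch, assuming a zone that switches from UTC+2 to UTC+1 on 2021-10-31 (instantsFor is a hypothetical helper):

```typescript
// Hypothetical helper: interpret one wall-clock string under several UTC offsets.
function instantsFor(wallClock: string, offsets: string[]): number[] {
    return offsets.map((offset) => Date.parse(wallClock + offset));
}

// 02:30 occurs twice on the night the clocks go back in a UTC+1/UTC+2 zone.
const [summerTime, winterTime] = instantsFor("2021-10-31T02:30:00", ["+02:00", "+01:00"]);
console.log((winterTime - summerTime) / 3600000); // 1 (hour apart)
```

Without the offset, whatever parses the string has to pick one of the two instants, and a library can only do better if it knows the time zone the export was made in.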
I don't understand what you mean, In this case we add date_unixtime?: string so it can be undefined.
I wasn't asking about how we would do it, just why and only because I didn't think the same logic was being applied to other platforms. However, I now realise I was misremembering that, and I am fully in support of supporting slightly outdated exports (of course, to an extent!)
Hey! What exactly is needed here? If you don't have the time, maybe I could work on it
The fix needs to support exports which only have date and not date_unixtime, which means using a different method to convert the date string to a Unix timestamp (one which takes DST into account, likely a library). You could also just skip messages sent while DST catches up, but I'd only consider this a temporary fix. I'm currently working on #81 so I'm happy for you to work on this one!
Sorry for not responding earlier. I can not work on that.
Updated TelegramParser to use the unixtime provided by the export rather than parsing the date string
Fixes https://github.com/mlomb/chat-analytics/issues/79