knadh / tg-archive

A tool for exporting Telegram group chats into static websites like mailing list archives.
MIT License
903 stars 139 forks

Can't build site #99

Open KingOfGerrit opened 1 year ago

KingOfGerrit commented 1 year ago

I get the following error when I run tg-archive --build:

2023-03-16 23:58:20,716: building site
Traceback (most recent call last):
  File "/home/telegram/.local/bin/tg-archive", line 8, in <module>
    sys.exit(main())
  File "/home/telegram/.local/lib/python3.10/site-packages/tgarchive/__init__.py", line 161, in main
    b.build()
  File "/home/telegram/.local/lib/python3.10/site-packages/tgarchive/build.py", line 87, in build
    self._render_page(messages, month, dayline,
  File "/home/telegram/.local/lib/python3.10/site-packages/tgarchive/build.py", line 116, in _render_page
    html = self.template.render(config=self.config,
  File "/home/telegram/.local/lib/python3.10/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/home/telegram/.local/lib/python3.10/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 90, in top-level template code
  File "/home/telegram/.local/lib/python3.10/site-packages/jinja2/environment.py", line 485, in getattr
    return getattr(obj, attribute)
jinja2.exceptions.UndefinedError: 'collections.OrderedDict object' has no attribute '2021-01-15'

Config:

api_id: ""
api_hash: ""

group: "test"

download_media: True
download_avatars: True
avatar_size: [64, 64] # Width, Height.
media_dir: "media"

media_mime_types: []

use_takeout: False

proxy:
  enable: False
  protocol: "socks5"
  addr: "127.0.0.1"
  port: 1080

fetch_batch_size: 2000

fetch_wait: 15

fetch_limit: 0

publish_dir: "site"
static_dir: "static"
per_page: 500
show_day_index: True

telegram_url: "https://t.me/{id}"

show_sender_fullname: False

timezone: "US/Eastern"

publish_rss_feed: False
rss_feed_entries: 100

site_url: "https://test.com"
site_name: "@{group} - Telegram group archive"
site_description: "Public archive of Telegram messages."
meta_description: "@{group} {date} - Telegram message archive."
page_title: "Page {page} - {date} @{group} Telegram message archive."

w316525096 commented 1 year ago

Have you solved it yet? I have the same problem

w316525096 commented 1 year ago

Have you solved it yet? I have the same problem

# Timezone to apply on timestamps when building the site. If no value
# is specified, all timestamps are in UTC:
# Eg: US/Eastern  Asia/Kolkata
timezone: ""

Leave the timezone blank.

scarlion1 commented 1 year ago

With timezone blank, the site builds, but timestamps are all in UTC time.  Is it an issue with newer python?  I see OP using 3.10 and I have 3.11.2, but tg-archive only tested with 3.8.6...

dangnhdev commented 1 year ago

With timezone blank, the site builds, but timestamps are all in UTC time. Is it an issue with newer python? I see OP using 3.10 and I have 3.11.2, but tg-archive only tested with 3.8.6...

I'm using python 3.8.6 (Windows 10) and got this issue too

tg-archive --build
2023-10-05 07:02:59,918: building site
Traceback (most recent call last):
  File "C:\Python38\Scripts\tg-archive-script.py", line 33, in <module>
    sys.exit(load_entry_point('tg-archive==1.1.3', 'console_scripts', 'tg-archive')())
  File "c:\python38\lib\site-packages\tgarchive\__init__.py", line 161, in main
    b.build()
  File "c:\python38\lib\site-packages\tgarchive\build.py", line 87, in build
    self._render_page(messages, month, dayline,
  File "c:\python38\lib\site-packages\tgarchive\build.py", line 116, in _render_page
    html = self.template.render(config=self.config,
  File "c:\python38\lib\site-packages\jinja2\environment.py", line 1301, in render
    self.environment.handle_exception()
  File "c:\python38\lib\site-packages\jinja2\environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 90, in top-level template code
  File "c:\python38\lib\site-packages\jinja2\environment.py", line 485, in getattr
    return getattr(obj, attribute)
jinja2.exceptions.UndefinedError: 'collections.OrderedDict object' has no attribute '2020-03-08'

scarlion1 commented 1 year ago

The problem seems to be that dates are stored in the database in UTC, but when tg-archive reads them back (such as in File "<template>", line 90), they are adjusted by the timezone setting, creating a discrepancy that causes the crash.  For me, it appears to happen at the beginning of the database, when the day changes from 1/31 to 2/1.  I think tg-archive pulls a message from early 2/1 UTC, which becomes 1/31 once the timezone is applied, and that breaks the build.  Maybe if you are lucky and none of your messages at the start of a month fall within your timezone offset of UTC midnight, you might get a successful build?
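
The mismatch can be reproduced with the standard library alone. This is only a sketch, not tg-archive's code: `dayline`, the sample timestamp, and the fixed UTC-5 offset (standing in for US/Eastern in winter) are all assumptions made for illustration. It assumes the per-day index is keyed by the UTC date while the page looks up the timezone-adjusted date.

```python
from collections import OrderedDict
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for US/Eastern in winter.
EST = timezone(timedelta(hours=-5))

# Hypothetical message stored in UTC shortly after midnight on 1 Feb.
stored_utc = datetime(2021, 2, 1, 2, 30, tzinfo=timezone.utc)

# Assumption: the day index is keyed by the UTC date string.
dayline = OrderedDict({stored_utc.strftime("%Y-%m-%d"): 1})

# Applying the configured timezone moves the message back to 31 Jan.
local_key = stored_utc.astimezone(EST).strftime("%Y-%m-%d")

print(list(dayline))         # ['2021-02-01']
print(local_key)             # 2021-01-31
print(local_key in dayline)  # False: the key the page asks for is missing
```

If that is what happens, building the index and the page from the same (already converted) timestamp would keep the two keys in agreement.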

By changing line 90 to get rid of the counting, <span class="title">{{ day }} <span class="count">(X messages)</span></span> the --build command completes (using --symlink here too), but there are still problems:

  1. Site last message is the last message in October from the database
  2. 2023-10.html actually starts with messages from the end of Sep within timezone offset hours of midnight Oct 1, corresponding to messages in the database that begin on/after Oct 1 midnight UTC.  All months begin like this as long as you have messages within the time frame of your timezone offset.
  3. The title for all media has already been saved in the database with a UTC date + time.  For example, a photo sent at HH:MM local time has already been saved to the database with a title of photo_YYYY-MM-DD_(HH+tz offset)-mm-ss.jpg.  If media is sent around the end of the month or year then MM-DD or YYYY-MM-DD could also be different from the local time stamp.
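
Points 2 and 3 can be illustrated with a small sketch. `media_name` and `month_page` are hypothetical helpers that only mimic the naming patterns mentioned above, and the fixed UTC-4 offset is an assumption standing in for US/Eastern in October.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for US/Eastern in October.
EDT = timezone(timedelta(hours=-4))


def media_name(ts: datetime) -> str:
    """Hypothetical helper mimicking the photo_YYYY-MM-DD_HH-MM-SS.jpg pattern."""
    return ts.strftime("photo_%Y-%m-%d_%H-%M-%S.jpg")


def month_page(ts: datetime) -> str:
    """Hypothetical helper mimicking the per-month page name."""
    return ts.strftime("%Y-%m.html")


# A photo sent at 23:15 local time on 30 Sep is 03:15 UTC on 1 Oct.
sent_utc = datetime(2023, 10, 1, 3, 15, 0, tzinfo=timezone.utc)
sent_local = sent_utc.astimezone(EDT)

print(media_name(sent_utc), month_page(sent_utc))
print(media_name(sent_local), month_page(sent_local))
```

The same instant lands in a different month page and gets a different file name depending on which clock is used, which matches the behaviour described in the list.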

Maybe adding the timezone offset hours to the day variable, or elsewhere in template.html, could fix all of this?  I tried for too long and couldn't figure it out.  Apparently there is a datetime.timedelta(hours=H) that could be used, but I could not work out how; I kept getting errors such as jinja2.exceptions.UndefinedError: 'datetime.datetime object' has no attribute 'timedelta' or jinja2.exceptions.UndefinedError: 'datetime' is undefined, even though I put import datetime all over the place and m.date is already a datetime object. 🤬
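
For what it's worth, both errors have a plain explanation. `timedelta` is a class in the `datetime` module, not an attribute of a datetime instance, and a Jinja template cannot run `import`, so any such name has to be handed to the template from Python (for example as a render argument or a registered filter). The sketch below shows only the Python side; `shift_hours` is a hypothetical helper, not something tg-archive provides.

```python
from datetime import datetime, timedelta, timezone


def shift_hours(value: datetime, hours: int) -> datetime:
    """Hypothetical helper: move a datetime by a whole number of hours."""
    return value + timedelta(hours=hours)


msg_date = datetime(2021, 2, 1, 2, 30, tzinfo=timezone.utc)

# A datetime instance has no 'timedelta' attribute, hence the first error.
print(hasattr(msg_date, "timedelta"))  # False

# The shift itself is ordinary arithmetic once timedelta is in scope.
shifted = shift_hours(msg_date, -5)
print(shifted.strftime("%Y-%m-%d %H:%M"))  # 2021-01-31 21:30

# Assumption: the helper would then be passed to the template from Python,
# e.g. template.render(..., shift_hours=shift_hours).
```

Note that this moves the wall-clock value only; the tzinfo on the result is still UTC, so it is a workaround rather than a proper timezone conversion.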

scarlion1 commented 11 months ago

I found another workaround using JavaScript.  This will adjust the time/date stamps of the site to the timezone corresponding to the viewer's locale setting.  It has some of the same issues from my previous findings, but not as bad.  E.g. the messages at the beginning/ending of each day/month remain under their original date, but all messages from the database up to the present are shown now, at least.

Starting on line 2 of main.js, I inserted this code (had some fun with the box drawing chars with my comments, sorry in advance!):

JavaScript Time/Date Stamp Adjustment

```
/*╾┈ get & save the tz offset (in minutes) according to the current locale settings ┈╼*/
const tzmins = (new Date()).getTimezoneOffset();

/*╾┈ Iterate over every "date" class element ┈╼*/
document.querySelectorAll(".date").forEach(
  function (i) {
    /* Date's are stored in milliseconds since the epoch, so:
    ╿ 1. grab the inner text from the selected element
    ┊ 2. parse it into a Date object
    ┊ 3. convert to number by multiplying by 1? (copied from Internet)
    ┊ 4. convert your timezone offset to milliseconds
    ┊ 5. add that to the grabbed milliseconds
    ┊ 6. convert total to new Date object
    ┊ 7. save the new date for use in next line
    ╰┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈╼*/
    let d = new Date((Date.parse(i.innerText) * 1) + (-tzmins*60*1000));

    /* 1. Format the new date similar to the old, according to browser setting
    ╿    "navigator.language" either returns 2-char language (en, fr, ...)
    ┊    or 5-char locale identifier (en-US, fr-FR, ...)
    ┊    Both appear to format the date sufficiently
    ┊    old: , <3-Letter month>
    ┊    new: <"short"-style time: HH:MM AM/PM or 24-hour HH:MM>,
    ┊         <"medium"-style date: 1- or 2-digit day, 3-letter month,
    ┊         4-digit year, arranged according to navigator.language>
    ┊ 2. reset the inner text of the selected element
    ╰┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈╼*/
    i.innerText = d.toLocaleTimeString(navigator.language, {
      timeStyle: "short",
    }) + ", " + d.toLocaleDateString(navigator.language, {
      dateStyle: "medium",
    });
  }
);
```
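
One caveat with this approach: `getTimezoneOffset()` on `new Date()` returns the offset in force right now, so the same shift is applied to every message regardless of whether daylight saving time was active when it was sent. The Python sketch below mirrors the snippet's arithmetic to show the effect; `shift_like_snippet` and the two offsets are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def shift_like_snippet(utc_ts: datetime, tz_offset_minutes: int) -> datetime:
    """Mirror the JavaScript arithmetic: subtract a getTimezoneOffset()-style value."""
    return utc_ts + timedelta(minutes=-tz_offset_minutes)


# getTimezoneOffset() is positive west of UTC: 240 for UTC-4, 300 for UTC-5.
SUMMER_OFFSET = 240  # hypothetical viewer browsing in July (UTC-4)
WINTER_OFFSET = 300  # offset that applied when a January message was sent (UTC-5)

january_msg = datetime(2021, 1, 15, 3, 0, tzinfo=timezone.utc)

shown = shift_like_snippet(january_msg, SUMMER_OFFSET)
expected = shift_like_snippet(january_msg, WINTER_OFFSET)

print(shown.strftime("%Y-%m-%d %H:%M"))     # 2021-01-14 23:00
print(expected.strftime("%Y-%m-%d %H:%M"))  # 2021-01-14 22:00
print(shown - expected)                     # 1:00:00
```

For viewers in zones without daylight saving time the two values coincide, so the workaround is exact there.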

zv09 commented 3 months ago

Is this issue still not solved? I tried changing some code around get_timeline but cannot find a working solution... How can this be fixed using just a UTC shift in config.yaml, without any timezone descriptor?

knadh commented 3 months ago

Looks like this hasn't been resolved yet. If somebody can debug this and send a PR, will be happy to merge.

saurabhrane1199 commented 3 months ago

Can someone provide steps to reproduce this issue? I am not able to reproduce it; I think it only misbehaves for some specific dates. A dump of the SQLite DB with anonymised values would help.