Closed Martin-Furter closed 2 years ago
2022-02-24 11:13:24,030: error downloading media: #2690: The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)
Ah, it looks like this has something to do with the Telegram API itself or the Telethon API client. I am afraid I wouldn't be able to debug this.
Hello there,
Same issue here, but only with some groups. However, there is a workaround: when this error appears, you can download the media by running a sync for the specific message ID.
e.g.: 2022-05-19 12:10:26,307: error downloading media: #117: The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)
$ tg-archive --sync -id 117
2022-05-20 09:16:24,767: downloading media #117
2022-05-20 09:16:24,789: Starting direct file download in chunks of 131072 at 0, stride 131072
2022-05-20 09:16:30,680: finished. fetched 1 messages. last message = 2022-05-15 07:32:18+00:00
So I'll search for failed media IDs after sync and execute the specific sync for each error afterwards. Maybe @knadh can add this logic to the repo?
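The manual workaround above could be wrapped in a small shell function, roughly like this. It is only a sketch: it assumes the `--sync -id <ID>` invocation shown in this thread, and both `resync_media_ids` and the `TG_ARCHIVE_BIN` variable are made-up names, not part of tg-archive.

```shell
# Re-sync each given message ID on its own, one tg-archive call per ID.
# TG_ARCHIVE_BIN is a hypothetical override for the binary path (not a
# tg-archive feature); it defaults to "tg-archive" found on PATH.
resync_media_ids() {
    rc=0
    for id in "$@"
    do
        echo "Re-syncing media #$id ..."
        if ! "${TG_ARCHIVE_BIN:-tg-archive}" --sync -id "$id"
        then
            # Remember the failure but keep going with the remaining IDs
            echo "ERROR: re-sync of media #$id failed" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Called as `resync_media_ids 117 2690`, it runs one sync per ID and returns non-zero if any of them fails.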
Maybe @knadh can add this logic to the repo?
Will think about this.
A hacky way to achieve this would be to do something like tg-archive --sync > sync.log and then use shell scripting to grep the IDs of failed media and run sync on them again.
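A rough sketch of the grep step, assuming the log lines look like the ones quoted in this thread. `extract_failed_media_ids` is a made-up helper name. The error lines may be written to stderr rather than stdout, in which case the log would have to be captured with `2>&1`.

```shell
# Print the ID of every message whose media download failed, one per line.
# Reads log text from stdin and assumes lines shaped like:
#   "<date> <time>: error downloading media: #<id>: <reason>"
extract_failed_media_ids() {
    # Keep only the digits after '#', then sort numerically and drop duplicates
    sed -n 's/^.*error downloading media: #\([0-9][0-9]*\):.*$/\1/p' | sort -un
}
```

For example, `extract_failed_media_ids < sync.log` lists the IDs that can then be passed to `tg-archive --sync -id` one by one.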
The following works for me. It retries each failed media download, and as far as I can see, all previously failed media downloads worked this time.
A better way would probably be to catch the exception in the Python code and react to it.
#!/bin/bash
# Setting up absolute path as cron can't find program ...
tg_archiver_bin='/usr/local/bin/tg-archive'
# Declaring empty array and adding values one by one for better readability.
tg_archive_paths=()
# Add paths one by one. Don't use ~ as this doesn't work in script!
tg_archive_paths+=("/path/to/tg-archive")
# Iterate over each directory, poll new messages and build new html pages
# (the array expansion is quoted so that paths containing spaces stay intact)
for tg_archive_path in "${tg_archive_paths[@]}"
do
    echo "Updating directory '$tg_archive_path' ..."
    # Switching to directory as tg-archive doesn't work well with parameterized call ...
    if ! cd "$tg_archive_path"
    then
        echo -e "ERROR: Directory does not exist, skipping this one!\n"
        continue
    fi
    # Getting new messages and saving output to file
    "$tg_archiver_bin" --sync 2>&1 | tee output.txt
    # Check the exit status of tg-archive; plain $? would report the status of tee
    if [[ ${PIPESTATUS[0]} -ne 0 ]]
    then
        rm -f output.txt
        echo -e "ERROR: Telegram sync on '$tg_archive_path' failed!\n"
        continue
    fi
    # Iterating over media download errors and trying to get them by single download
    error_media_numbers=$(grep "error downloading media" output.txt | awk '{ print $6 }' | tr -d '#:')
    for error_media_number in $error_media_numbers
    do
        echo "Retrying failed media file #$error_media_number ..."
        if ! "$tg_archiver_bin" --sync -id "$error_media_number"
        then
            echo -e "ERROR: Retry of media file #$error_media_number on '$tg_archive_path' failed!\n"
        fi
    done
    rm -f output.txt
    # Building HTML content
    if ! "$tg_archiver_bin" --build
    then
        echo -e "ERROR: Page build on '$tg_archive_path' failed!\n"
        continue
    fi
    echo -e "Updated directory '$tg_archive_path'.\n"
done
I have set up tg-archive to download the channel MARKmobil, just to have a backup of all his work.
The website can be found here: https://markmobil.borg.ch/telegram/2020-10.html#2020-10-06
The web pages generated from the downloaded data look fine until September 2020. The last picture I can see is from October 2020, and after that all movies and pictures are missing.
I can see many error messages like the following, each with a different media number:
2022-02-24 11:13:23,855: downloading media #2690
2022-02-24 11:13:23,856: Starting direct file download in chunks of 131072 at 0, stride 131072
2022-02-24 11:13:24,030: error downloading media: #2690: The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)
If I look into the channel using telegram-desktop I can still see the latest pictures.