iyear / tdl

📥 A Telegram toolkit written in Golang
https://docs.iyear.me/tdl
GNU Affero General Public License v3.0

--skip-same and -i options take too long, can we have skip-same-name? #446

Open mautematico opened 10 months ago

mautematico commented 10 months ago

Hello, there! And thanks for this awesome tool!

I've found --skip-same to be, IMO, time consuming.

Let's say there is a CHAT with a large, growing file list.

export media list:

tdl chat export -c CHAT

and download everything:

time tdl dl --continue --desc --skip-same -f tdl-export.json -i mp3
real    4m2.342s
user    0m37.032s
sys     0m32.502s

Then, without changes made to tdl-export.json, re-run last command:

time tdl dl --continue --desc --skip-same -f tdl-export.json -i mp3
All files will be downloaded to 'downloads' dir

real    3m43.758s
user    0m0.814s
sys     0m0.734s

I've found there's almost no network activity on the second run, and this confirms it:

skip-same works before the download and not after, so it cannot be compared based on hash.

Originally posted by @iyear in https://github.com/iyear/tdl/issues/75#issuecomment-1371035655

Also, I have seen that removing the -i filter does not improve things at all. In fact, I see some jpg files being downloaded here and there (in order of occurrence, I guess).

So, I think what's happening here is:

Request: Can we have filters like:

- --only: behaves like -i but acts upon the "file" property in the JSON export
- --skip-same-name: behaves like --skip-same but acts upon the "file" property in the JSON export
- --skip-same-id: behaves like --skip-same-name but acts upon chat + message ID

These should avoid a high percentage of HEAD requests, thus speeding things up a lot for some use cases.
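To make the request concrete, here is a minimal sketch of what a purely local "skip same name" pre-filter could look like. It is not part of tdl. The export layout (a top-level "messages" array whose entries carry "id" and "file") is an assumption, and `exportedMessage`, `exportFile` and `pending` are hypothetical names. It also only helps if local file names equal the exported "file" values, which depends on the file name template used for downloads.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// exportedMessage mirrors one entry of the export file.
// Assumption: each entry has at least "id" and "file".
type exportedMessage struct {
	ID   int    `json:"id"`
	File string `json:"file"`
}

// exportFile mirrors the assumed top-level layout of tdl-export.json.
type exportFile struct {
	ID       int64             `json:"id"`
	Messages []exportedMessage `json:"messages"`
}

// pending returns the messages whose "file" has the wanted extension
// and does not yet exist in dir. It makes no network requests.
func pending(raw []byte, dir, ext string) ([]exportedMessage, error) {
	var exp exportFile
	if err := json.Unmarshal(raw, &exp); err != nil {
		return nil, err
	}
	var out []exportedMessage
	for _, m := range exp.Messages {
		// Extension filter, case-insensitive (similar in spirit to -i).
		if !strings.EqualFold(filepath.Ext(m.File), "."+ext) {
			continue
		}
		// Local existence check on the exported name only.
		if _, err := os.Stat(filepath.Join(dir, m.File)); err == nil {
			continue
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	raw, err := os.ReadFile("tdl-export.json")
	if err != nil {
		fmt.Println("read export:", err)
		return
	}
	todo, err := pending(raw, "downloads", "mp3")
	if err != nil {
		fmt.Println("parse export:", err)
		return
	}
	fmt.Printf("%d messages still need a download\n", len(todo))
}
```

The remaining messages could then be written to a smaller export file and passed to tdl dl -f, so that only those entries trigger requests.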

Again, thank you for this tool!

iyear commented 9 months ago

The determination logic of --skip-same is based on whether the file name produced by template rendering already exists in the target directory, and template rendering requires fetching the message from the Telegram server. So network requests are essential; the 'file' field alone is not sufficient, because template rendering relies on many fields.
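As a rough illustration of that logic (not tdl's actual code), the check can be sketched as "render a name from message fields, then look for it on disk". The `messageMeta` fields and the template string below are assumptions chosen for the example; in the real tool the fields come from the message fetched from Telegram, which is why the network round trip cannot be skipped.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"text/template"
)

// messageMeta holds hypothetical fields a file name template might use.
type messageMeta struct {
	DialogID  int64
	MessageID int
	FileName  string
}

// renderName renders the file name pattern for one message.
func renderName(pattern string, m messageMeta) (string, error) {
	tmpl, err := template.New("name").Parse(pattern)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, m); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// shouldSkip reports whether a file with the rendered name
// already exists in the target directory.
func shouldSkip(dir, name string) bool {
	_, err := os.Stat(filepath.Join(dir, name))
	return err == nil
}

func main() {
	// Values that would normally be filled in from the fetched message.
	m := messageMeta{DialogID: 1234, MessageID: 56, FileName: "song.mp3"}
	name, err := renderName("{{.DialogID}}_{{.MessageID}}_{{.FileName}}", m)
	if err != nil {
		fmt.Println("render:", err)
		return
	}
	fmt.Println(name, "skip:", shouldSkip("downloads", name))
}
```

Because the rendered name can depend on any field of the message, the exported "file" value alone cannot reproduce it in the general case.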

--skip-same was born before resumable download, so theoretically, now you only need to use resumable download to quickly resume to the previous state. In other words, --skip-same helps you avoid downloading files with the same "filename", while resumable download enables you to resume downloading from the same message source.