suurjaak / Skyperious

Skype chat history tool

Caching media files. #112

Status: Closed (ghost closed 1 year ago)

ghost commented 1 year ago

I am exporting my database to HTML fairly often (in fact, every night, by cron), and the export takes a lot of time, most of which seems to be spent on downloading image data.

Would it be possible to cache that data somewhere? Say, in ~/.cache/skyperious/ ?

suurjaak commented 1 year ago

I suppose this can be added. For example, disabled by default, with a configuration flag to enable it.

However, can you hazard a guess on how much of that time goes to trying to download content that is no longer available? Because that will continue to take time, as failures cannot be cached: a failure might have been caused by a temporary network problem during the download.

ghost commented 1 year ago

Well, failures cannot be cached, I guess, but are there many "expected failures" when exporting? I thought that, in general, all links in the Skype database should point to valid files. So if I am running an export today, some files may fail due to network problems, but they are likely to be available tomorrow, so eventually everything will be cached, and only files that appeared between two consecutive runs will have to be downloaded anew.
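
A minimal sketch of that idea, assuming Python 3 and a hypothetical downloader callable fetch(url) that returns bytes on success and None on failure; neither the function names nor the cache layout come from Skyperious itself. Only successful downloads are written to the cache, so failed ones are retried on the next run:

```python
import hashlib
import os
import tempfile


def cached_fetch(url, fetch, cache_dir):
    """Return content for url from a file cache, downloading it on a miss.

    fetch is a hypothetical downloader: it returns bytes, or None on failure.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so that the cache file name is always filesystem-safe.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()

    content = fetch(url)
    if content is None:
        return None  # failure is not cached, so the next run tries again

    # Write to a temporary file first, so an interrupted run
    # cannot leave a partial cache entry behind.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    os.replace(tmp_path, path)
    return content
```

With nightly runs, each file is downloaded at most once after its first successful fetch, and only new or previously failed files cost network time.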

Is there some hidden option to "profile" downloading? See the list of files, download speed, and error status.

suurjaak commented 1 year ago

Regarding expected failures - it depends. Files and audio/video messages are kept on Skype servers for up to 30 days. And everything shared before April 2017 is no longer available anyway.

But you are right that as long as the cache is populated periodically, failures should not play much of a role.

There is no hidden option to profile downloading. But if you are using the source code distribution, and are up to a bit of Python hacking, you can add logging calls to SkypeLogin.get_api_content() in live.py on your computer (https://github.com/suurjaak/Skyperious/blob/master/src/skyperious/live.py#L889).
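
As a rough illustration of such logging, here is a generic timing decorator, assuming Python 3. The names profiled and download_profile are made up for this sketch, and nothing here relies on the actual signature or return type of get_api_content(); the size is only reported when the wrapped function returns bytes or str.

```python
import functools
import logging
import time

logger = logging.getLogger("download_profile")  # hypothetical logger name


def profiled(func):
    """Log duration, result size and error status of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Log the failure with traceback, then let it propagate unchanged.
            logger.exception("%s failed after %.2f s",
                             func.__name__, time.monotonic() - start)
            raise
        elapsed = time.monotonic() - start
        # Size is only known for bytes/str results; anything else counts as 0.
        size = len(result) if isinstance(result, (bytes, str)) else 0
        logger.info("%s: %d bytes in %.2f s", func.__name__, size, elapsed)
        return result
    return wrapper
```

Putting @profiled above the method definition in a local copy of live.py, and calling logging.basicConfig(level=logging.INFO) once at startup, should be enough to see one log line per download.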

ghost commented 1 year ago

Okay, thank you for the pointers, I will add some profiling wrappers.

suurjaak commented 1 year ago

I'd like your input on whether this caching should be enabled by default or not.

Reasons why enabled:

Reasons why not enabled:

ghost commented 1 year ago

I think that if the cache is kept together with the database, say, in ~/.config/skyperious, privacy considerations would be the same for both. The cache directory can be marked with a CACHEDIR.TAG file (https://bford.info/cachedir/), so that cleanup and backup programs can recognize it.
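
For illustration, tagging a directory takes only a few lines, assuming Python 3. The function name ensure_tagged_cache_dir is hypothetical; the file name and the signature line are the ones given in the cache directory tagging specification linked above.

```python
import os

# The first line must be exactly this signature; the rest is free-form comment.
CACHEDIR_TAG_CONTENT = (
    "Signature: 8a477f597d28d172789f06886806bc55\n"
    "# This file is a cache directory tag.\n"
    "# For information about cache directory tags, see:\n"
    "#   https://bford.info/cachedir/\n"
)


def ensure_tagged_cache_dir(path):
    """Create the cache directory if needed and mark it with CACHEDIR.TAG."""
    os.makedirs(path, exist_ok=True)
    tag_path = os.path.join(path, "CACHEDIR.TAG")
    if not os.path.exists(tag_path):
        with open(tag_path, "w", encoding="ascii") as f:
            f.write(CACHEDIR_TAG_CONTENT)
    return tag_path
```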

So I would suggest having it on by default.

suurjaak commented 1 year ago

Sorry for the delay, finally got around to releasing v5.4.

Added configuration flag SharedContentUseCache for this, by default false.

ghost commented 1 year ago

Let me test it for a few days and get back with the feedback. Thank you for implementing this!

ghost commented 1 year ago

In version 5.4 the -v and the --version options do not work.

suurjaak commented 1 year ago

Confirmed. In fact, they haven't worked since 5.3, I now discover.

ghost commented 1 year ago

I haven't found problems in a week's time, so I guess this can be closed.

Thank you!