wbond / package_control

The Sublime Text package manager
https://packagecontrol.io
4.79k stars, 815 forks

Use cached database on running "Install" command #1645

Closed eugenesvk closed 1 year ago

eugenesvk commented 1 year ago

Update: there seem to be 2 things that can be implemented to reduce the needless delay:

Every time I run the Install Package command, Package Control takes some time (I guess to update the database). I'd like to avoid that step and always use the cached version to avoid any annoying delays. Then the update should happen periodically in the background (I don't need to get the latest on every single run). Or at least it should be delayed until you actually install something.

deathaxe commented 1 year ago

Package Control uses 2 levels of caching channel/repository information.

A fully evaluated list of packages is held in RAM for the time specified by the "cache_length" setting (5 minutes by default).

After that, channel information is re-fetched from packagecontrol.io, preferring locally cached data stored on disk if the server returns 304. The delay before the quick panel is displayed is then mainly determined by the server response time; no downloads take place.

_You can increase cache_length, but be aware nothing will be fetched during this period - no channel data and no upstream info from git tracked packages._
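The two caching levels described above can be modelled roughly as follows. This is a simplified sketch, not Package Control's actual code: `TwoLevelCache`, `fake_server`, and the ETag value are hypothetical names made up for illustration, and the server is simulated so the example runs offline.

```python
import time


class TwoLevelCache:
    """Toy model: a RAM cache with a TTL in front of a revalidated disk cache."""

    def __init__(self, server, cache_length=300, clock=time.monotonic):
        self.server = server              # callable(etag) -> (status, etag, body)
        self.cache_length = cache_length  # seconds the RAM copy stays valid
        self.clock = clock
        self.ram = None                   # (expires_at, data)
        self.disk = None                  # (etag, data), stands in for the on-disk HTTP cache
        self.requests = 0                 # number of requests sent to the server

    def get(self):
        now = self.clock()
        # Level 1: RAM cache, no network traffic while it is fresh.
        if self.ram is not None and now < self.ram[0]:
            return self.ram[1]
        # Level 2: ask the server, passing the ETag of the disk copy if any.
        etag = self.disk[0] if self.disk else None
        self.requests += 1
        status, new_etag, body = self.server(etag)
        if status == 304:
            data = self.disk[1]           # unchanged upstream: reuse the disk copy
        else:
            data = body
            self.disk = (new_etag, body)
        self.ram = (now + self.cache_length, data)
        return data


def fake_server(etag):
    """Simulated channel server: answers 304 when the client's ETag is current."""
    current_etag, body = "v1", {"packages": ["A", "B"]}
    if etag == current_etag:
        return 304, current_etag, None
    return 200, current_etag, body


# Drive the cache with a fake clock so the example is deterministic.
now = [0.0]
cache = TwoLevelCache(fake_server, cache_length=300, clock=lambda: now[0])
first = cache.get()    # cold: one request, full download
second = cache.get()   # within 300 s: served from RAM, no request
now[0] = 301.0
third = cache.get()    # RAM copy expired: one request, answered with 304
```

In this model a larger `cache_length` only lengthens the window in which `get()` never contacts the server, which is the trade-off the note above warns about.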

Then the update should happen periodically in the background

It is not planned to perform periodic tasks in the background beyond initial auto-update and package maintenance upon startup. Installing or updating packages is not a top-priority task that happens every 10 minutes during normal work, so periodically fetching package information is not justified.

Or at least it should be delayed until you actually install something

Fetching channel data (list of packages) is a crucial part of displaying a list of installable packages and can't be delayed as described. This request just doesn't make sense.

eugenesvk commented 1 year ago

You can increase cache_length, but be aware nothing will be fetched during this period - no channel data and no upstream info from git tracked packages.

I'd like to set this to ∞, but then I don't need to waste that memory forever for the rare use of the install command. I guess what I'm asking for is to be able to skip the server query stage completely and always use the locally cached data. Then I'll get the list without any unneeded delay.

not planned to perform periodic tasks in the background beyond initial auto-update and package maintenance upon startup

That's fine, I don't need it more frequently; updating the local database along with the auto_upgrade_frequency is OK. I'll just set it to 0 in those rare cases where I need to make sure I get the latest package.

deathaxe commented 1 year ago

Actually, the RAM cache isn't actively cleared, as doing so doesn't have any effect on plugin_host's RAM usage (checked on Windows). Thus the package list resides in RAM until the next package operation (instantiation of PackageManager) anyway.

To check that yourself just call:

>>> from package_control import cache
>>> cache.clear_cache()

The HTTP cache, on the other hand, is an implementation detail of the downloader. It has an option to prefer locally cached files without sending a request, but that would

a) cause an updated channel/repo not to be fetched before http_cache_length (1 week) has expired, so no update would be received in the meantime.
b) not make any difference compared to increasing cache_length.

After 5 minutes, PC would just delete the RAM cache and immediately rebuild it from the same HTTP download artefact without checking whether it has been updated upstream. That's more or less pointless.

eugenesvk commented 1 year ago

Just did a quick test:

I don't understand how this squares with your description (at least as I understand it) that everything should be instant at least within the first 5 minutes. Maybe it's only caching the default package database?

Using the latest beta version https://github.com/wbond/package_control/releases/tag/4.0.0-beta4

deathaxe commented 1 year ago

Available packages are fetched and cached by PackageManager.fetch_available(). It doesn't matter whether they are fetched via a channel or a repository. The method merges all sources into one dictionary of "repo_url": data pairs and adds them to the RAM cache.

Failed sources are indeed cached in RAM as well, to avoid requerying dead ends.
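As a rough illustration of the merging and failure caching described above, here is a minimal sketch. It is not the real `PackageManager.fetch_available()`: the function `merge_sources`, the `fake_fetch` helper, the example URLs, and the choice to store failures as `None` are all assumptions made for this example.

```python
def merge_sources(sources, fetch, cache):
    """Merge all sources into one {url: data} dict, remembering failures too."""
    merged = {}
    for url in sources:
        if url not in cache:
            try:
                cache[url] = fetch(url)
            except OSError:
                # Remember the failure so the dead end is not queried again.
                cache[url] = None
        if cache[url] is not None:
            merged[url] = cache[url]
    return merged


calls = []  # records every URL that was actually requested


def fake_fetch(url):
    """Stand-in for a network fetch; URLs containing 'dead' always fail."""
    calls.append(url)
    if "dead" in url:
        raise OSError("unreachable")
    return {"packages": [url.rsplit("/", 1)[-1]]}


sources = [
    "https://example.com/channel.json",
    "https://example.com/dead.json",
]
ram_cache = {}
first = merge_sources(sources, fake_fetch, ram_cache)
second = merge_sources(sources, fake_fetch, ram_cache)  # served from ram_cache
```

On the second call neither URL is requested again, including the failed one, which is the behaviour the comment above describes for a warm cache.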

Custom repositories may increase delay of displaying quick panels if cache is cold, especially if they contain unresolved package/release information. Those require at least 2 API calls per package to GitHub/Bitbucket/GitLab to fetch details and tag info, which may take a significant amount of time (see: https://github.com/wbond/package_control/issues/1638). That's however nothing PC can improve/solve at this point.

What the default channel (packagecontrol.io) basically does is periodically crawl all repositories, make the expensive API calls, and store the resolved package information in channel_v3.json, which PC can download and use directly.
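The idea of resolving packages once on the channel side can be sketched like this. The example is only a model under stated assumptions: `resolve_package`, `build_channel`, `fake_api`, and the JSON layout are hypothetical and do not reproduce the real channel_v3.json schema or any code hoster's API.

```python
import json


def resolve_package(name, api):
    """Resolve one package with two API calls: one for details, one for tags."""
    details = api("details", name)  # call 1: description and metadata
    tags = api("tags", name)        # call 2: release tags
    return {
        "name": name,
        "description": details["description"],
        "releases": [{"version": tag} for tag in tags],
    }


def build_channel(names, api):
    """Crawl every package once and serialise the resolved result."""
    return json.dumps({"packages": [resolve_package(n, api) for n in names]})


api_calls = {"count": 0}


def fake_api(kind, name):
    """Stand-in for a code hoster API; counts how often it is called."""
    api_calls["count"] += 1
    if kind == "details":
        return {"description": "Demo package " + name}
    return ["1.0.0", "1.1.0"]


names = ["Alpha", "Beta", "Gamma"]
channel_json = build_channel(names, fake_api)  # done once, on the channel side
channel = json.loads(channel_json)             # a client needs only this download
```

In this model the crawler pays two API calls per package, while a client that downloads the resolved file pays none, which is why unresolved custom repositories are slower on a cold cache.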

eugenesvk commented 1 year ago

That's however nothing PC can improve/solve at this point

PC could just skip those calls? Is there a combination of settings that would just not make those calls until the regular auto_upgrade_frequency upgrade/update maintenance task is run?

Custom repositories may increase delay of displaying quick panels if cache is cold

But it happens even on immediate repeats as mentioned above. I tried it again with the custom repositories commented out, and the speed improved.

deathaxe commented 1 year ago

PC could just skip those calls?

If an empty "Install Package" List is good for you, then yes.

But it happens even on immediate repeats as mentioned above,

I can't reproduce that with default settings, neither with the default channel, nor with a custom repository.json or a direct GitHub repo URL in the repositories setting.

Turning on "debug": true in settings displays

Package Control: Fetching list of available packages and libraries
  Platform: windows-x64
  Sublime Text Version: 4150
  Package Control Version: 4.0.0-beta4

There's no indication of any connection being established anywhere once repositories have been fetched, until the cache_length time has elapsed.

Note: Setting cache_length: 0, however, completely disables caching, which even causes the cached package information downloaded from the default channel not to be used. In that case, PC starts crawling each repository on each call of Install Package.

That's likely to fail due to rate limits.
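To make the effect of cache_length: 0 concrete, here is a small simulation. It is an assumption-laden model rather than Package Control's logic: `count_crawls`, the call times, and the figure of 50 unresolved packages are invented for illustration; only the "at least 2 API calls per package" factor comes from the discussion above.

```python
def count_crawls(cache_length, call_times):
    """Count how often the package list is rebuilt for calls at the given times (seconds)."""
    crawls = 0
    expires_at = None
    for t in call_times:
        if expires_at is None or t >= expires_at:
            crawls += 1                      # cache cold or expired: rebuild
            expires_at = t + cache_length
    return crawls


# Four "Install Package" invocations: three in quick succession, one later.
call_times = [0, 10, 20, 400]

with_cache = count_crawls(300, call_times)   # rebuilt at t=0 and t=400
without_cache = count_crawls(0, call_times)  # rebuilt on every call

# Hypothetical workload: 50 unresolved packages, 2 API calls each per crawl.
unresolved_packages = 50
estimated_api_calls = without_cache * 2 * unresolved_packages
```

With these made-up numbers the uncached case needs 400 API calls for four invocations, which hints at how quickly an API rate limit could be reached.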

eugenesvk commented 1 year ago

Meanwhile did another test:

So maybe the issue is with beta 4? Though I can't test it, since in normal mode v3 fails with the crypto issue, for which you already have a few issues opened, and I can't manually copy v4 to safe mode as it gets auto-deleted. Or maybe safe mode does something special (I tried with "fresh" Installed Packages and Packages folders, but safe mode still works while regular mode doesn't).

If an empty "Install Package" List is good for you, then yes.

:) It won't be empty, I mean repeated calls, not the first one

Thanks to your debug command, I found out that this link, "https://raw.githubusercontent.com/eugenesvk/sublime-dic_RuEn_bi/main/repository.json", gets requested every single time. Strange! Will remove it for now. Other repositories seem to get cached.

(I have "cache_length": 300)

eugenesvk commented 1 year ago

And thanks for helping out!

deathaxe commented 1 year ago

It appears the reason is an issue with resolving commit timestamps, which causes a hidden exception while downloading information from a code hoster. Needs some investigation.