Open bernhard-da opened 2 years ago
Hi @bernhard-da,
although the logs may look like there is some kind of problem, Flexo works as intended here: Notice that the messages saying "xxx is not available" and "xxx was unavailable at all remote mirrors" only appear for those files that end with .db.sig, but .db files are served just fine. Flexo does not find .db.sig files because they are simply not available at the remote mirror. Have a look at this thread where one of the Arch Linux maintainers explains:
Because the databases are not signed yet. The process for doing that is still being worked out...
So, the current status (even if you don't use Flexo) is that Pacman requests those files, receives a 404 response and then just silently ignores the response.
As I have quite a large number of internal clients the traffic (e.g from community.db) adds up over time.
Files ending with .db are another story: Flexo serves the .db files, but it does not cache them. This is intentional, and it cannot be changed at this moment. If Flexo cached database files like normal files, then clients would eventually receive outdated database files. Of course, one could implement some special caching logic for database files and only cache them for a configurable duration (e.g., so you can configure Flexo to serve the database from cache if the cached version is not more than one hour old). But I decided against this because I found that the benefit does not justify the added complexity. The community.db file is currently just ~6 MB, so I never saw an issue in downloading this file a couple of times.
May I ask how fast your internet connection is? Did you notice this behavior because pacman was slow to download the database files, or did you notice this just by inspecting Flexo's logs?
hi @nroi
thx a lot for your detailed answer; indeed I was not really wondering about the .sig files but about the [CACHE MISS] for the .db files;
your explanation does make perfect sense. to answer your question:
May I ask how fast your internet connection is?
Did you notice this behavior because pacman was slow to download the database files, or did you notice this just by inspecting Flexo's logs?
yes, i have an unreliable internet-connection which is often slow too (max around 20mbit down) and also my isp throttles speeds after a specific amount of downloaded data; so i realized that pacman was slow (on many clients) downloading the same .db files and I also noticed that the (total) size of downloaded .db files was quite high.
i have an unreliable internet-connection which is often slow too (max around 20mbit down) and also my isp throttles speeds after a specific amount of downloaded data; so i realized that pacman was slow (on many clients) downloading the same .db files and I also noticed that the (total) size of downloaded .db files was quite high.
I see. I guess there are other users with similar issues. In that case, I might reconsider whether it makes sense to implement some caching mechanism for database files. This should probably be disabled by default, and the duration after which locally stored database files are considered stale and downloaded again should be configurable.
But don't expect this to be implemented very soon, I'm currently prioritizing changes that improve the code-maintainability over new features.
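To illustrate what such a configurable staleness check could look like, here is a minimal sketch in Python. It is not Flexo code: the function name `is_stale` and the parameter `max_age_seconds` are hypothetical, and it assumes the age of a cached database file is measured from its modification time.

```python
import os
import time
from typing import Optional


def is_stale(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the cached file is missing or older than max_age_seconds.

    Assumption: the file's modification time marks when it was last downloaded.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        # Nothing cached yet, so the file has to be downloaded.
        return True
    return (now - mtime) > max_age_seconds
```

With a setting of one hour, a call such as `is_stale(cached_db_path, 3600)` would decide whether to serve the cached copy or to download the file again.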
@nroi fair enough. thx again for your comments and working on flexo :)
I also see an opportunity for improvement here. Maybe it makes sense to check how pacman handles this, because when I don't use flexo, database files are cached somehow.
sudo pacman -Sy
:: Synchronizing package databases...
core is up to date
extra is up to date
community is up to date
multilib is up to date
But when I use flexo, the database files are always being downloaded.
I can't check how pacman works right now, but I'll try to figure this out later.
@Zebradil Thanks for pointing this out. pacman sends the If-Modified-Since header, for example:
If-Modified-Since: Sun, 30 Jan 2022 10:17:26 GMT
Which means that the mirror may respond with a 304 Not Modified instead of sending the entire payload.
The timestamp seems to be set according to the Modify or Change timestamp of the file in /var/lib/pacman/sync. If you run sudo touch -m /var/lib/pacman/sync/core.db, then pacman sends a new If-Modified-Since timestamp.
It makes sense for flexo to behave like pacman, so this is something that should change in flexo.
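As a rough illustration of how such a header value can be derived from a file, here is a small Python sketch. It assumes the Modify timestamp (mtime) is the one that matters, which has not been confirmed for pacman; the function name `if_modified_since_header` is made up for this example.

```python
import os
from email.utils import formatdate


def if_modified_since_header(path: str) -> str:
    """Format the file's modification time as an HTTP date string.

    Assumption: mtime is used; pacman might use a different timestamp.
    """
    mtime = os.stat(path).st_mtime
    # usegmt=True yields the "GMT" suffix that HTTP dates require.
    return formatdate(mtime, usegmt=True)
```

For a file whose mtime is 2022-01-30 10:17:26 UTC, this returns the same value as in the example header above.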
This post is intended to summarize all information required to implement this feature, as well as information about what value this feature adds to Flexo.
Database files are currently not cached. With a large number of clients, this can add up in traffic. This is relevant especially for users with a slow internet connection or an ISP that throttles speed after a given amount of data has been downloaded (see also: https://github.com/nroi/flexo/issues/82#issuecomment-974785049).
Originally, it was not planned to implement any kind of caching for database files, so that Flexo would never serve outdated files. However, it turns out that it should actually be possible to implement some kind of caching:
Consider the case when pacman is used without Flexo. When pacman requests a database file, it sends the If-Modified-Since header. The remote mirror then either serves this file as usual if the database file on the remote mirror is more recent than the timestamp in the header, or it just returns 304 Not Modified if no more up-to-date file is available.
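The mirror's side of this exchange can be sketched as follows. This is a simplified model of standard HTTP conditional-request handling, not the code of any particular mirror; `mirror_status` is a hypothetical name, and `last_modified` is expected to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def mirror_status(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 200 if the file must be sent, 304 if the client's copy is current."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable header is ignored, so the full file is sent.
        return 200
    if since.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) > since:
        return 200
    return 304
```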
We therefore aim to implement something comparable for Flexo: If a new database file is available at the remote mirror, then Flexo should always serve this file instead of a stale, cached version. On the other hand, if Flexo already has the database file in a version that is more recent or just as recent as the version on the remote mirror, then no new download from a remote mirror should be required.
1. Flexo should send the If-Modified-Since header when requesting database files. The value of this header should be the Modify or Change timestamp of the database file (need to find out which one pacman uses).
2. If the remote mirror responds with 304, we just assume that the locally cached version is not stale, and serve this one to the requesting client.
3. If the remote mirror responds with 2xx, then we overwrite the locally stored version with the payload served by the remote mirror.
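The flow described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not Flexo's implementation (Flexo is not written in Python): `serve_database`, `MirrorResponse` and the injected `fetch` callable are hypothetical names, and the cached file's mtime is assumed to be the timestamp sent in the header.

```python
import os
from dataclasses import dataclass
from email.utils import formatdate
from typing import Callable, Dict


@dataclass
class MirrorResponse:
    status: int
    body: bytes = b""


# Hypothetical transport: takes a URL and request headers, returns a response.
Fetch = Callable[[str, Dict[str, str]], MirrorResponse]


def serve_database(url: str, cache_path: str, fetch: Fetch) -> bytes:
    """Return the database payload, reusing the cached copy on a 304."""
    headers: Dict[str, str] = {}
    if os.path.exists(cache_path):
        # Send the header only when a cached copy exists.
        mtime = os.path.getmtime(cache_path)
        headers["If-Modified-Since"] = formatdate(mtime, usegmt=True)

    response = fetch(url, headers)

    if response.status == 304 and os.path.exists(cache_path):
        # The mirror has nothing newer: serve the cached version.
        with open(cache_path, "rb") as f:
            return f.read()
    if 200 <= response.status < 300:
        # Newer file received: replace the cached version atomically.
        tmp_path = cache_path + ".part"
        with open(tmp_path, "wb") as f:
            f.write(response.body)
        os.replace(tmp_path, cache_path)
        return response.body
    raise RuntimeError(f"unexpected status {response.status} for {url}")
```

Passing the transport in as a parameter keeps the caching decision separate from the HTTP client, so the logic can be exercised with a fake `fetch` that returns 304 or 200.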
hi @nroi , first off, thx a lot for providing flexo. it is a really great and very useful piece of software!
i have however experienced one issue; I am working on a fully updated arch-system with the following flexo.toml in which I changed the path of the cache-directory to /storage/...
flexo.toml
serving cached packages to all clients in my lan works flawlessly. however, i see the following entries in the server-log for all enabled repos when I do a pacman -Syu on a client.
log
i have tried with different mirrors but I cannot get flexo to also serve the databases. As I have quite a large number of internal clients the traffic (e.g from community.db) adds up over time. Do I have to set a specific config-setting to make this work or do you have an idea where I could start looking?