openSUSE / MirrorCache

Download Redirector
https://opensuse.github.io/MirrorCache/
GNU General Public License v2.0

Looking for an efficient way to monitor for download folder changes #349

Open okurz opened 1 year ago

okurz commented 1 year ago

Motivation

I reported https://progress.opensuse.org/issues/123797 about a problem: a pipeline monitoring the URL http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next* now always reports changes, even though no new files were showing up in that folder.

The content always changes because the generated HTML page contains a "csrf-token" that changes on each call. I found this by running

diff <(curl -sS "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*") <(curl -sS "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*")

which yields:

<       <meta name="csrf-token" content="3a72819665c9adb750ad4d5e8054961c5b1c3efc" />
---
>       <meta name="csrf-token" content="0b17cbf73c393a1baa4daee989158038caeafc7c" />
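Until the server offers a stable change signal, one possible client-side workaround is to drop the csrf-token line before comparing fetches. The sketch below assumes GNU coreutils (`sha256sum`) and that the token always sits on its own `<meta name="csrf-token" ...>` line, as in the diff above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fingerprint a directory listing while ignoring the per-request
# csrf-token, so two fetches of an unchanged folder produce the same hash.
# Assumption: the token is on its own <meta name="csrf-token" ...> line.

# Read HTML on stdin, delete the csrf-token meta line, print everything else.
strip_csrf_token() {
  sed '/<meta name="csrf-token"/d'
}

# Read HTML on stdin and print the SHA-256 hex digest of the normalized page.
# Assumption: GNU coreutils sha256sum is available (macOS: shasum -a 256).
listing_fingerprint() {
  strip_csrf_token | sha256sum | cut -d ' ' -f 1
}
```

For example, `curl -sS "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*" | listing_fingerprint` could be stored and compared between runs. If the page carries other per-request values besides the token, they would need to be stripped too; only the csrf-token shows up in the diff above.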

There is also no "last-modified" header in the HEAD response for those pages, nor for the files themselves, e.g.:

curl --head "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso.sha256"

I wonder what the best approach is to look for changes in a folder and trigger external services accordingly. Right now the best approach I have found is to download the checksum file http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso.sha256 periodically and check its content for changes.
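The checksum-polling approach can be sketched as a download followed by a byte comparison against the copy saved on the previous run. This is a minimal sketch, not an established tool: the function names and the state-file location are hypothetical, and the exit codes (0 changed, 1 unchanged, 2 download error) are an arbitrary convention.

```shell
#!/usr/bin/env bash
# Sketch: detect whether a freshly downloaded file differs from the copy
# saved by the previous poll. Names and state path are hypothetical.

# content_changed NEW_FILE STATE_FILE
# Returns 0 and copies NEW_FILE over STATE_FILE if the content differs or no
# state exists yet; returns 1 if the content is byte-for-byte unchanged.
content_changed() {
  local new=$1 state=$2
  if [ -f "$state" ] && cmp -s "$new" "$state"; then
    return 1
  fi
  cp -- "$new" "$state"
  return 0
}

# poll_checksum URL STATE_FILE
# Downloads URL to a temporary file and reports whether it changed.
# Returns 2 if the download fails, so errors are not mistaken for changes.
poll_checksum() {
  local url=$1 state=$2 tmp rc
  tmp=$(mktemp) || return 2
  if ! curl -fsS -o "$tmp" "$url"; then
    rm -f -- "$tmp"
    return 2
  fi
  content_changed "$tmp" "$state"
  rc=$?
  rm -f -- "$tmp"
  return "$rc"
}
```

A cron job or CI step could then run `poll_checksum "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso.sha256" "$HOME/gnome_next.sha256.last"` and trigger a build only when it returns 0.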

Suggestions

andrii-suse commented 1 year ago

Maybe using the JSON output will resolve the issue? https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*&json

okurz commented 1 year ago

I wasn't aware of how to find JSON routes. This looks helpful; however, the content is bigger than the checksum file, so fetching it and comparing the content would be less efficient. A "last-modified" approach would be more efficient still.

andrii-suse commented 1 year ago

So what should last-modified refer to when using a pattern or regex for the file name? The max mtime of the files that match the pattern/regex?

okurz commented 1 year ago

> So what should last-modified refer to when using a pattern or regex for the file name? The max mtime of the files that match the pattern/regex?

I guess it could simply be the same "last-modified" as for the complete page, regardless of the filtering, so it would act like a "minimum mtime". The filtering would be applied on top.

andrii-suse commented 1 year ago

But then it would not be usable for monitoring changes with a file filter, because a change in the mtime may be caused by other files. Do I understand that correctly?
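The difference between the two proposed meanings can be illustrated on a local directory standing in for the mirror folder. This is only an illustration of the semantics being discussed, not how MirrorCache computes anything; it assumes GNU `find` (for `-printf`), and `max_mtime` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: compare "max mtime of files matching a pattern" with
# "max mtime of the whole folder" on a local directory.
# Assumes GNU find for -printf.

# max_mtime DIR [GLOB]
# Prints the newest modification time (epoch seconds) among regular files
# directly inside DIR whose names match GLOB (default: all files).
# Prints nothing if no file matches.
max_mtime() {
  local dir=$1 glob=${2:-*}
  find "$dir" -maxdepth 1 -type f -name "$glob" -printf '%T@\n' \
    | sort -n | tail -n 1 | cut -d . -f 1
}
```

Touching an unrelated file raises `max_mtime "$dir"` but leaves `max_mtime "$dir" 'GNOME_Next*'` unchanged, which is the objection above: a folder-wide value would report changes that have nothing to do with the filtered files.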

andrii-suse commented 1 year ago

Another way may be to check the response headers. For now this works only for mirrorcache.o.o, but I probably need to fix download.o.o the same way:

curl -Is https://mirrorcache.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso | grep location
location: /repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.2-Build23.9.iso

andrii-suse commented 1 year ago

So far this works only for *Current.iso, but I can add the same logic for GNOME*.iso. For example:

curl -sI https://download.opensuse.org/tumbleweed/iso/openSUSE-Tumbleweed-DVD-x86_64-Current.iso | grep -i location
location: https://download.opensuse.org/tumbleweed/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20230129-Media.iso

andrii-suse commented 1 year ago

FYI: GNOME_Next now redirects to a particular build, which may be a good way to track changes:

curl -Is https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso | grep -i location
location: /repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.2-Build23.34.iso
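Tracking the redirect target only needs the value of the `location` header from a HEAD request. A minimal sketch, assuming raw headers as printed by `curl -sI` (including HTTP/1.1 `\r\n` line endings); `header_value` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one HTTP response header, read from stdin
# (e.g. the output of `curl -sI URL`). Header names match case-insensitively.

# header_value NAME
header_value() {
  local want
  want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Strip CRs, then split each line at its first colon; the status line
  # ("HTTP/2 302") has no colon and is skipped.
  tr -d '\r' | awk -v want="$want" '
    {
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == want) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)
        print v
        exit
      }
    }'
}
```

Storing the output of `curl -sI https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso | header_value location` and comparing it with the previous run would reveal a switch from, say, Build23.9 to Build23.34. Note that the value may be a relative path, as in the output above.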

We can close the ticket, or leave it open until providing something like an mtime in the response header becomes a priority.

okurz commented 1 year ago

That looks good. But I don't think I can instruct Jenkins to read just the headers, and the always-changing "csrf-token" described in the original report still seems problematic.

andrii-suse commented 6 months ago

I was working on a similar issue and investigating adding etag and x-media-version headers:

# curl -I "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next.x86_64*Build*.iso"
HTTP/1.1 200 OK
etag: 1-664C81BE
x-media-version: 29.64

Sadly, it doesn't work properly yet if many files match the mask, but that should be easy to fix. But the question is: will such (properly working) headers be enough, or do you still prefer to have last-modified?

andrii-suse commented 6 months ago

Also:

curl -Is https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso
HTTP/2 302 
etag: 664C81BE-64310000
x-media-version: 29.64
andrii-suse commented 6 months ago

Also, now:

# curl -I "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*"
etag: 9-664C9DC0
x-media-version: 29.65
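If both headers are kept, a poller could reduce them to one state string and compare it with the value stored by the previous run. This is a sketch under assumptions: the function names and state file are hypothetical, and, as noted above, the headers do not yet behave correctly when many files match the mask.

```shell
#!/usr/bin/env bash
# Sketch: turn the etag and x-media-version response headers into a single
# change signal and compare it with the previously stored signal.

# change_signal: read raw headers on stdin, print
# "etag=<value> x-media-version=<value>". Exits 1 if neither header is present,
# so a server that stops sending them is not mistaken for "no change".
change_signal() {
  tr -d '\r' | awk '
    {
      i = index($0, ":")
      if (i == 0) next
      name = tolower(substr($0, 1, i - 1))
      val = substr($0, i + 1)
      sub(/^[ \t]+/, "", val)
      if (name == "etag" || name == "x-media-version") seen[name] = val
    }
    END {
      if (!("etag" in seen) && !("x-media-version" in seen)) exit 1
      printf "etag=%s x-media-version=%s\n", seen["etag"], seen["x-media-version"]
    }'
}

# signal_changed SIGNAL STATE_FILE
# Returns 0 and stores SIGNAL if it differs from the stored one, else 1.
signal_changed() {
  local signal=$1 state=$2
  if [ -f "$state" ] && [ "$(cat -- "$state")" = "$signal" ]; then
    return 1
  fi
  printf '%s\n' "$signal" > "$state"
  return 0
}
```

A scheduled job could then compute `s=$(curl -sI "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*" | change_signal)` and trigger a build only when `signal_changed "$s" "$HOME/gnome_next.signal"` succeeds.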

okurz commented 6 months ago

> But the question is: will such (properly working) headers be enough, or do you still prefer to have last-modified?

As I stated, the goal is for Jenkins to have a way to monitor download folders so it can decide whether builds should be triggered.

andrii-suse commented 1 month ago

In my understanding, Jenkins can monitor the etag or x-media-version response headers, so I am closing the issue.

andrii-suse commented 1 month ago

On second thought, both last-modified and the csrf-token are needed as well, so I will change this into a feature request.