gsnedders opened this issue 5 years ago
Yes, this is most likely the case. If the build is not present it will fail. But that is not a problem with mozdownload.
It seems like a problem with mozdownload: we ask for "give me the latest linux nightly", it finds a directory where a linux nightly could be, and because the build hasn't been uploaded yet it fails rather than falling back to the previous nightly. If we are just using the API wrong that's fine (and suggestions/patches welcome), but otherwise this seems like a real issue.
Specifically, we're doing:
```python
from mozdownload import FactoryScraper

scraper = FactoryScraper("daily",
                         branch="mozilla-central",
                         version="latest",
                         destination="browsers/nightly")
filename = scraper.download()
```
At least to me, that looks like it shouldn't ever fail because the build isn't present: either there's some bug on the server side which causes mozdownload to think a build exists, or there's a bug in mozdownload not handling some bit of server behaviour.
Oh, I see. You don't use the buildid to specify a particular fixed version. In that case this would need some further investigation. I assume you don't have a way to run mozdownload with -vv to get more verbose logging output?
So @jgraham's reply would make sense. It looks like we don't fall back into older folders when the expected build isn't found. I wish we already had Taskcluster support; maybe we should raise its severity to make things like that easier.
How often do you hit that?
On a timescale of days, i.e. a few times a week. It's often enough that the behaviour is problematic.
So the code clearly finds the build status file of the latest build. But I wonder if new files first get populated in latest-mozilla-central before an appropriate folder by date is created under the month folder. That is the only thing I could imagine happening here.
Does it help if you add the --retry-attempts argument, and maybe also --retry-sleeptime? As far as I can see you don't make use of that feature yet, and it might help here.
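If that helps, the same thing can presumably be done from the Python API as well; a minimal sketch, assuming FactoryScraper forwards retry_attempts and retry_delay keyword arguments to the underlying Scraper (worth double-checking against the signature in scraper.py):

```python
from mozdownload import FactoryScraper

# Retry a failed download a few times with a pause in between instead of
# giving up on the first missing file (keyword names assumed, see scraper.py).
scraper = FactoryScraper("daily",
                         branch="mozilla-central",
                         version="latest",
                         destination="browsers/nightly",
                         retry_attempts=5,
                         retry_delay=60)
filename = scraper.download()
```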
I'm not sure if this is something we can fix while using archive.mozilla.org if the upload of builds is done in the wrong order. Ideally we should use TaskCluster to download the builds, but that work hasn't been started yet. See issue #365.
I assumed that the problem is the latest-mozilla-central link being updated when the first artifact is available for the new set of builds, not the last artifact. So if, e.g., there's a linux32-debug build ready but not linux64-opt, then latest-mozilla-central will temporarily point at a folder with no suitable build. In that case there's no fallback to just using the latest build that is ready.
Looking at http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/ I see a time gap of at least an hour between the first artifact and the last one (ignoring the non-67 builds), so this at least seems plausible. Given that, retries would help but the total timeout would have to be prohibitively long to avoid hitting this problem.
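A stop-gap on the caller side, rather than a fix in mozdownload itself, would be to fall back to an explicit earlier date when the latest build can't be downloaded. A rough sketch, assuming the daily scraper accepts a date argument and that a missing build surfaces as an exception:

```python
from datetime import datetime, timedelta

from mozdownload import FactoryScraper


def download_nightly(destination="browsers/nightly", max_days_back=2):
    """Try the latest nightly first, then step back one day at a time."""
    attempts = [{"version": "latest"}]
    for days in range(1, max_days_back + 1):
        date = (datetime.utcnow() - timedelta(days=days)).strftime("%Y-%m-%d")
        attempts.append({"date": date})

    last_error = None
    for extra in attempts:
        try:
            scraper = FactoryScraper("daily",
                                     branch="mozilla-central",
                                     destination=destination,
                                     **extra)
            return scraper.download()
        except Exception as error:  # mozdownload raises its own error types; kept broad here
            last_error = error
    raise last_error
```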
I don't think that we completely replace this folder. If we did, the old Firefox 66 nightly builds that are currently present wouldn't still be there.
What mozdownload actually does is check a status file like http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-67.0a1.en-US.linux-x86_64.txt, which currently lists 20190130215539 as the build id. Based on that information we try to download the files from https://archive.mozilla.org/pub/firefox/nightly/2019/01/2019-01-30-21-55-39-mozilla-central/. But if the files aren't present there, it will fail.
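For illustration, that lookup can be reproduced outside of mozdownload; a sketch based on the URLs above (the assumption that the build id sits on the first line of the .txt file is worth verifying):

```python
import requests

BASE = "https://archive.mozilla.org/pub/firefox/nightly"
status_url = BASE + "/latest-mozilla-central/firefox-67.0a1.en-US.linux-x86_64.txt"

# The status file is expected to carry the build id (e.g. 20190130215539)
# on its first line.
build_id = requests.get(status_url).text.splitlines()[0].strip()

# Turn 20190130215539 into 2019-01-30-21-55-39 and build the dated folder URL.
parts = [build_id[i:j] for i, j in
         [(0, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]]
dated_dir = "{0}/{1}/{2}/{3}-mozilla-central/".format(
    BASE, parts[0], parts[1], "-".join(parts))

# If the artifacts haven't been copied into the dated folder yet, this is
# where the download fails.
print(dated_dir, requests.head(dated_dir).status_code)
```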
So I really wonder if we maybe first update the links in the latest folder before actually adding/updating the build id specific folder.
@nthomas-mozilla, who from RelEng could explain to us how the latest-mozilla-central folder and the specific build id folder get populated?
You're right that the latest-mozilla-central directory is appended to rather than recreated (paired with an expiration policy to clean out older builds). Files are moved into that dir by a beetmover task; here's an example log for a recent linux64 nightly.
The artifacts are handled asynchronously, copying them first into the dated directory and then to latest-mozilla-central, and the .txt file is started before the actual tar.bz2 for Firefox. In that particular log there's only a 10 second gap, but longer is possible depending on network conditions. Another complication is that there is up to 14400 seconds (4h) of caching on the latest directory on archive.m.o, but I'm not sure how that would lead to files not being found. More likely would be getting the previous nightly via a stale copy of the .txt file.
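For reference, that caching can be inspected from the standard HTTP response headers; a small sketch (nothing mozdownload-specific, and the headers may differ depending on the CDN configuration):

```python
import requests

url = ("https://archive.mozilla.org/pub/firefox/nightly/"
       "latest-mozilla-central/firefox-67.0a1.en-US.linux-x86_64.txt")
response = requests.get(url)

# Cache-Control shows the configured max-age, Age how old the cached copy
# already is; together they hint whether a stale .txt file is being served.
print(response.headers.get("Cache-Control"))
print(response.headers.get("Age"))
```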
In terms of alternatives, […] the latest-mozilla-central dir and is subject to the caching mentioned above.

Would it be possible for this issue to be assigned to someone? It's coming up in https://github.com/web-platform-tests/wpt/issues/13274 with some regularity, requiring manual intervention each time, as we can't tell the difference between mozdownload failures and other types of failures.
@foolip I replied on the wpt issue with the 2nd part of @nthomas-mozilla's reply. If that doesn't work, maybe increase the retry attempts and delays for mozdownload for now.
@nthomas-mozilla is there a way to prevent using the cache and always get a fresh copy? We currently try to do it via https://github.com/mozilla/mozdownload/blob/master/mozdownload/scraper.py#L425, but that might be wrong?
I don't think the caching should be a problem because as noted there's no suggested mechanism for the cached file to not exist, whereas it looks like we are getting a pointer to a file that doesn't yet exist.
@whimboo AFAIK there's no cache busting that can be done. I had a look through the mozdownload source, and after retrieving the build date it seems get_build_info_for_date() will then try to parse the directory listing at firefox/nightly/YYYY/MM/ to make sure YYYY-MM-DD-hh-mm-ss is present. That listing is also cached, for 15 minutes judging by the headers. For this particular use case I suggest just scraping firefox/nightly/YYYY/MM/YYYY-MM-DD-hh-mm-ss/ directly.
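A rough sketch of that suggestion, polling the dated directory listing directly until the wanted artifact shows up (the artifact pattern and polling parameters here are illustrative, not mozdownload code):

```python
import re
import time

import requests


def wait_for_artifact(dated_dir_url,
                      pattern=r"firefox-.*\.en-US\.linux-x86_64\.tar\.bz2",
                      attempts=10, delay=60):
    """Poll a dated nightly directory until an artifact matching pattern appears."""
    for _ in range(attempts):
        listing = requests.get(dated_dir_url)
        if listing.ok and re.search(pattern, listing.text):
            return True
        time.sleep(delay)
    return False


url = ("https://archive.mozilla.org/pub/firefox/nightly/2019/01/"
       "2019-01-30-21-55-39-mozilla-central/")
print(wait_for_artifact(url))
```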
From https://github.com/web-platform-tests/wpt/issues/13274:
These are stacks like:
From @jgraham in https://github.com/web-platform-tests/wpt/issues/13274#issuecomment-427981381: