mozilla-releng / buildhub2

A database index of buildhub.json files built by Mozilla
https://buildhub2.readthedocs.io/
Mozilla Public License 2.0

No Nightly build listed since November 30 #1029

Closed. pascalchevrel closed this issue 11 months ago.

pascalchevrel commented 1 year ago

I am on yesterday's build 20221201161829 (https://archive.mozilla.org/pub/firefox/nightly/2022/12/)

but https://buildhub.moz.tools/?channel[0]=nightly&products[0]=firefox only lists builds up to November 30. It seems that either December 1 was skipped or indexing stopped after November 30.

JohanLorenzo commented 1 year ago

Hey @pascalchevrel! I'm not very knowledgeable about buildhub2 so I don't know how the data gets ingested and how much time it takes.

That said, I just had a look at your link and I now see the December 1st builds 😄:

[Screenshot: Buildhub search results for the link above]

Do you still get the old data?

pascalchevrel commented 1 year ago

Your screenshot shows November 30 builds.

JohanLorenzo commented 1 year ago

Oh sorry, I looked at the published column instead of the build ID one! You're totally right!

JohanLorenzo commented 1 year ago

I read the documentation[1] to understand how everything is linked together. Yesterday, we had a single nightly[2] (instead of the usual two). Buildhub consumes the *.buildhub.json files that are uploaded to archive.mozilla.org. Yesterday's nightly does contain such files and they seem valid to me, so the problem lies further down the data ingestion pipeline.
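To make "they seem valid" concrete, here is a minimal sketch of a sanity check one might run on a downloaded buildhub.json. The field paths (`build.id`, `source.product`, `target.channel`, `target.version`) are an assumption about the file layout; the authoritative JSON schema lives in the buildhub2 repository, and `looks_valid` / `missing_fields` are hypothetical helper names, not buildhub2 functions.

```python
import json
import re
from typing import List, Tuple

# Assumed field paths; the real schema in the buildhub2 repo is authoritative.
REQUIRED_PATHS: List[Tuple[str, str]] = [
    ("build", "id"),
    ("source", "product"),
    ("target", "channel"),
    ("target", "version"),
]
# Build IDs are 14-digit timestamps: YYYYMMDDhhmmss.
BUILD_ID_RE = re.compile(r"^\d{14}$")


def missing_fields(doc: dict) -> List[str]:
    """Return the dotted paths of required fields absent from a parsed buildhub.json."""
    missing = []
    for path in REQUIRED_PATHS:
        node = doc
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


def looks_valid(raw: str) -> bool:
    """Rough check: parses as JSON, has the fields above, and a 14-digit build ID."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict) or missing_fields(doc):
        return False
    return bool(BUILD_ID_RE.match(str(doc["build"]["id"])))
```

For example, with illustrative values, `looks_valid('{"build": {"id": "20221201161829"}, "source": {"product": "firefox"}, "target": {"channel": "nightly", "version": "109.0a1"}}')` passes, while a truncated or non-JSON file does not. A check like this only rules out malformed input; it says nothing about whether the file was ever ingested.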

I don't have access to the AWS account that emits the SQS events, so I can't check whether they were emitted yesterday. The next thing in the pipeline is Buildhub's SQS consumer daemon script. I wanted to check its logs, but I don't have enough permissions to do so:

[Screenshot: permissions error when trying to view the daemon's logs]
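For context on what the daemon consumes: assuming each SQS message body is a plain S3 event notification (a JSON object with a `Records` list, not wrapped by SNS), a consumer would pick out the buildhub.json uploads roughly as below. This is a minimal sketch, not buildhub2's actual code; `buildhub_keys` is a hypothetical name. Note that S3 URL-encodes object keys in event notifications, hence `unquote_plus`.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def buildhub_keys(sqs_body: str) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs for *buildhub.json objects from an S3 event notification."""
    event = json.loads(sqs_body)
    found = []
    # Test events (e.g. "s3:TestEvent") have no "Records" and yield nothing.
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Object keys arrive URL-encoded (spaces as "+"), so decode them.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key.endswith("buildhub.json"):
            found.append((bucket, key))
    return found
```

If the event for the missing nightly never reached the queue, or was dropped before this filtering step, nothing downstream would see it; the daemon's logs would be the place to tell which.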

That said, I see Firefox 108.0b9 was correctly ingested by Buildhub, and its build ID is about 3 hours later than the missing nightly's. Hence, we know the daemon is running, although it may have had a hiccup at the time of the nightly. Or maybe the nightly contains some data that Buildhub couldn't handle.

[Screenshot: Buildhub entry for Firefox 108.0b9]
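The "about 3 hours later" comparison works because build IDs are timestamps in `YYYYMMDDhhmmss` form, so two of them can be parsed and subtracted directly. A minimal sketch follows; the later build ID in the example is hypothetical (the actual 108.0b9 build ID isn't quoted in this thread). Both IDs are assumed to use the same time zone, which is all the subtraction needs.

```python
from datetime import datetime, timedelta


def parse_build_id(build_id: str) -> datetime:
    """Parse a build ID of the form YYYYMMDDhhmmss into a naive datetime."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


def hours_between(earlier: str, later: str) -> float:
    """Hours elapsed between two build IDs (positive if `later` is later)."""
    return (parse_build_id(later) - parse_build_id(earlier)) / timedelta(hours=1)


# The missing nightly vs. a hypothetical build ID three hours later.
print(hours_between("20221201161829", "20221201191829"))  # 3.0
```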

The next thing in the pipeline is the Elasticsearch database, and the Buildhub website is just a frontend to it: we can literally submit raw Elasticsearch queries through it. Because the website queries the database directly, the nightly's absence from the results means it is not in the database.
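To check a single build ID directly rather than through the web UI, one could send a raw query. The sketch below assumes a search endpoint at `https://buildhub.moz.tools/api/search` that accepts Elasticsearch query bodies, and assumes the document field names `build.id`, `source.product` and `target.channel`; verify both against the buildhub2 documentation before relying on them. `build_id_query` and `search` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint; check the buildhub2 docs for the current search API.
SEARCH_URL = "https://buildhub.moz.tools/api/search"


def build_id_query(build_id: str, product: str = "firefox", channel: str = "nightly") -> dict:
    """Elasticsearch bool/filter query matching one build ID (field names assumed)."""
    return {
        "size": 10,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"build.id": build_id}},
                    {"term": {"source.product": product}},
                    {"term": {"target.channel": channel}},
                ]
            }
        },
    }


def search(query: dict, url: str = SEARCH_URL) -> dict:
    """POST the query as JSON and return the decoded response (makes a network call)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Under these assumptions, an empty `search(build_id_query("20221201161829"))["hits"]["hits"]` list would confirm that the nightly never made it into the index.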

I'm not sure what I can do next. Maybe someone in SRE can help get more info from the AWS SQS or GCP logs. @cvalaas, who would be the right person to loop in?

[1] https://buildhub2.readthedocs.io/en/latest/architecture.html
[2] https://archive.mozilla.org/pub/firefox/nightly/2022/12/2022-12-01-16-18-29-mozilla-central/

cvalaas commented 1 year ago

Based on my quick reading of the docs you linked in [1], it's possible that Taskcluster writes only to S3 (i.e., not to GCS, which is the new authoritative storage for productdelivery), so when Buildhub goes to look for that file on archive.m.o, it can't find it. The S3 bucket syncs to GCS hourly, but that's probably not quick enough in this case. If this is the problem, we have two things that need fixing:

  1. Taskcluster should be writing to GCS (as well as, or instead of, S3).
  2. Buildhub will need to listen to a (not yet created) Google Pub/Sub queue for file-creation notifications instead of SQS.
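For comparison with the SQS side, Cloud Storage's Pub/Sub notifications carry the bucket and object name as message attributes (`bucketId`, `objectId`), with `eventType` set to `OBJECT_FINALIZE` when a new object is written. A minimal sketch of the filtering step a Pub/Sub-based consumer might do is below. It takes the attributes as a plain dict so it runs without the Google client library; `buildhub_object_from_gcs` is a hypothetical name and nothing here reflects an existing buildhub2 implementation.

```python
from typing import Dict, Optional, Tuple


def buildhub_object_from_gcs(attributes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (bucket, object name) for a newly written *buildhub.json, else None."""
    # Only newly created/overwritten objects matter; ignore deletes and metadata updates.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    bucket = attributes.get("bucketId")
    name = attributes.get("objectId", "")
    if bucket and name.endswith("buildhub.json"):
        return bucket, name
    return None
```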

I don't know the effort involved in 1 or 2 (someone on Taskcluster engineering will need to look into that, I expect), but it's not super trivial. As a quick "fix" (I use that word very loosely), perhaps we could build a 1-hour delay into Buildhub's file retrieval?
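One way to approximate the proposed delay without always waiting a full hour is to poll: retry the download at a fixed interval and give up after about an hour, which should cover one hourly S3-to-GCS sync. This is a swapped-in variant of the flat one-hour delay, sketched under the assumption that the fetch function returns `None` while the file is still missing; `fetch_with_delay` and its parameters are hypothetical. `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


def fetch_with_delay(
    fetch: Callable[[str], Optional[bytes]],
    url: str,
    retry_every: float = 600.0,     # seconds between attempts
    give_up_after: float = 3600.0,  # roughly one hourly sync window
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry fetch(url) until it returns data or give_up_after seconds have been waited."""
    waited = 0.0
    while True:
        data = fetch(url)
        if data is not None:
            return data
        if waited >= give_up_after:
            return None  # still missing after the window; let the caller log/alert
        sleep(retry_every)
        waited += retry_every
```

With the defaults this makes up to seven attempts (at 0, 10, 20, ..., 60 minutes), so a file that appears shortly after the hourly sync is picked up without the full hour's wait.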

Does this all sound likely? Should we set up a meeting (with more people) to hash this out?

cvalaas commented 1 year ago

(Also, as a temporary measure, we could potentially point Buildhub at the S3 bucket itself for downloading, instead of archive.m.o.)
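A low-risk way to try that temporary measure is to keep archive.m.o as the first source and fall back to the S3 bucket only when the file isn't there yet. The sketch below takes the sources as plain callables (hypothetical; real ones would wrap an HTTP request to archive.m.o and an S3 `GetObject` call), each returning `None` when the object is missing, so it assumes nothing about bucket names or layout.

```python
from typing import Callable, Optional, Sequence


def fetch_first_available(
    fetchers: Sequence[Callable[[str], Optional[bytes]]], key: str
) -> Optional[bytes]:
    """Try each source in order (e.g. archive.m.o, then S3) and return the first hit."""
    for fetch in fetchers:
        data = fetch(key)
        if data is not None:
            return data
    return None  # not found in any source
```

Pointing Buildhub at S3 outright would simply mean passing the S3 fetcher alone; the fallback ordering keeps archive.m.o authoritative whenever it already has the file.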

cvalaas commented 1 year ago

It sounds like buildhub2 is currently owned on the SRE side by @smarnach, so I'm pinging him here.

jcristau commented 11 months ago

@pascalchevrel It looks like later builds did make it to buildhub. Are we good to close this or are there builds we need to backfill still?

pascalchevrel commented 11 months ago

I think we can close it, thanks!