scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

Intermittent failures in deploying the documentation preview #3305

Open jpivarski opened 1 week ago

jpivarski commented 1 week ago

For example, https://github.com/scikit-hep/awkward/actions/runs/11783736057/job/32997893500?pr=3302

Run aws-actions/configure-aws-credentials@v4
  with:
    aws-region: eu-west-2
    role-to-assume: arn:aws:iam:::role/
    audience: sts.amazonaws.com
  env:
    X86_64_PYTHON_VERSION: 3.11.0
    SOURCE_DATE_EPOCH: 1668811211
    S3_BUCKET: preview.awkward-array.org
    DEPLOY_URL: http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Error: Could not assume role with OIDC: 1 validation error detected: Value 'arn:aws:iam:::role/' at 'roleArn' failed to satisfy constraint: Member must have length greater than or equal to 20

But this doesn't always happen. @henryiii asked why we're not using ReadTheDocs: our build times no longer exceed ReadTheDocs's 10-minute limit (it's more like 1 minute, and we're not adding new compiled code). @agoose77, what was the motivation? (I think this was set up after you made awkward-cpp builds fast.)

agoose77 commented 1 week ago

@jpivarski we don't use RTD because build & execution of tutorials takes a while. Our builds are usually fast on GHA because we have caching of the C++.

I haven't done a clean-cache build + clean-cache execution, but I assume it's still somewhat slow.

I don't know what causes that error, but I think it's the case that OIDC fails to issue a token sometimes. I haven't looked yet. Will try to take a peek tomorrow?

jpivarski commented 1 week ago

Ah, it's those notebooks! Yes, they do take a while to run from the Jupytext to produce all their outputs.

The thing that confuses me about the token is that the error message seems to be saying that it's the wrong format. Perhaps Amazon is sending an error message, which our script tries to interpret as a token.
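Read literally, the validation error is about the `roleArn` value rather than the token. The logged string `arn:aws:iam:::role/` is 19 characters long, one short of the minimum of 20 quoted in the message, and it has nothing where the account ID and role name would go. A minimal sketch of that check follows. It assumes the logged value is the literal string that was sent; `check_role_arn` and `ROLE_ARN_PATTERN` are hypothetical names for illustration, not part of our workflow or of any AWS tooling.

```python
import re

# Minimum length quoted in the error message from the log.
MIN_ROLE_ARN_LENGTH = 20

# Assumed shape of an IAM role ARN in the standard "aws" partition:
# a 12-digit account ID followed by a non-empty role name.
ROLE_ARN_PATTERN = re.compile(r"^arn:aws:iam::(\d{12}):role/(.+)$")


def check_role_arn(role_arn: str) -> list[str]:
    """Return a list of problems found in role_arn (empty if none)."""
    problems = []
    if len(role_arn) < MIN_ROLE_ARN_LENGTH:
        problems.append(
            f"length {len(role_arn)} is below the minimum of {MIN_ROLE_ARN_LENGTH}"
        )
    if ROLE_ARN_PATTERN.match(role_arn) is None:
        problems.append("account ID or role name is missing")
    return problems


# The value exactly as it appears in the failing job's log.
logged = "arn:aws:iam:::role/"
print(len(logged), check_role_arn(logged))
```

If this reading is right, the thing to look at would be whatever fills in the account ID and role name for `role-to-assume`, since both appear to be empty in the failing run.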

It's intermittent, but maybe it will still be happening tomorrow. The PR that is triggering this test is just an auto-generated version-updater, so we can trigger that as often as we like.