Support Browsertrix QA webhooks

makew0rld commented 3 months ago

WACZ files should (optionally?) be processed after QA approval, using the new Browsertrix QA action webhooks, rather than right after a crawl completes, as is done currently.

makew0rld commented 3 months ago

Result of qaAnalysisFinished:

{
  "orgId": "<org id>",
  "itemId": "manual-20240815184407-66652278-30b",
  "resources": [
    {
      "name": "cb8515f9-7622-4879-b79e-d1f084a11ea2/qa/20240815184633123-66652278-30b-0.wacz",
      "path": "<download link>",
      "hash": "876e73aa2cf1c56e508144fd126d5a9a5f24e98e8640c21656827e2bdead90e0",
      "size": 194044,
      "crawlId": "manual-20240815184407-66652278-30b",
      "numReplicas": 0,
      "expireAt": "2024-08-17T06:46:37"
    }
  ],
  "state": "complete",
  "event": "qaAnalysisFinished",
  "qaRunId": "qa-20240815184553-66652278-30b"
}

Result of crawlReviewed:

{
  "orgId": "<org id>",
  "itemId": "manual-20240815184407-66652278-30b",
  "event": "crawlReviewed",
  "reviewStatus": 4,
  "reviewStatusLabel": "Good",
  "description": "New desc here"
}

We probably want to target only crawlReviewed, and ingest only crawls above a certain rating. The rating and description should go into AA. Note that description contains the archive's description even if it was not changed during the review process.
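
As a rough illustration of targeting only crawlReviewed, here is a minimal Python sketch that parses a webhook body and pulls out the fields that would be recorded. The function name and the output keys (crawl_id, review_status, and so on) are hypothetical; only the payload field names come from the examples above.

```python
import json


def review_attributes(raw_body):
    """Return the fields to record for a crawlReviewed event, or None otherwise.

    The output key names are illustrative assumptions, not an existing schema.
    """
    payload = json.loads(raw_body)
    if payload.get("event") != "crawlReviewed":
        # Other events, such as qaAnalysisFinished, are ignored in this sketch.
        return None
    return {
        "crawl_id": payload["itemId"],
        "review_status": payload["reviewStatus"],
        "review_status_label": payload["reviewStatusLabel"],
        # description is sent even when the reviewer did not change it.
        "description": payload.get("description", ""),
    }
```

The qaAnalysisFinished payload is simply skipped here, since the rating only arrives with crawlReviewed.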

Rating / review status levels:

1 - Bad
2 - Poor
3 - Fair
4 - Good
5 - Excellent

I propose we ingest any crawls rated Fair or above.
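
The proposed cutoff could look something like the following Python sketch. The constant and function names are made up for illustration, and treating a missing rating as "do not ingest" is an assumption rather than something decided in this thread.

```python
# Rating levels as listed above.
REVIEW_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}

# Proposed cutoff: ingest anything rated Fair or above.
MIN_INGEST_STATUS = 3


def should_ingest(review_status, minimum=MIN_INGEST_STATUS):
    """Return True if a crawl with this review status should be ingested.

    Assumption: a missing rating (None) means the crawl is not ingested.
    """
    return review_status is not None and review_status >= minimum
```

Keeping the cutoff as a parameter would make it easy to raise or lower per deployment without touching the comparison itself.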

makew0rld commented 3 months ago

The other question is how much we want to support NOT using QA and just auto-ingesting every crawl. How often will we see this use case? This could just be a switch in the config file, but a single global switch would be hard to work with if multiple projects are running at once and each wants different behavior.
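
One possible way around the single-switch problem is a per-project policy with a default, sketched below in Python. Everything here is hypothetical: the IngestPolicy fields, the project names, and how a webhook would be mapped to a project are not settled anywhere in this thread, and the crawlFinished event name is an assumption about the crawl-completion webhook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestPolicy:
    """Hypothetical per-project ingest settings."""
    require_review: bool = True   # wait for crawlReviewed before ingesting
    min_review_status: int = 3    # Fair or above


DEFAULT_POLICY = IngestPolicy()

# Hypothetical project names, for illustration only.
POLICIES = {
    "project-without-qa": IngestPolicy(require_review=False),
    "project-strict": IngestPolicy(min_review_status=4),
}


def policy_for(project_id):
    """Look up a project's policy, falling back to the default."""
    return POLICIES.get(project_id, DEFAULT_POLICY)


def ingest_on_event(policy, event, review_status=None):
    """Decide whether this webhook event should trigger ingestion."""
    if not policy.require_review:
        # Auto-ingest as soon as the crawl completes (assumed event name).
        return event == "crawlFinished"
    if event != "crawlReviewed" or review_status is None:
        return False
    return review_status >= policy.min_review_status
```

This keeps QA-gated ingestion as the default while letting individual projects opt out.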

benhylau commented 3 months ago

This all sounds good to me. Thanks for building this!

Whether we want to support no-review auto-ingest is probably a better question for @walkerlj0 @basilesimon @YurkoWasHere; we'll take that up on Slack.

makew0rld commented 3 months ago

[x] If the crawl is reviewed after already being ingested, only the crawl rating and the possibly updated description should be added, instead of re-ingesting the whole file.
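
A minimal Python sketch of that re-review behaviour, using in-memory stand-ins: the ingested set, the attributes dict, and the ingest_wacz callback are all hypothetical placeholders for whatever the real pipeline uses to track ingested crawls and store attributes.

```python
def handle_reviewed_crawl(crawl_id, rating, description,
                          ingested, attributes, ingest_wacz):
    """Ingest a reviewed crawl once; on later reviews, update metadata only.

    ingested:    set of crawl IDs already ingested (hypothetical store)
    attributes:  dict mapping crawl ID to its recorded attributes (hypothetical)
    ingest_wacz: callable performing the full WACZ ingest (hypothetical)
    Returns True if the WACZ was ingested by this call.
    """
    newly_ingested = crawl_id not in ingested
    if newly_ingested:
        ingest_wacz(crawl_id)
        ingested.add(crawl_id)
    # Always record the latest rating and description, whether or not
    # the file itself was ingested just now.
    attributes.setdefault(crawl_id, {}).update(
        {"review_status": rating, "description": description}
    )
    return newly_ingested
```

Any rating cutoff would be checked before calling this function; the sketch only covers the "already ingested" branch.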

starlinglab / integrity-v2

Support Browsertrix QA webhooks #64