Closed makew0rld closed 3 months ago
Result of qaAnalysisFinished:
{
  "orgId": "<org id>",
  "itemId": "manual-20240815184407-66652278-30b",
  "resources": [
    {
      "name": "cb8515f9-7622-4879-b79e-d1f084a11ea2/qa/20240815184633123-66652278-30b-0.wacz",
      "path": "<download link>",
      "hash": "876e73aa2cf1c56e508144fd126d5a9a5f24e98e8640c21656827e2bdead90e0",
      "size": 194044,
      "crawlId": "manual-20240815184407-66652278-30b",
      "numReplicas": 0,
      "expireAt": "2024-08-17T06:46:37"
    }
  ],
  "state": "complete",
  "event": "qaAnalysisFinished",
  "qaRunId": "qa-20240815184553-66652278-30b"
}
Result of crawlReviewed:
{
  "orgId": "<org id>",
  "itemId": "manual-20240815184407-66652278-30b",
  "event": "crawlReviewed",
  "reviewStatus": 4,
  "reviewStatusLabel": "Good",
  "description": "New desc here"
}
We probably want to target only crawlReviewed, and ingest only crawls above a certain rating. The rating and description should go into AA. Note that description contains the description of the archive even if it is unchanged during the review process.
Rating / review status levels:
1 - Bad
2 - Poor
3 - Fair
4 - Good
5 - Excellent
I propose we ingest any crawls rated Fair or above.
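A minimal sketch of that filter, assuming the webhook body has already been parsed into a dict shaped like the crawlReviewed payload above. The names `should_ingest` and `review_metadata`, and the output keys `rating`, `rating_label`, and `description`, are hypothetical and not part of any existing code.

```python
from typing import Any

# Review levels from the list above: 1 Bad, 2 Poor, 3 Fair, 4 Good, 5 Excellent.
FAIR = 3


def should_ingest(payload: dict[str, Any], min_status: int = FAIR) -> bool:
    """Return True only for crawlReviewed events rated at or above min_status."""
    if payload.get("event") != "crawlReviewed":
        return False
    status = payload.get("reviewStatus")
    # Reject missing or non-integer ratings (bool is a subclass of int).
    if not isinstance(status, int) or isinstance(status, bool):
        return False
    return status >= min_status


def review_metadata(payload: dict[str, Any]) -> dict[str, Any]:
    """Collect the rating and description that should be carried into AA.

    The output key names are assumptions for illustration.
    """
    return {
        "rating": payload["reviewStatus"],
        "rating_label": payload.get("reviewStatusLabel"),
        "description": payload.get("description", ""),
    }
```

With the sample payload above (reviewStatus 4), `should_ingest` returns True. A qaAnalysisFinished payload is skipped because it is a different event.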
The other question is how much we want to support NOT using QA and just auto-ingesting any crawls. How often will we see this use case? This could just be a switch in the config file, but that would make it hard to work with if multiple projects are going on at once, and they each want different things.
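One way around the single-switch problem would be to key the setting by org rather than globally. The sketch below is only an illustration of that idea. `IngestPolicy`, `POLICIES`, `policy_for`, and the org ids are all hypothetical. The `crawlFinished` event name used for the no-review case is an assumption that should be checked against the Browsertrix webhook documentation.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class IngestPolicy:
    # When True, ingest only after a crawlReviewed event with a high enough rating.
    require_review: bool = True
    min_review_status: int = 3  # 3 = Fair
    # Assumed name of the event to ingest on when review is not required.
    auto_ingest_event: str = "crawlFinished"


DEFAULT_POLICY = IngestPolicy()

# Hypothetical per-org overrides; orgs not listed fall back to the default.
POLICIES: dict[str, IngestPolicy] = {
    "example-org-without-qa": IngestPolicy(require_review=False),
}


def policy_for(org_id: str, policies: Mapping[str, IngestPolicy] = POLICIES) -> IngestPolicy:
    """Look up the ingest policy for an org, defaulting to review-required."""
    return policies.get(org_id, DEFAULT_POLICY)


def event_triggers_ingest(payload: Mapping[str, Any], policy: IngestPolicy) -> bool:
    """Decide whether a webhook payload should trigger ingest under a policy."""
    event = payload.get("event")
    if not policy.require_review:
        return event == policy.auto_ingest_event
    status = payload.get("reviewStatus")
    return (
        event == "crawlReviewed"
        and isinstance(status, int)
        and not isinstance(status, bool)
        and status >= policy.min_review_status
    )
```

Each project can then run with or without QA, and the handler only needs the `orgId` from the payload to pick the right behavior.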
This all sounds good to me. Thanks for building this!
Whether we want to support no-review auto-ingest is probably a better question for @walkerlj0, @basilesimon, and @YurkoWasHere; we'll take that up on Slack.
WACZ should be (optionally?) processed after QA approval using the new Browsertrix QA action webhooks, not right after a crawl is completed as is done currently.