This background job downloads a requested file from GCS, OCRs it with Tesseract/Leptonica if it is not already plaintext, and then submits the plaintext for indexing (after which the plaintext can be discarded).
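A minimal sketch of that flow, assuming the GCS download, the OCR call, and the indexing submission are injected as callables. The names `FetchedFile`, `download`, `ocr`, `index`, and `process_file` are hypothetical. So is the plaintext check, which here simply trusts the stored content type.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FetchedFile:
    """Raw bytes plus the content type reported by storage (hypothetical shape)."""
    data: bytes
    content_type: str


def is_plaintext(f: FetchedFile) -> bool:
    # Assumption: trust the stored content type; a real job might sniff the bytes instead.
    return f.content_type.split(";")[0].strip().lower() == "text/plain"


def process_file(
    uri: str,
    download: Callable[[str], FetchedFile],
    ocr: Callable[[bytes], str],
    index: Callable[[str, str], None],
) -> None:
    """Download, OCR only if needed, then index. All three hooks are hypothetical."""
    fetched = download(uri)
    if is_plaintext(fetched):
        # Assumption: plaintext objects are UTF-8 encoded.
        text = fetched.data.decode("utf-8")
    else:
        text = ocr(fetched.data)
    index(uri, text)
    # The plaintext is not needed after indexing; drop the local reference.
    del text
```

For example, `download` might wrap google-cloud-storage's `Blob.download_as_bytes()`, and `ocr` might call `pytesseract.image_to_string()` on a decoded image. Those wrappers are left out so the sketch runs without GCS credentials or a Tesseract install.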
Progress at each checkpoint (downloading, OCRing, indexing) should be reported through job infrastructure that is yet to be defined.
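Because the job infrastructure is not defined yet, here is one possible shape for checkpoint reporting, sketched under that assumption. `Stage`, `ProgressReporter`, `checkpoint`, and `ListReporter` are all hypothetical names. Each stage reports "started" on entry and then either "done" or "failed".

```python
import enum
from contextlib import contextmanager
from typing import Iterator, Protocol


class Stage(enum.Enum):
    DOWNLOADING = "downloading"
    OCRING = "ocring"
    INDEXING = "indexing"


class ProgressReporter(Protocol):
    """Hypothetical interface; the real job infra is still to be defined."""

    def report(self, job_id: str, stage: Stage, status: str) -> None: ...


@contextmanager
def checkpoint(reporter: ProgressReporter, job_id: str, stage: Stage) -> Iterator[None]:
    """Report 'started' on entry, then 'done' or 'failed' depending on how the block exits."""
    reporter.report(job_id, stage, "started")
    try:
        yield
    except Exception:
        reporter.report(job_id, stage, "failed")
        raise  # let the job fail normally after recording the checkpoint
    reporter.report(job_id, stage, "done")


class ListReporter:
    """In-memory reporter, useful for tests or local runs."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Stage, str]] = []

    def report(self, job_id: str, stage: Stage, status: str) -> None:
        self.events.append((job_id, stage, status))
```

Using a context manager keeps the reporting out of the download/OCR/index logic itself. A failure inside any stage is recorded and then re-raised, so the job's own retry or error handling still applies.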