teovin closed this 3 months ago
Attention: Patch coverage is 13.25301% with 72 lines in your changes missing coverage. Please review.
Project coverage is 70.48%. Comparing base (efa78c0) to head (fbe925c). Report is 7 commits behind head on develop.
Files | Patch % | Lines |
---|---|---|
perma_web/perma/celery_tasks.py | 17.85% | 46 Missing :warning: |
perma_web/tasks/dev.py | 0.00% | 25 Missing :warning: |
perma_web/perma/models.py | 50.00% | 1 Missing :warning: |
Really cool, @teovin 🏄 !!
This is the first draft of the WARC to WACZ conversion experiment.
We have a sample of 1000 WARC files, which I uploaded to my local MinIO instance using Ben's command here.
The task reads all the WARC names from the CSV file, fetches each corresponding WARC from storage, converts it, and uploads the resulting WACZ file to storage. It also logs the conversion duration, status, file size, and error output (if any) to a CSV file.
Optionally, the task can accept a single WARC argument and process only that file.
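For reviewers who want the shape of the loop without reading the diff, here is a rough sketch of the benchmarking flow described above. It is not the code in this PR: `benchmark_conversion`, the `convert` callable, and the CSV column names are all hypothetical, and storage access is left out entirely. The sketch assumes `convert` takes a WARC name and returns the WACZ contents as bytes.

```python
import csv
import time
from pathlib import Path

# Hypothetical column names; the real benchmark log may use different ones.
FIELDNAMES = ["warc", "status", "duration_s", "wacz_size_bytes", "error"]


def benchmark_conversion(warc_names, convert, log_path):
    """Run `convert` on each WARC name and append one CSV row per file.

    `convert` is a stand-in for the real conversion step (assumed here to
    return the WACZ contents as bytes). A failure on one file is recorded
    in the log and does not stop the run.
    """
    rows = []
    for name in warc_names:
        start = time.perf_counter()
        try:
            wacz_bytes = convert(name)
            status, size, error = "success", len(wacz_bytes), ""
        except Exception as exc:  # log the error and keep going
            status, size, error = "failure", 0, repr(exc)
        duration = time.perf_counter() - start
        rows.append(
            {
                "warc": name,
                "status": status,
                "duration_s": round(duration, 3),
                "wacz_size_bytes": size,
                "error": error,
            }
        )

    # Write the header only when creating a new log file.
    write_header = not Path(log_path).exists()
    with open(log_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
    return rows
```

Passing a one-element list such as `["A276-A9A4.warc.gz"]` corresponds to the single-file mode. Catching the exception per file means one bad WARC does not abort the rest of the sample.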
Sample invocation:
docker compose exec web invoke dev.benchmark-wacz-conversion --source-csv='perma/wacz_experiment/1000-a-guids.csv' --benchmark-log='perma/wacz_experiment/benchmark.csv'
or
docker compose exec web invoke dev.benchmark-wacz-conversion --single-warc='A276-A9A4.warc.gz' --benchmark-log='perma/wacz_experiment/benchmark.csv'
I also replayed a few of them using Becky's replay changes here.