Closed rebeccacremona closed 1 year ago
Merging #3238 (e1e075f) into develop (c632594) will decrease coverage by 2.69%. The diff coverage is 9.54%.
@@             Coverage Diff             @@
##           develop    #3238      +/-   ##
===========================================
- Coverage    81.91%   79.22%   -2.70%
===========================================
  Files           52       53       +1
  Lines         5918     6143     +225
===========================================
+ Hits          4848     4867      +19
- Misses        1070     1276     +206
Impacted Files | Coverage Δ | |
---|---|---|
perma_web/perma/tasks.py | 53.28% <6.70%> (-7.17%) | :arrow_down: |
perma_web/perma/utils.py | 66.33% <10.44%> (-10.93%) | :arrow_down: |
perma_web/perma/models.py | 90.91% <33.33%> (+<0.01%) | :arrow_up: |
perma_web/perma/views/user_management.py | 95.18% <50.00%> (+0.69%) | :arrow_up: |
perma_web/perma/admin.py | 88.10% <60.00%> (-0.42%) | :arrow_down: |
perma_web/urls.py | 100.00% <0.00%> (ø) | |
Context
We are reorganizing Perma's Internet Archive collection. In the past, we would add an "Item" with a single "File" to the collection for each new Perma Link; going forward, we've been asked to instead create one digest-like Item per day, with Files for each of the Perma Links created on that day.
See this internal document for a complete description of the project; see also its project board.
This PR
There are some 162,082 Perma Links that we would have expected to find uploaded to existing "daily" Internet Archive items, but whose WARCs are not present. For example, https://perma.cc/06N5Qy9rxNE's WARC is not included in https://archive.org/download/daily_perma_cc_2013-11-13.
This PR adds Celery tasks for uploading those links to the appropriate daily item.
It does NOT create new items in IA: if the appropriate daily item does not already exist, the link is not uploaded.
This is intended as a gentle way to try out our upload code and test parallelization and the handling of rate-limiting. It will likely require tweaking, once we see how it works under real conditions.
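The grouping rule described above (one daily item per creation date, and no upload when that item is missing) can be sketched in plain Python. This is only an illustration, not the task code in this PR: `daily_item_identifier`, `plan_uploads`, the `(guid, creation_date)` pairs and the sample GUIDs are all hypothetical names and data, and the identifier format is inferred from the `daily_perma_cc_2013-11-13` example.

```python
from collections import defaultdict
from datetime import date


def daily_item_identifier(day: date) -> str:
    """Build an identifier shaped like the daily_perma_cc_2013-11-13 example."""
    return f"daily_perma_cc_{day.isoformat()}"


def plan_uploads(links, existing_items):
    """Group links by daily item, skipping days whose item does not exist.

    `links` is an iterable of (guid, creation_date) pairs and `existing_items`
    is a set of identifiers already present in IA. Both are stand-ins
    (assumptions) for whatever the real tasks query.
    """
    planned = defaultdict(list)
    skipped = []
    for guid, created in links:
        identifier = daily_item_identifier(created)
        if identifier in existing_items:
            planned[identifier].append(guid)
        else:
            # Never create a new item here: just record the link as skipped.
            skipped.append(guid)
    return dict(planned), skipped
```

Calling `plan_uploads` with a link created on 2013-11-13 and an `existing_items` set containing `daily_perma_cc_2013-11-13` would plan that link for upload. A link from a day with no existing item ends up in the skipped list.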
Deploying
There's nothing special required for deployment, though we may want to have another look at all the IA-related settings and make sure we like them. There are now a number of configurable retry limits that may need tweaking.
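To give a feel for what a retry limit controls, here is a rough sketch of retrying a rate-limited upload with exponential backoff. It is not the code in this PR and does not use the PR's setting names: `RateLimited`, `upload_with_retries`, `max_retries` and `base_delay` are hypothetical, and the real tasks may well rely on Celery's own retry machinery instead.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical signal that the remote service asked us to slow down."""


def upload_with_retries(upload, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `upload()` and retry with exponential backoff when rate-limited.

    Returns the upload's result, or re-raises RateLimited once
    `max_retries` retries have been used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return upload()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Double the wait each time, with jitter so parallel workers
            # do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

A higher `max_retries` tolerates longer stretches of rate-limiting at the cost of tying up a worker. A lower one fails fast and leaves the link to be picked up by a later run.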
Before running against any sizeable number of files, we probably want to let IA know that we are resuming uploads, experimentally.
To queue uploads, in the Django shell, run:
where `limit` is the maximum number of links you want to enqueue.