web-archive-group / WALK

Web Archives for Longitudinal Knowledge

Toronto Transfer #34

Closed by ianmilligan1 8 years ago

ianmilligan1 commented 8 years ago

Now that we've got UTL signed off on the MOU, in addition to the CPP collection let's grab.

ianmilligan1 commented 8 years ago

@ruebot kicked off with the Labour Unions download.

ruebot commented 8 years ago

@ianmilligan1 I'm pretty sure we're going to run out of space, and we'll need to contact ComputeCanada to get our second storage allotment. We're at ~3.7T free now, and have ~250 out of ~7000 WARCs downloaded in the Canadian Government Information collection. If we average about 1G per WARC, that collection is going to be around 7T. Let me know if you want me to take the lead on contacting ComputeCanada.
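A rough back-of-the-envelope check of those numbers, as a minimal Python sketch. The figures come from the comment above; the function names are hypothetical, and it assumes decimal units (1T = 1000G) and that the ~1G-per-WARC average holds for the whole collection.

```python
# Hypothetical helpers for a storage estimate; not part of any WALK tooling.
GB_PER_TB = 1000  # assumption: decimal units, 1T = 1000G


def collection_size_tb(warc_count, avg_gb_per_warc):
    """Estimated size of a set of WARCs in terabytes."""
    return warc_count * avg_gb_per_warc / GB_PER_TB


def shortfall_tb(total_warcs, downloaded_warcs, avg_gb_per_warc, free_tb):
    """Extra space needed to finish the download (0 if it already fits)."""
    remaining = collection_size_tb(total_warcs - downloaded_warcs, avg_gb_per_warc)
    return max(0.0, remaining - free_tb)


# Approximate figures quoted in the thread.
total_warcs = 7000
downloaded_warcs = 250
avg_gb_per_warc = 1.0
free_tb = 3.7

full_size = collection_size_tb(total_warcs, avg_gb_per_warc)
needed = shortfall_tb(total_warcs, downloaded_warcs, avg_gb_per_warc, free_tb)

print(f"Estimated collection size: {full_size:.2f}T")
print(f"Additional space needed:   {needed:.2f}T")
```

Under these assumptions the remaining ~6750 WARCs need roughly 3T more than is currently free, which is why the second storage allotment matters before the download can finish.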

ianmilligan1 commented 8 years ago

Thanks for the heads up @ruebot – let's suspend the Canadian Government DL for now, I reckon.

I'll touch base with Compute Canada tomorrow (I've got an e-mail thread from July about this – #26 – so hopefully it'll jog their memory). Will CC you and we can figure out the best way to do this.

ruebot commented 8 years ago

suspended

ianmilligan1 commented 8 years ago

Now that we've got the storage, do you want to restart Canadian Government DL @ruebot?

I will look into the broken WARC from #37...

ruebot commented 8 years ago

Yeah, I restarted it this morning.

ianmilligan1 commented 8 years ago

Huzzah. Great!

ianmilligan1 commented 8 years ago

Could we add TSpace to this list, @ruebot? I think having an outlier collection like the TSpace one would help us develop comparative tools. It doesn't look that big, as I think it's just a handful of digital projects deposited in TSpace rather than theses, etc.

I've added it above.

ruebot commented 8 years ago

@ianmilligan1 sure!

ruebot commented 8 years ago

Done.

ianmilligan1 commented 8 years ago

Fantastic!