web-archive-group / WALK

Web Archives for Longitudinal Knowledge

Renaming Collections to Reflect Institution #35

Closed ianmilligan1 closed 8 years ago

ianmilligan1 commented 8 years ago

Now that we're ingesting collections from multiple institutions into WALK, we should rename the directories in data to reflect the originating institution.

ianmilligan1 commented 8 years ago

I will do the first batch, except for the Labour collection that is currently ingesting (I will rename that one once it's settled).

ALBERTA_coll = University of Alberta
TORONTO_coll = University of Toronto
WAHR_coll = Our research group, internally generated data

e.g. TORONTO_Canadian_Labour Unions

ianmilligan1 commented 8 years ago

OK! That was some fun times. I also cleaned up all the derivative folders (e.g. /data/derivatives/links/) so that all outputs also follow this naming convention. All PART files were discarded; only the combined data files were kept.

A random piece of code that I've used, just so I don't forget it elsewhere:

for f in *; do mv "$f" "ALBERTA_$f"; done
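Note that running the one-liner above a second time in the same directory would prefix everything again (ALBERTA_ALBERTA_...). A minimal sketch of a re-runnable variant follows; the function name `add_prefix` and its arguments are hypothetical, not something used in WALK itself.

```shell
# add_prefix PREFIX [DIR]
# Prepend PREFIX to every entry in DIR (default: the current directory).
# Hypothetical helper: entries that already start with PREFIX are skipped,
# so running it twice does not double the prefix.
add_prefix() {
  local prefix="$1"
  local dir="${2:-.}"
  local path name
  for path in "$dir"/*; do
    [ -e "$path" ] || continue            # glob matched nothing
    name="${path##*/}"                    # strip the directory part
    case "$name" in
      "$prefix"*) continue ;;             # already carries the prefix
    esac
    mv -n -- "$path" "$dir/$prefix$name"  # -n: do not overwrite an existing file
  done
}
```

For example, `add_prefix ALBERTA_ /path/to/collections` would rename only the entries that do not yet start with ALBERTA_ (the path here is a placeholder).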

greebie commented 8 years ago

Another script for later use. It removes [text] from the names of the files in the folder (similar prefixes cause problems when I truncate the filenames).

extra='[text]'

for filename in *.fasta; do
  [ -f "$filename" ] || continue
  # Quote the pattern so it is matched literally rather than as a glob
  mv -- "$filename" "${filename//"$extra"/}"
done
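
The loop above relies on bash's `${var//pattern/}` substitution, and a plain `mv` will silently overwrite a file if two names become identical once the text is removed. A minimal sketch of a more defensive variant, assuming the text should be treated as a literal string: it uses only POSIX parameter expansion plus `local`, and it skips any rename whose target already exists. The function names `strip_literal` and `strip_from_fasta_names` are hypothetical.

```shell
# strip_literal STRING TEXT
# Print STRING with every literal occurrence of TEXT removed.
strip_literal() {
  local s="$1"
  local extra="$2"
  local out=""
  if [ -z "$extra" ]; then
    printf '%s\n' "$s"                 # nothing to remove
    return 0
  fi
  while :; do
    case "$s" in
      *"$extra"*)
        out=$out${s%%"$extra"*}        # keep what precedes the first match
        s=${s#*"$extra"}               # continue after that match
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out$s"
}

# strip_from_fasta_names TEXT
# Rename *.fasta files in the current directory, removing TEXT from each name.
strip_from_fasta_names() {
  local extra="$1"
  local f new
  [ -n "$extra" ] || return 1
  for f in *.fasta; do
    [ -f "$f" ] || continue
    new=$(strip_literal "$f" "$extra")
    [ "$new" != "$f" ] || continue     # name does not contain TEXT
    if [ -e "$new" ]; then
      echo "skipping $f: $new already exists" >&2
      continue
    fi
    mv -- "$f" "$new"
  done
}
```

Quoting `"$extra"` inside the expansions is what keeps characters such as `[` and `]` from being read as a glob bracket expression.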