vaidap / library

DS4D Project on Encyclopaedia Britannica

Having an initial look at data #1

vaidap opened 4 years ago

vaidap commented 4 years ago

Advice from Alexander:

The next steps would probably be:

vaidap commented 4 years ago

Advice from Sarah:

vaidap commented 4 years ago

s/f letter misspellings: older editions print the long s (ſ), which OCR often reads as "f".
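One way to catch these is to check whether reading some of a word's "f"s as "s" turns an unknown word into a known one. The sketch below assumes a lowercased token and a `vocabulary` set of modern spellings (for example from a word list); both function names are hypothetical, and it only corrects a word when exactly one candidate matches.

```python
from itertools import product


def sf_candidates(word, vocabulary):
    """Return in-vocabulary spellings made by reading some 'f's in `word` as 's'."""
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    candidates = set()
    # Try every keep-'f' / swap-to-'s' combination over the 'f' positions.
    for choice in product("fs", repeat=len(positions)):
        chars = list(word)
        for pos, letter in zip(positions, choice):
            chars[pos] = letter
        candidate = "".join(chars)
        if candidate != word and candidate in vocabulary:
            candidates.add(candidate)
    return sorted(candidates)


def fix_sf(word, vocabulary):
    """Return a corrected spelling if it is unambiguous, else the word unchanged."""
    if word in vocabulary:
        return word
    candidates = sf_candidates(word, vocabulary)
    return candidates[0] if len(candidates) == 1 else word
```

For example, `fix_sf("firft", {"first"})` gives `"first"`. The number of combinations doubles with each "f", which is fine for ordinary words but worth capping if very long tokens appear.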

Idea:

1) Question: measure change.
2) Method: how do we measure change?
3) What do we learn from the data using this method?

"giraffe" is a stable word; we can select a subset of such words and clean just those.
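A minimal sketch of measuring change on a chosen subset of words, assuming each edition is available as one plain-text string keyed by a label (the labels and the function name are hypothetical). It reports occurrences per 10,000 tokens so that editions of different sizes can be compared.

```python
import re
from collections import Counter


def subset_frequencies(texts_by_edition, words):
    """For each edition, return occurrences per 10,000 tokens of each tracked word."""
    tracked = {w.lower() for w in words}
    result = {}
    for edition, text in texts_by_edition.items():
        # Simple tokenisation: runs of ASCII letters after lowercasing.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = Counter(t for t in tokens if t in tracked)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        result[edition] = {w: 10_000 * counts[w] / total for w in sorted(tracked)}
    return result
```

Normalising by edition length matters here: later editions are much larger, so raw counts would show "growth" for almost every word.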

Example visualisations:

https://fathom.info/traces/ (Tableau?): shows additions/deletions and growth
http://notabilia.net/
https://www.c82.net/work/?id=347

Size of XML documents: use the headers to look at the length of words in each section, index them across editions and compare what exists in each edition. Build a dictionary with one key per header and values (name, length, content), then compare the dictionary keys between editions. Also check whether a word is mostly capitalised rather than fully capitalised.
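A rough sketch of the dictionary idea. The real XML schema is not described here, so this assumes a hypothetical layout of `<article>` elements, each with a `<header>` and a `<text>` child; "length" is taken as the word count of the content. Keys are uppercased so the same headword matches across editions, and comparing key sets gives the additions and deletions. The `mostly_capitalised` threshold of 0.5 is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET


def load_edition(xml_string):
    """Map each header to (name, length, content); assumes <article><header/><text/></article>."""
    root = ET.fromstring(xml_string)
    entries = {}
    for article in root.iter("article"):
        name = (article.findtext("header") or "").strip()
        content = (article.findtext("text") or "").strip()
        if name:
            entries[name.upper()] = (name, len(content.split()), content)
    return entries


def compare_keys(old, new):
    """Headers added, removed and kept between two editions."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "kept": sorted(old_keys & new_keys),
    }


def mostly_capitalised(word, threshold=0.5):
    """True if more than `threshold` of the letters are uppercase, but not all of them."""
    letters = [c for c in word if c.isalpha()]
    if not letters:
        return False
    ratio = sum(c.isupper() for c in letters) / len(letters)
    return threshold < ratio < 1.0
```

A header like "GIRAFfE" is flagged by `mostly_capitalised`, which may point to an OCR error in an otherwise all-caps headword.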

vaidap commented 4 years ago

I have added exploratory.py to the GitHub repo; it loads a text file into an IPython notebook!
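The contents of exploratory.py are not shown here, so the following is only a hypothetical sketch of that kind of first look: read a text file (undecodable bytes replaced rather than raising) and list the most frequent words. The function names and the file name in the usage line are assumptions.

```python
import re
from collections import Counter
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Read a whole text file, replacing any undecodable bytes."""
    return Path(path).read_text(encoding=encoding, errors="replace")


def word_counts(text, top=20):
    """Return the `top` most common lowercased words in `text`."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(t.lower() for t in tokens).most_common(top)
```

In a notebook cell this might be used as `word_counts(load_text("edition.txt"))`, where "edition.txt" stands in for one of the Britannica text files.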