jay-tyler / data-petting-zoo

Mapping etymological nameplace patterns for the UK

What's in a Name: A Data Petting Zoo

Most town and city names in the UK predate English as the de facto language. They were coined at a time when several groups co-existed in the British Isles, each with a language closer to the Germanic roots of English. Each place name carries associations of adjectival meaning and ties to cultural groups that no longer exist as distinct communities. That makes the data interesting to explore visually.

It is worth noting that we take a naive approach to the data: everything is matched on general patterns rather than cherry-picked through careful historical research. We feel our approach is interesting and generally valid, but it should not be taken as canonical. Some names will be grouped into associations of meaning where they don't belong. This data is about visualizing trends; for reliable information about any one placename, consult one of the fine scholarly sources listed below.

Description of engine API

Setup functions

The engine for data-petting-zoo has several setup functions that build the pandas dataframe from scratch. This is useful when tweaking the filters, for example.

These functions include:

This function reads the GB.txt dataset from file_path and returns a dataframe after applying filters for adm1 regions and geofeatures.
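As a rough illustration of this kind of loader, here is a minimal sketch. It assumes GB.txt is the GeoNames country dump (tab-separated, no header row, 19 fields). The function name `load_gb`, the column labels, and the default filter values are hypothetical and are not taken from the engine itself.

```python
import csv

import pandas as pd

# Field order of a GeoNames country dump; the labels are our own choice.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def load_gb(file_path, adm1_codes=("ENG", "SCT", "WLS", "NIR"), feature_classes=("P",)):
    """Read the dump and keep rows in the given adm1 regions and feature classes.

    Hypothetical helper: the defaults (all four GB regions, populated
    places only) are assumptions, not the engine's actual filters.
    """
    df = pd.read_csv(
        file_path,
        sep="\t",
        header=None,
        names=GEONAMES_COLUMNS,
        quoting=csv.QUOTE_NONE,  # names may contain quote characters
        dtype={"admin1_code": str, "feature_class": str},
    )
    keep = df["admin1_code"].isin(adm1_codes) & df["feature_class"].isin(feature_classes)
    return df[keep].reset_index(drop=True)
```

Keeping the filters as parameters makes it cheap to rebuild the dataframe with a different set of regions or geofeatures.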

This function applies the regexes from name_rules and establishes an 'ls_namefam' column which contains a Python list containing 'namekey's for applying to the row. np.nan is used to designate rows without membership to any name family.
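The per-row logic might look something like the sketch below. The contents of `NAME_RULES` and the helper name `name_families` are invented for illustration; the real name_rules may be structured differently. A plain float NaN stands in for np.nan so the example needs only the standard library.

```python
import math
import re

# Hypothetical rules: namekey -> regex. Not the project's actual name_rules.
NAME_RULES = {
    "ham": r"ham$",
    "ton": r"ton$",
    "chester": r"(chester|caster|cester)$",
}


def name_families(name, name_rules=NAME_RULES):
    """Return the list of namekeys whose regex matches name, or NaN if none do."""
    matches = [
        key
        for key, pattern in name_rules.items()
        if re.search(pattern, name, flags=re.IGNORECASE)
    ]
    return matches if matches else math.nan


print(name_families("Birmingham"))  # ['ham']
print(name_families("Manchester"))  # ['chester']
```

With a dataframe in hand, a helper like this could be applied to the 'name' column to produce the 'ls_namefam' column.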

Query functions

slist is a list contained within the dataframe; patlist is any iterable containing valid regex strings. This function searches slist for any regex match defined by the iterable patlist. Handling for np.nan is provided (returning None).

string is any Python string object and patlist is any iterable containing valid regex strings. This function searches string for any regex match defined by the iterable patlist. Handling for np.nan is provided (returning None).
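The two search helpers described above can be sketched as follows. The names `pat_in_string` and `pat_in_list` are hypothetical, and the boolean return value is an assumption: the description only states that NaN input yields None.

```python
import re


def _is_nan(value):
    # np.nan is an ordinary float NaN, and NaN is the only float unequal to itself.
    return isinstance(value, float) and value != value


def pat_in_string(string, patlist):
    """True if any regex in patlist matches somewhere in string; None for NaN input."""
    if _is_nan(string):
        return None
    return any(re.search(pat, string) for pat in patlist)


def pat_in_list(slist, patlist):
    """True if any element of slist matches any regex in patlist; None for NaN input."""
    if _is_nan(slist):
        return None
    return any(pat_in_string(s, patlist) for s in slist)
```

Returning None for NaN lets callers tell "no data in this row" apart from "data present but no match".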

The DataFrame input can be any dataframe that contains an 'ls_namefam' column; namekey should be one of the keys defined in name_rules. The returned DataFrame is a sub-DataFrame containing the rows that have namekey in 'ls_namefam'. None is returned as a placename for consistency with other APIs.
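A minimal sketch of such a family lookup is shown below. The function name `rows_in_family` and the order of the returned pair are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def rows_in_family(df, namekey):
    """Return (sub-DataFrame of rows whose 'ls_namefam' contains namekey, None)."""

    def has_key(cell):
        # Rows with no family hold np.nan rather than a list, so check the type first.
        return isinstance(cell, list) and namekey in cell

    mask = df["ls_namefam"].apply(has_key)
    return df[mask], None  # None stands in for the placename


# Toy data for illustration only.
df = pd.DataFrame({
    "name": ["Birmingham", "Leeds", "Southampton"],
    "ls_namefam": [["ham"], np.nan, ["ton"]],
})
family, placename = rows_in_family(df, "ham")
print(list(family["name"]))  # ['Birmingham']
```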

Input should be any dataframe that contains 'ls_namefam' and 'name' columns; placestring is user input (or otherwise) that putatively corresponds to an actual named place in the UK. query_placename will attempt to find that place (or something close) and will return the sub-DataFrame for the matching name family, along with a namekey indicating that family.
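How the "something close" match is made is not specified here. One plausible approach, sketched below, uses `difflib.get_close_matches` from the standard library; this is a swapped-in technique for illustration and not necessarily what query_placename does. The helper name `closest_name` and the cutoff value are hypothetical.

```python
import difflib


def closest_name(names, placestring, cutoff=0.6):
    """Return the entry of names most similar to placestring, or None if nothing is close."""
    # Compare case-insensitively, then map back to the original spelling.
    by_lower = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(placestring.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


names = ["Birmingham", "Nottingham", "Brighton"]
print(closest_name(names, "birmingam"))  # Birmingham
```

The matched name could then be used to look up the row's 'ls_namefam' entry and select the rest of that family.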

Attempts to execute query_placename() with arguments provided. If this fails, returns a DataFrame consisting of a single place match with a namekey of None and the placename match. If both queries fail, returns None.

Links to Related Resources

Licensing

On D3

Primary article for walking through shapefile-to-GeoJSON conversion and SVG map creation:

On the general update pattern:

Simple Slider

Things Involving Histograms/Bar Charts