Armand1 opened 4 years ago
Description of the dataset
The dataset comprises 325 Athenian black-figure vases. They are divided into 34 "species", designated by numbers, and 17 "genera", designated by a traditional shape name (for example, amphora_1). Each species contains 3 to 20 vases.
In principle, the vases of each species are more similar in shape to each other than to vases of other species. In practice, this is not always true.
cup_14 and cup_15 are Little Master band cups and Little Master lip cups, respectively. It is unclear whether they in fact differ in shape.
lekythos_24 and lekythos_25 each contain a mixture of slender and dumpy lekythoi, due to a mistake on my part.
This is what they look like:
Data pre-processing
Half-vase (open) contours were derived automatically, mostly using handle-chopping algorithms. For some vases, particularly pelikes and loutrophorai, handles were removed manually because the algorithms failed to remove them cleanly.
The initial contours were smoothed (?) and fitted with a B-spline, from which 70 points were obtained. These processed open contours were reflected to give full (closed) contours. These 70x2-point closed contours were used in most subsequent analyses.
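Here is a minimal sketch of the resampling and reflection steps. It assumes each half-profile is an ordered (x, y) array running from rim to foot, with the vase axis at x = 0. The function names are hypothetical. The sketch also swaps in simple arc-length linear interpolation for the B-spline fit, so it shows the bookkeeping rather than reproducing the actual pipeline.

```python
import numpy as np


def resample_open_contour(points, n=70):
    """Resample an open contour to n points equally spaced by arc length.

    NOTE: linear arc-length interpolation stands in for the B-spline fit;
    it is a simplification, not the method actually used on the vases.
    """
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    targets = np.linspace(0.0, s[-1], n)                 # equally spaced stations
    x = np.interp(targets, s, pts[:, 0])
    y = np.interp(targets, s, pts[:, 1])
    return np.column_stack([x, y])


def reflect_to_closed(half, axis_x=0.0):
    """Mirror a half-profile about the vertical line x = axis_x.

    The mirrored copy is reversed so that the closed outline is traversed
    continuously: rim -> foot on one side, then foot -> rim on the other.
    """
    mirrored = half.copy()
    mirrored[:, 0] = 2 * axis_x - mirrored[:, 0]
    return np.vstack([half, mirrored[::-1]])


# Toy half-profile (made up): rim at the top, foot at the bottom.
half_raw = np.array([[1.0, 10.0], [1.5, 6.0], [2.0, 3.0], [1.0, 0.0]])
half70 = resample_open_contour(half_raw, n=70)
closed = reflect_to_closed(half70)
print(half70.shape, closed.shape)  # (70, 2) (140, 2)
```

For an actual B-spline fit, `scipy.interpolate.splprep` followed by `splev` at 70 evenly spaced parameter values would be the natural replacement for `resample_open_contour`.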
This is what they look like:
SRVFs: elastic square-root velocity functions
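Contours are converted to SRVFs (square-root velocity functions). In the elastic framework, the SRVF itself removes translation, normalizing each SRVF to unit length removes size, and optimal alignment between curves removes rotation (and reparametrization). [stuff is done? global registration?] A distance matrix is then computed among the SRVFs. Hierarchical cluster analysis (HCA, Ward.D2) on this distance matrix produces the following. There is a clear distinction between an "amphora" family and a "cup" family, with alabastra and pyxides off by themselves. But within these families, grouping by species is poor.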
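The sketch below shows the SRVF idea under simplifying assumptions. The curves are already sampled at corresponding points. Rotation is removed by orthogonal Procrustes alignment. The distance is the arc length on the unit sphere between normalized SRVFs. It deliberately omits the elastic reparametrization (and, for closed curves, starting-point) optimization that the full method performs, so it is not Arianna's implementation. All function names are hypothetical.

```python
import numpy as np


def srvf(curve):
    """SRVF q = c' / sqrt(|c'|), scaled to unit L2 norm on [0, 1]."""
    c = np.asarray(curve, dtype=float)
    dc = np.gradient(c, axis=0)                        # velocity; translation drops out
    speed = np.linalg.norm(dc, axis=1)
    q = dc / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    return q / np.sqrt(np.mean(np.sum(q**2, axis=1)))  # unit norm removes size


def align_rotation(q1, q2):
    """Rotate q2 to best match q1 (orthogonal Procrustes / Kabsch, 2-D)."""
    u, _, vt = np.linalg.svd(q1.T @ q2)
    d = np.sign(np.linalg.det(u @ vt))                 # forbid reflections
    r = u @ np.diag([1.0, d]) @ vt
    return q2 @ r.T


def srvf_distance(c1, c2):
    """Geodesic distance between rotation-aligned unit SRVFs (no elastic step)."""
    q1, q2 = srvf(c1), srvf(c2)
    q2 = align_rotation(q1, q2)
    inner = np.mean(np.sum(q1 * q2, axis=1))           # L2 inner product on [0, 1]
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))


def distance_matrix(curves):
    n = len(curves)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dm[i, j] = dm[j, i] = srvf_distance(curves[i], curves[j])
    return dm


# Toy curves: an ellipse, a rotated/scaled/shifted copy of it, and a circle.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
ellipse = np.column_stack([3 * np.cos(t), np.sin(t)])
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = 2.0 * ellipse @ rot.T + np.array([5.0, -1.0])
circle = np.column_stack([np.cos(t), np.sin(t)])
dm = distance_matrix([ellipse, moved, circle])
```

The moved ellipse ends up at essentially zero distance from the original, while the circle does not. To reproduce the clustering step, the matrix could be passed to SciPy as `linkage(squareform(dm), method="ward")` (from `scipy.cluster.hierarchy` and `scipy.spatial.distance`). That uses the same Ward criterion as R's `hclust(..., method = "ward.D2")`.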
Another, less sensitive, way of assessing grouping integrity is to ask whether the closest match of each vase is a congeneric (a vase of the same genus). For the 70x2-point closed dataset, this proportion ranges from 60% to 100% across genera (pelikes and kalathoi are particularly poorly classified).
For genera with multiple species, we can do the same for conspecifics. Here the proportion of vases whose closest match is a conspecific ranges from 25% to 100%. The greatest confusion is between cup_11 and cup_12 (cup A and droop cup); oddly, cup_14 and cup_15 (Little Master lip and band cups) are well differentiated. lekythos_25 and lekythos_26 also seem, oddly, reasonably well differentiated.
Arianna did the same analysis for the "original open curves", "70 point open curves", "original closed curves" and "70x2 point closed curves" datasets. Of these, the "70x2 point closed curves" dataset worked best.
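The nearest-match test is easy to compute from any distance matrix. This hypothetical helper reports, for each label, the fraction of vases whose nearest neighbour (other than themselves) shares that label. Passing genus labels gives the congeneric rate, and passing species labels gives the conspecific rate. The genus is assumed to be the part of a species name before the final underscore. The distances in the example are made up.

```python
import numpy as np
from collections import defaultdict


def nearest_match_rates(dm, labels):
    """Per label: fraction of members whose nearest neighbour has the same label."""
    d = np.array(dm, dtype=float)
    np.fill_diagonal(d, np.inf)          # a vase may not match itself
    nearest = d.argmin(axis=1)
    hits, totals = defaultdict(int), defaultdict(int)
    for i, j in enumerate(nearest):
        totals[labels[i]] += 1
        hits[labels[i]] += labels[i] == labels[j]
    return {lab: hits[lab] / totals[lab] for lab in totals}


# Made-up 1-D positions standing in for a real distance matrix.
species = ["cup_11", "cup_11", "cup_12", "cup_12", "amphora_1", "amphora_1"]
genera = [s.rsplit("_", 1)[0] for s in species]
x = np.array([0.0, 0.1, 0.15, 0.4, 5.0, 5.2])
dm = np.abs(x[:, None] - x[None, :])

rates_species = nearest_match_rates(dm, species)
rates_genus = nearest_match_rates(dm, genera)
print(rates_species)  # {'cup_11': 0.5, 'cup_12': 0.5, 'amphora_1': 1.0}
print(rates_genus)    # {'cup': 1.0, 'amphora': 1.0}
```

To restrict the conspecific test to genera with multiple species, drop from the result any species whose genus contains only that one species.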
Norman did an eigenshape analysis on the 70x2 point closed curves dataset.
Stephen did the same, for comparison:
They look pretty similar.
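Eigenshape methods rest on a singular value decomposition of a matrix of shape descriptors. The sketch below is a hypothetical, simplified version. It decomposes centred, size-normalized outline coordinates, so it is essentially a principal component analysis of coordinates rather than of the tangent-angle functions used in classical eigenshape analysis. It assumes the outlines have corresponding points and are already consistently oriented. It is not necessarily what Norman or Stephen ran.

```python
import numpy as np


def eigenshapes(contours, n_components=3):
    """SVD of centred, size-normalized outlines.

    contours: array (n_specimens, n_points, 2) with corresponding points.
    Returns (scores, eigenshapes, explained_variance_ratio).
    """
    x = np.asarray(contours, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)                  # remove translation
    x = x / np.linalg.norm(x, axis=(1, 2), keepdims=True)  # remove size
    flat = x.reshape(len(x), -1)
    flat = flat - flat.mean(axis=0)                        # centre on the mean shape
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    var = s**2 / np.sum(s**2)
    k = n_components
    return u[:, :k] * s[:k], vt[:k].reshape(k, -1, 2), var[:k]


# Toy outlines (made up): ellipses whose width-to-height ratio varies.
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
contours = np.stack([np.column_stack([a * np.cos(t), np.sin(t)])
                     for a in np.linspace(0.8, 1.2, 5)])
scores, shapes, var = eigenshapes(contours, n_components=3)
```

Because the toy outlines vary along a single axis (width relative to height), almost all of the variance falls on the first eigenshape.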
Norman did a Canonical Variates Analysis (CVA) on the 70x2-point closed curves dataset. CVA is a classical supervised method, closely related to linear discriminant analysis. It finds linear combinations of the variables that maximize the differences among a priori defined groups relative to the variation within them.
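Here is a minimal numpy sketch of CVA as a generalized eigenproblem: it finds axes that maximize between-group scatter B relative to within-group scatter W. With 140 points x 2 coordinates per vase and only 325 vases, W is close to singular. CVA is therefore typically run on a reduced set of variables, such as leading principal component scores. The small `ridge` term is a hypothetical safeguard, and the function name is illustrative.

```python
import numpy as np


def canonical_variates(X, groups, ridge=1e-9):
    """CVA: axes a maximizing (a^T B a) / (a^T W a).

    X: (n, p) data matrix (e.g. PC scores); groups: length-n labels.
    Returns (scores, axes, eigenvalues), keeping min(p, g - 1) axes.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # pooled within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for g in labels:
        Xg = X[groups == g]
        mg = Xg.mean(axis=0)
        W += (Xg - mg).T @ (Xg - mg)
        B += len(Xg) * np.outer(mg - grand, mg - grand)
    # With W = L L^T, B a = lambda W a becomes a symmetric problem in v = L^T a.
    L = np.linalg.cholesky(W + ridge * np.eye(p))
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ B @ Linv.T)
    order = np.argsort(evals)[::-1][: min(p, len(labels) - 1)]
    axes = Linv.T @ evecs[:, order]
    return (X - grand) @ axes, axes, evals[order]


# Toy data (made up): two groups of four points, offset along x.
A = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X = np.vstack([A, A + np.array([5.0, 0.0])])
groups = np.array(["A"] * 4 + ["B"] * 4)
scores, axes, evals = canonical_variates(X, groups)
```

With two groups there is a single canonical axis, and it points along x, the direction that separates them.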
He then jackknifed (leave-one-out) to test the robustness of species assignments. This is a plot of his confusion matrix. Clearly, nearly all species can be assigned with high accuracy (most at 100%). Some are confused: cup_11, cup_12, cup_13 and cup_14 are all more or less confused with each other, as are cup_17 and cup_18. Pelikes are confused to some degree with amphora_9, and there is also some confusion among some amphora classes.
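Norman's exact classifier is not stated. The hypothetical sketch below uses a linear discriminant rule. Each specimen is left out, group means and the pooled within-group covariance are recomputed from the rest, and the specimen is assigned to the group whose mean is nearest in Mahalanobis distance. It assumes every group keeps at least two members when one specimen is removed, which holds here since species have 3 to 20 vases.

```python
import numpy as np


def loo_confusion(X, groups):
    """Leave-one-out ("jackknife") linear discriminant classification.

    Returns (labels, confusion), where confusion[i, j] counts specimens of
    true group labels[i] that were assigned to group labels[j].
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = list(np.unique(groups))
    index = {g: k for k, g in enumerate(labels)}
    conf = np.zeros((len(labels), len(labels)), dtype=int)
    for i in range(len(X)):
        keep = np.arange(len(X)) != i                  # leave specimen i out
        Xk, gk = X[keep], groups[keep]
        means = np.array([Xk[gk == g].mean(axis=0) for g in labels])
        resid = Xk - means[[index[g] for g in gk]]
        cov = resid.T @ resid / (len(Xk) - len(labels))  # pooled covariance
        prec = np.linalg.pinv(cov)
        diff = X[i] - means
        d2 = np.einsum("ij,jk,ik->i", diff, prec, diff)  # squared Mahalanobis
        conf[index[groups[i]], int(np.argmin(d2))] += 1
    return labels, conf


# Toy data (made up): two well-separated groups of four points.
A = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X = np.vstack([A, A + np.array([5.0, 0.0])])
groups = np.array(["A"] * 4 + ["B"] * 4)
labels, conf = loo_confusion(X, groups)
print(conf.tolist())  # [[4, 0], [0, 4]]
```

Off-diagonal counts in `conf` correspond to the species confusions described above.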
Embedding SRVF distances into linear space
Stephen "embedded" Arianna's SRVF distances into Euclidean (linear) coordinates in several ways, to test their use in phylogenetic analysis. The methods are: 2d embedding; 3d embedding (I think these are multidimensional scaling, MDS); "currents"; and "monomial".
Here are plots of each:
2d embedding MDS
3d embedding MDS
currents
monomial
I do not like the currents and monomial embeddings, since they place very different shapes close to each other. Instead, we will work with the 2d and 3d MDS embeddings of the SRVF distances.
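Whether Stephen's 2d and 3d embeddings used classical MDS is not confirmed above. Assuming classical (Torgerson) MDS, as in R's `cmdscale`, the embedding can be sketched in a few lines. SRVF geodesic distances are not Euclidean, so the double-centred matrix will generally have some negative eigenvalues. This sketch clips them to zero, so the embedded distances only approximate the originals. The function name is hypothetical.

```python
import numpy as np


def classical_mds(dm, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix."""
    d2 = np.asarray(dm, dtype=float) ** 2
    n = len(d2)
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    G = -0.5 * J @ d2 @ J                      # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:k]        # largest eigenvalues first
    lam = np.clip(evals[order], 0.0, None)     # negative ones signal non-Euclidean distances
    return evecs[:, order] * np.sqrt(lam)


# Toy check (made up): distances from a genuine 2-D configuration are recovered exactly.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
dm = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
emb = classical_mds(dm, k=2)
d_emb = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
```

Setting `k=3` gives the 3d embedding. The coordinates are unique only up to rotation and reflection, so only inter-point distances should be compared across runs.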
Results on this dataset are proliferating. Please report them here so that we can keep track of all this work.