Closed by cgreene 4 years ago
I know that we switched to letter grouping, but @arielah and I also discussed the alternative of using a different grouping (e.g., the World Bank analytical grouping from rnaturalearth), which is more geography-based but still rather arbitrary. Do you think it would help? @cgreene @dhimmel I'm also okay with leaving it as is.
relates to #38, #45
From my read of the NamePrism paper, the methodology they used makes more sense than geographic groupings. They construct embeddings based on contact chains and use these embeddings to find similarities at the country level (see Section 4.3.2). This evidence seems to support the taxonomy.
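To make the idea of country-level similarity from embeddings concrete, here is a rough sketch. It is not NamePrism's actual procedure: the vectors are made-up toy numbers, the country labels are placeholders, and averaging name vectors per country and comparing them with cosine similarity is just my assumption about one simple way such a comparison could be done.

```python
import math

# Hypothetical toy name embeddings (made-up numbers), keyed by placeholder country labels.
toy_name_vectors = {
    "A": [[1.0, 0.0], [0.8, 0.2]],
    "B": [[0.9, 0.1], [1.0, 0.0]],
    "C": [[0.0, 1.0], [0.1, 0.9]],
}


def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One vector per country: the mean of that country's name vectors.
country_vecs = {c: centroid(vs) for c, vs in toy_name_vectors.items()}


def most_similar(country):
    """Return the other country whose centroid is closest by cosine similarity."""
    others = (c for c in country_vecs if c != country)
    return max(others, key=lambda c: cosine(country_vecs[country], country_vecs[c]))
```

With these toy numbers, countries "A" and "B" end up closest to each other regardless of where they would sit on a map, which is the kind of non-geographic grouping being discussed here.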
Some of the results that I initially found odd because they contrast with geography (e.g., Bangladesh not being in SE Asia but instead among the countries with Arabic naming traditions) seem to be at least partly backed up by other information about naming traditions: https://en.wikipedia.org/wiki/Bengali_name
Name origins are fundamentally an individual-level property. Because of the limitations of the source data in Wikipedia, the best we can get for classifying modern naming traditions is the country level. For a grouping of countries by naming traditions, I haven't seen anything better than NamePrism. I wouldn't use their names for the groupings, which appear to be arbitrary (e.g., what should probably be called Arabic is called Muslim). However, I haven't seen anything better than the groupings themselves from the point of view of the research question around name origins.