rue-a / hofzuweisungslisten

https://doi.org/10.5281/zenodo.13789643

Polish names and head counts #2

Open rue-a opened 2 months ago

rue-a commented 2 months ago

We have not yet finished curating the Polish names and head counts, mostly because this information is less structured than the rest of the document.

The Polish names and head counts are crammed into the same column. Often, multiple names and head counts are entered into that column and then spill over into the next column, Bemerkung (remarks). The names are also often abbreviated, and in some cases only family names appear to be present, although we cannot say this for certain.

Currently, we simply separate the names found in the column by a space. Where we can detect combinations of given name and family name (e.g., by assuming that abbreviated names are given names), we separate these pairs by a comma. Numbers in the original column are extracted and written into a separate field; if multiple numbers were present, they are separated by spaces.
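
For illustration, here is a minimal Python sketch of the separation rule described above. It is not the code used in this repository: the function name `split_names_and_counts`, the regular expressions, and the rule that a token of one to three letters followed by a period counts as an abbreviated given name are all assumptions made for this example. It also assumes that "separate these pairs by a comma" means the detected pairs are separated from each other by commas.

```python
import re

# Hypothetical patterns, chosen for this sketch only.
NUMBER = re.compile(r"\d+")
# Assumption: an abbreviated given name is 1-3 letters plus a period, e.g. "J." or "St."
ABBREVIATION = re.compile(r"^\w{1,3}\.$")


def split_names_and_counts(cell: str) -> tuple[str, str]:
    """Split one raw cell into a names string and a head-counts string."""
    # Extract all numbers and join them with spaces.
    counts = " ".join(NUMBER.findall(cell))

    # Remove the numbers and any existing commas, then tokenize on whitespace.
    tokens = NUMBER.sub(" ", cell).replace(",", " ").split()

    groups = []
    found_pair = False
    i = 0
    while i < len(tokens):
        # An abbreviation followed by another token is treated as
        # "given name + family name".
        if ABBREVIATION.match(tokens[i]) and i + 1 < len(tokens):
            groups.append(f"{tokens[i]} {tokens[i + 1]}")
            found_pair = True
            i += 2
        else:
            groups.append(tokens[i])
            i += 1

    # Pairs are separated by commas; bare names fall back to spaces.
    separator = ", " if found_pair else " "
    return separator.join(groups), counts
```

Under these assumptions, the made-up cell `"J. Kowalski 3 M. Nowak 2"` yields the names `"J. Kowalski, M. Nowak"` and the counts `"3 2"`, while `"Kowalski Nowak 5"` yields `"Kowalski Nowak"` and `"5"`. The sketch does not handle text that continues into the Bemerkung column, and it does not keep track of which head count belongs to which name.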