Open MartinHinz opened 3 years ago
Should be fixed in v2.4.2
Thanks! Same is true for palmisano. Will report other errors as they appear.
Palmisano will probably be superseded by aida (#144) soon. But keep them coming anyway. And feel free to make PRs right away.
Hm, OK, palmisano is not superseded by aida after all.
So what is the correct encoding of the radiocarbon.csv file in the palmisano db zip archive?
file -i radiocarbon.csv
radiocarbon.csv: text/plain; charset=unknown-8bit
That's not helpful.
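When `file` only reports `unknown-8bit`, looking at the raw non-ASCII bytes is often more informative than the guessed charset. The following is a minimal sketch of that idea. The helper name `non_ascii_runs` and the sample bytes are hypothetical and are not taken from the actual archive; with the real file you would read `radiocarbon.csv` in binary mode instead.

```python
import re


def non_ascii_runs(data: bytes, context: int = 12):
    """Yield (offset, hex of the run, surrounding text) for each run of bytes >= 0x80."""
    for m in re.finditer(rb"[\x80-\xff]+", data):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(data))
        # Non-ASCII bytes are shown as U+FFFD so the snippet stays printable.
        snippet = data[start:end].decode("ascii", errors="replace")
        yield m.start(), m.group().hex(" "), snippet


# Hypothetical sample bytes (an assumption, not read from the real file).
# For the real data: data = open("radiocarbon.csv", "rb").read()
sample = b"Grotta dell\x89\xdb\xaaOrso,Skeates"

for offset, hex_run, snippet in non_ascii_runs(sample):
    print(offset, hex_run, snippet)
```

Knowing the exact byte sequence behind a broken apostrophe makes it possible to test encoding hypotheses against a few bytes instead of the whole file.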
Not many site names are obviously affected. I only see two, so we could overwrite them manually.
Grotta dell�۪Orso
Osteria dell�۪Osa Necropolis
A lot of the citation strings are heavily broken, though.
I have tried 228 encodings; none of them worked for the source in the first line (Skeates). I assume it is simply gibberish?
The following did not work:
[1] "437" "850" "852" "855"
[5] "857" "860" "861" "862"
[9] "863" "865" "866" "869"
[13] "ARMSCII-8" "ATARI" "ATARIST" "CP-GR"
[17] "CP-IS" "CP1046" "CP1124" "CP1125"
[21] "CP1129" "CP1133" "CP1163" "CP1250"
[25] "CP1251" "CP1252" "CP1254" "CP1256"
[29] "CP1257" "CP1258" "CP154" "CP437"
[33] "CP737" "CP775" "CP819" "CP850"
[37] "CP852" "CP853" "CP855" "CP857"
[41] "CP858" "CP860" "CP861" "CP862"
[45] "CP863" "CP864" "CP865" "CP866"
[49] "CP869" "CP922" "CP932" "CP943"
[53] "CSHPROMAN8" "CSIBM1163" "CSIBM855" "CSIBM857"
[57] "CSIBM860" "CSIBM861" "CSIBM863" "CSIBM864"
[61] "CSIBM865" "CSIBM866" "CSIBM869" "CSISOLATIN1"
[65] "CSISOLATIN2" "CSISOLATIN3" "CSISOLATIN4" "CSISOLATIN5"
[69] "CSISOLATIN6" "CSISOLATINCYRILLIC" "CSKOI8R" "CSMACINTOSH"
[73] "CSPC775BALTIC" "CSPC850MULTILINGUAL" "CSPC862LATINHEBREW" "CSPC8CODEPAGE437"
[77] "CSPCP852" "CSPTCP154" "CSSHIFTJIS" "CSVISCII"
[81] "CYRILLIC" "CYRILLIC-ASIAN" "GEORGIAN-ACADEMY" "GEORGIAN-PS"
[85] "HP-ROMAN8" "HZ" "HZ-GB-2312" "IBM-1163"
[89] "IBM-CP1133" "IBM1163" "IBM437" "IBM775"
[93] "IBM819" "IBM850" "IBM852" "IBM855"
[97] "IBM857" "IBM860" "IBM861" "IBM862"
[101] "IBM863" "IBM864" "IBM865" "IBM866"
[105] "IBM869" "ISO_8859-1" "ISO_8859-1:1987" "ISO_8859-10"
[109] "ISO_8859-10:1992" "ISO_8859-13" "ISO_8859-14" "ISO_8859-14:1998"
[113] "ISO_8859-15" "ISO_8859-15:1998" "ISO_8859-16" "ISO_8859-16:2001"
[117] "ISO_8859-2" "ISO_8859-2:1987" "ISO_8859-3" "ISO_8859-3:1988"
[121] "ISO_8859-4" "ISO_8859-4:1988" "ISO_8859-5" "ISO_8859-5:1988"
[125] "ISO_8859-9" "ISO_8859-9:1989" "ISO-8859-1" "ISO-8859-10"
[129] "ISO-8859-13" "ISO-8859-14" "ISO-8859-15" "ISO-8859-16"
[133] "ISO-8859-2" "ISO-8859-3" "ISO-8859-4" "ISO-8859-5"
[137] "ISO-8859-9" "ISO-CELTIC" "ISO-IR-100" "ISO-IR-101"
[141] "ISO-IR-109" "ISO-IR-110" "ISO-IR-144" "ISO-IR-148"
[145] "ISO-IR-157" "ISO-IR-179" "ISO-IR-199" "ISO-IR-203"
[149] "ISO-IR-226" "ISO8859-1" "ISO8859-10" "ISO8859-13"
[153] "ISO8859-14" "ISO8859-15" "ISO8859-16" "ISO8859-2"
[157] "ISO8859-3" "ISO8859-4" "ISO8859-5" "ISO8859-9"
[161] "JAVA" "KOI8-R" "KOI8-RU" "KOI8-T"
[165] "KOI8-U" "L1" "L10" "L2"
[169] "L3" "L4" "L5" "L6"
[173] "L7" "L8" "LATIN-9" "LATIN1"
[177] "LATIN10" "LATIN2" "LATIN3" "LATIN4"
[181] "LATIN5" "LATIN6" "LATIN7" "LATIN8"
[185] "MAC" "MACCENTRALEUROPE" "MACCROATIAN" "MACCYRILLIC"
[189] "MACGREEK" "MACHEBREW" "MACICELAND" "MACINTOSH"
[193] "MACROMAN" "MACROMANIA" "MACTHAI" "MACTURKISH"
[197] "MACUKRAINE" "MS_KANJI" "MS-ANSI" "MS-ARAB"
[201] "MS-CYRL" "MS-EE" "MS-TURK" "MULELAO-1"
[205] "NEXTSTEP" "PT154" "PTCP154" "R8"
[209] "RISCOS-LATIN1" "ROMAN8" "SHIFT_JIS" "SHIFT_JISX0213"
[213] "SHIFT-JIS" "SJIS" "TCVN" "TCVN-5712"
[217] "TCVN5712-1" "TCVN5712-1:1993" "VISCII" "VISCII1.1-1"
[221] "WINBALTRIM" "WINDOWS-1250" "WINDOWS-1251" "WINDOWS-1252"
[225] "WINDOWS-1254" "WINDOWS-1256" "WINDOWS-1257" "WINDOWS-1258"
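If no single encoding works, one possibility is that the text was converted twice (mojibake saved again under a second encoding), in which case a single decode can never succeed. The sketch below brute-forces both single decodes and decode/re-encode/decode chains against a short byte sequence with a known intended reading. Everything here is an assumption for illustration: the function name `try_chains`, the short candidate list, and the sample bytes, which are only guessed from how the broken site names render and have not been checked against the real file.

```python
import itertools

# A short, hypothetical candidate list; extend it as needed.
CANDIDATES = ["utf-8", "cp1252", "latin-1", "mac_roman", "cp437", "cp850"]


def try_chains(raw: bytes, expected: str, codecs=CANDIDATES):
    """Return the codec chains that turn `raw` into `expected`.

    A chain (a,) is a plain decode with codec a. A chain (a, b, c) decodes
    with a, re-encodes with b and decodes with c, which undoes one round
    of mojibake.
    """
    hits = []
    for a in codecs:
        try:
            text = raw.decode(a)
        except UnicodeDecodeError:
            continue
        if text == expected:
            hits.append((a,))
        for b, c in itertools.product(codecs, repeat=2):
            try:
                if text.encode(b).decode(c) == expected:
                    hits.append((a, b, c))
            except UnicodeError:
                # Covers both encode and decode failures.
                continue
    return hits


# Assumed bytes for the broken apostrophe (not verified against the archive).
raw = b"dell\x89\xdb\xaaOrso"
expected = "dell\u2019Orso"  # intended reading with a typographic apostrophe

for chain in try_chains(raw, expected):
    print(" -> ".join(chain))
```

For these assumed bytes no single codec matches, but the chain `mac_roman -> cp1252 -> utf-8` does, which would fit a UTF-8 file that was misread as Windows-1252 and then saved as Mac Roman. Whether the real file follows this pattern would need to be checked against its actual bytes.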
You're my man, Martin! Impressive dedication! Let's ask the creator of this database then.
Hey, @apalmisano82, sorry for summoning you once again to this repository. We have some trouble with your dataset "Regional Demographic Trends and Settlement Patterns in Central Italy: Archaeological Sites and Radiocarbon Dates". So far we have assumed this data to be UTF-8 encoded, but that does not seem to be right. We're getting a lot of broken symbols, especially in the literature column. Martin has now tried a ton of other possible encodings, but none of them matches.
As always: Thanks for your help!
This leads to "wrong" site names.