pulibrary / oclcpinyin


Place names: continents, states, and others #2

Status: Closed (pan-zhuo closed this issue 6 days ago)

pan-zhuo commented 1 month ago

According to section 2I of the LC romanization table, generic continent terms should be separated and capitalized.

In the place dictionary https://github.com/pulibrary/oclcpinyin/blob/main/src/places.txt

545 非洲 Feizhou --> Fei Zhou
719 南极洲 Nanjizhou --> Nanji Zhou
720 南極洲 Nanjizhou --> Nanji Zhou
738 欧洲 Ouzhou --> Ou Zhou
739 歐洲 Ouzhou --> Ou Zhou
808 亚洲 Yazhou --> Ya Zhou
809 亞洲 Yazhou --> Ya Zhou

Consider adding the following continents:
美洲 Mei Zhou
拉丁美洲 Lading Mei Zhou
大洋洲 Dayang Zhou

Generic terms for US states (section 2F of the LC romanization table):
410 愛荷華州 Aihehua zhou --> Aihehua Zhou
411 爱荷华州 Aihehua zhou --> Aihehua Zhou
417 阿肯色州 Akensezhou --> Akense Zhou
607 加州 Jiazhou --> Jia Zhou
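The corrections requested above all follow one pattern: the generic term 洲 or 州 is split off and written as a capitalized "Zhou". A minimal Python sketch of that pattern is below. The function name `separate_generic_term` is hypothetical, and nothing here reflects how the macro or places.txt is actually processed. City names such as 广州 also end in 州 and are not part of this request, so the helper is meant for a hand-picked list of entries, not for the whole dictionary.

```python
import re

# Generic terms covered by this request. Both characters romanize as "Zhou".
GENERIC_SUFFIXES = {"洲": "Zhou", "州": "Zhou"}


def separate_generic_term(hanzi: str, pinyin: str) -> str:
    """Split a trailing generic term off the romanization and capitalize it.

    Returns the pinyin unchanged when the entry does not end in one of the
    generic terms above, or when the romanization does not end in "zhou".
    """
    for suffix, roman in GENERIC_SUFFIXES.items():
        if hanzi.endswith(suffix):
            # Accept "Feizhou", "Aihehua zhou", and "Aihehua Zhou" alike.
            match = re.fullmatch(
                rf"(.+?)\s*{roman}", pinyin.strip(), flags=re.IGNORECASE
            )
            if match:
                return f"{match.group(1)} {roman}"
    return pinyin


# Entries taken from the lists above (hand-picked, not the whole dictionary).
for hanzi, pinyin in [("非洲", "Feizhou"), ("愛荷華州", "Aihehua zhou"), ("加州", "Jiazhou")]:
    print(hanzi, pinyin, "-->", separate_generic_term(hanzi, pinyin))
```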

Delete "城建" and correct pinyin:
902 鄂尔多斯市城建 E'ersuosi Shi --> E'erduosi Shi
903 鄂爾多斯市城建 E'ersuosi Shi --> E'erduosi Shi

Consider adding:
鄂尔多斯 E'erduosi
鄂爾多斯 E'erduosi

As an aside, what source data is used for the place names? Would it be possible to add official administrative division tables such as 2023年中华人民共和国县以上行政区划代码 or 2023年统计用区划代码和城乡划分代码?

tventimi commented 1 month ago


Thanks so much for this helpful feedback! I am consulting with our cataloging staff about these suggestions but will try to incorporate as much as possible into the place names dictionary. I will get back to you soon.

Tom Ventimiglia
Princeton University Library

tventimi commented 4 weeks ago

Regarding the administrative tables you cited, I do not see any romanization in these charts. Was your suggestion just to use these lists as a source of terms to be included as place names that would require special formatting? (Even though the specifics of each term would need to be determined?)

pan-zhuo commented 4 weeks ago

Yes. I haven't found a free Chinese place name dictionary with pinyin, but if such a resource exists, it would be preferable. The perfect dictionary is probably one that contains all historical and foreign names but that's just my dream.

As for the administrative tables, I am considering using an automated process to add romanization for all the names in them, followed by a human review. I saw pretty promising results from ChatGPT (see below). Given that the tables are already huge, I can see some challenges:
a) The time-consuming nature of the entire process: name extraction, conversion, and review. It could become a project in itself.
b) The impact on the macro's performance: will this noticeably slow down the macro's execution speed?

ChatGPT 4o results:

Certainly! Here are the Chinese administrative divisions romanized using the ALA-LC Chinese romanization table in the requested format:

普洱市 Pu'er Shi

思茅区 Simao Qu
宁洱哈尼族彝族自治县 Ning'er Hanizu Yizu Zizhixian
墨江哈尼族自治县 Mojiang Hanizu Zizhixian
景东彝族自治县 Jingdong Yizu Zizhixian
景谷傣族彝族自治县 Jinggu Daizu Yizu Zizhixian
镇沅彝族哈尼族拉祜族自治县 Zhenyuan Yizu Hanizu Lahuzu Zizhixian
江城哈尼族彝族自治县 Jiangcheng Hanizu Yizu Zizhixian
孟连傣族拉祜族佤族自治县 Menglian Daizu Lahuzu Wazu Zizhixian
澜沧拉祜族自治县 Lancang Lahuzu Zizhixian
西盟佤族自治县 Ximeng Wazu Zizhixian
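For the human-review step, a simple pre-check could flag machine-generated romanizations that look suspicious, so reviewers can focus on those first. The Python sketch below is only an illustration. The function `review_flags` and the table `EXPECTED_FINAL_TERM` are hypothetical, the suffix mapping is copied from the sample output above rather than checked against the LC table, and the rule that every token is capitalized is an assumption based on the same sample.

```python
# Expected final generic term for a few suffixes, taken from the sample
# output above (an assumption, not verified against the LC table).
# Longer suffixes come first so the most specific one is checked first.
EXPECTED_FINAL_TERM = {
    "自治县": "Zizhixian",
    "市": "Shi",
    "区": "Qu",
}


def review_flags(hanzi: str, pinyin: str) -> list[str]:
    """Return reasons why a candidate romanization needs a closer look."""
    tokens = pinyin.split()
    if not tokens:
        return ["empty romanization"]

    flags = []
    # The last token should be the generic term matching the name's suffix.
    for suffix, term in EXPECTED_FINAL_TERM.items():
        if hanzi.endswith(suffix):
            if tokens[-1] != term:
                flags.append(f"expected final term {term!r}, found {tokens[-1]!r}")
            break

    # Assumed rule: every token in a place name starts with a capital letter.
    for token in tokens:
        if not token[0].isupper():
            flags.append(f"token {token!r} is not capitalized")
    return flags


candidates = [
    ("普洱市", "Pu'er Shi"),
    ("思茅区", "Simao Qu"),
    ("西盟佤族自治县", "Ximeng Wazu Zizhixian"),
]
for hanzi, pinyin in candidates:
    print(hanzi, pinyin, review_flags(hanzi, pinyin) or "ok")
```

An empty list means the pre-check found nothing to flag. It does not mean the romanization is correct, so a reviewer would still need to confirm each entry.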

Thank you for the reply! Zhuo

tventimi commented 6 days ago

Hello Zhuo,

A new version of the OCLC pinyin macro (2.1.3) was released today, and it includes the specific geographic names you listed in your original message. Thank you for your feedback and your contributions to the quality of this tool!

Using AI to convert the administrative lists is an intriguing idea. Definitely worth looking into. Thanks for the suggestion.
