Open scossu opened 3 weeks ago
On a second look, it seems I never implemented this functionality. I had used the MARC field for numeral parsing, but some tests in https://github.com/lcnetdev/scriptshifter/blob/main/tests/data/script_samples/chinese.csv mention that fields 100 and 700 are to be handled as names, with a comma added. @tventimi can you provide more details on this logic?
See the following code snippet from Parallelogram:
https://github.com/pulibrary/parallelogram/blob/main/cloudapp/src/app/pinyin.service.ts#L135-L149
This code is run on the romanized version of the name. It is assumed that the name consists of two or three separate "words". The first word is capitalized and followed by a comma. The second word is also capitalized. If there is a third word, it is appended to the second one with no capitalization and no space in between. However, if the third word begins with a vowel, then an apostrophe is placed between the second and third words. Thus,
Luan bao quan --> Luan, Baoquan
Wen dao an --> Wen, Dao'an
Xia jing --> Xia, Jing
Sima qian --> Sima, Qian
Note that in the last example, the surname is multisyllabic and corresponds to two Chinese characters (司马). However, the code doesn't need to know this, because these characters have already been romanized and written as a single word by the time they reach this point in the code.
Also note that the code snippet above applies this logic to subfield r of any MARC field, but in such cases, the comma after the first word is omitted.
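For illustration, the rule described above could be sketched in Python roughly as follows. This is not the Parallelogram or ScriptShifter code. The function name `format_personal_name`, the `add_comma` parameter, and the plain `aeiou` vowel set are all hypothetical choices made for this example, and input that is not two or three words is simply returned unchanged.

```python
# Assumption: only unaccented Latin vowels trigger the apostrophe.
VOWELS = "aeiou"


def format_personal_name(romanized: str, add_comma: bool = True) -> str:
    """Format a romanized Chinese personal name (hypothetical helper).

    The first word is treated as the surname, the second as the start of
    the given name, and an optional third word is appended to the second.
    Pass add_comma=False for the subfield r case, where the comma after
    the first word is omitted.
    """
    words = romanized.split()
    if not 2 <= len(words) <= 3:
        # Assumption: leave anything else untouched.
        return romanized

    surname = words[0].capitalize()
    given = words[1].capitalize()

    if len(words) == 3:
        third = words[2].lower()
        # An apostrophe separates the syllables when the third word
        # begins with a vowel, e.g. "dao" + "an" gives "Dao'an".
        separator = "'" if third[0] in VOWELS else ""
        given += separator + third

    joiner = ", " if add_comma else " "
    return surname + joiner + given


if __name__ == "__main__":
    for sample in ("Luan bao quan", "Wen dao an", "Xia jing", "Sima qian"):
        print(sample, "-->", format_personal_name(sample))
```

With `add_comma=False`, the same input `Wen dao an` would come out as `Wen Dao'an`, which matches the subfield r behaviour described above.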
In the code snippet you linked to, what do the `tag`, `ind1`, and `code` variables represent?
Fixed in #139.
@tventimi can you please test? I only had "Sima, Qian" in Chinese script to test with.
I might have to adjust the code that selects the MARC field. At the moment the name formatting only applies to fields 100, 600, 700, and 800.
I tested some more examples and confirmed that the name formatting is correct.
Source string:
欒保羣
Result:
Luan bao qun
Expected result:
Luan, Baoqun
Other examples involving the MARC field option are behaving similarly. This may be a regression from the DB migration.
@tventimi FYI