Closed kiegel closed 5 years ago
The statements in field 362 have been created using different rules over time and are quite varied. No algorithm is going to split them into bf:firstIssue and bf:lastIssue elements with complete success until artificial intelligence can be used to imitate human analysis. However, I have examined a number of cases and think it is possible to do a better job than the current version.
Field 362 has two values for the first indicator. Value 1 (unformatted note) is easily handled with bf:note, as is currently done by the converter. Value 0 (formatted style) is the problem and everything below applies to this indicator value.
Instead of splitting at the first hyphen, I suggest using an xsl:choose element and testing for a number of cases. This approach can deal with multiple hyphens, which gives better support for internationalization by reducing errors in statements in other languages, such as Arabic and CJK. The first three cases described below handle special situations, the next four handle hyphens in various positions, and the final one is for everything left over. I am not a professional programmer so I cannot supply finished code, but I have included the tests I used in my analysis. I did not examine 880 fields; if they follow the same patterns they may be okay.
No Hyphen
Some 362 fields have first indicator 0 but contain an unformatted note (this probably should not happen, but it does). These go in bf:note, as with first indicator 1.
362 0_ |a Began in 1989 (OCLC #700325835)
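The two simple cases described so far (first indicator 1, and first indicator 0 with no hyphen) can be sketched as follows. This is only an illustration in Python, not the converter's XSLT; the function name `route_362` and the `"needs-split"` marker are hypothetical, and the check assumes an ASCII hyphen.

```python
def route_362(ind1: str, subfield_a: str) -> tuple[str, str]:
    """Hypothetical helper: decide how a 362 $a value should be handled.

    Returns a (target, text) pair. "needs-split" is a placeholder meaning
    the statement still has to go through the hyphen-position tests.
    """
    text = subfield_a.strip()
    if ind1 == "1":
        # Unformatted note: always a note.
        return ("bf:note", text)
    if "-" not in text:
        # Formatted indicator but no hyphen: treat it as a note as well.
        return ("bf:note", text)
    return ("needs-split", text)


if __name__ == "__main__":
    print(route_362("0", "Began in 1989"))
```

In XSLT the same ordering would correspond to the first xsl:when branches of the xsl:choose, so that the hyphen tests only ever see statements that actually contain a hyphen.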
Addressed in commit a99dc139e9669628297af492569de87cc8840754, to be included in v1.4.0.
Field 362 (Dates of Publication and/or Sequential Designation) contains beginning and/or ending designations for serial issues. Splitting field 362 into beginning and ending dates at the first hyphen leads to bad results; this is a well-known problem.
For example:
362 0_ |a al-Sanah 1., al-ʻadad 1. (Kānūn al-Thānī 1953)-al-sanah 60, al-ʻadad kharīf 2012.
becomes:
bf:firstIssue "al" bf:lastIssue "Sanah 1., al-ʻadad 1. (Kānūn al-Thānī 1953)-al-sanah 60, al-ʻadad kharīf 2012"
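The failure above can be reproduced with a few lines of Python. This is a minimal sketch of a split at the first hyphen, written for illustration only; the function name `split_at_first_hyphen` is hypothetical, and the trailing-period handling is an assumption made to match the output shown above.

```python
def split_at_first_hyphen(statement: str) -> tuple[str, str]:
    """Hypothetical illustration of a naive split at the first hyphen."""
    # partition() cuts at the first "-" only; later hyphens stay in the tail.
    first, _, last = statement.partition("-")
    # Assumption: final punctuation is dropped from the last-issue string.
    return first.strip(), last.strip().rstrip(".")


if __name__ == "__main__":
    stmt = ("al-Sanah 1., al-ʻadad 1. (Kānūn al-Thānī 1953)"
            "-al-sanah 60, al-ʻadad kharīf 2012.")
    first_issue, last_issue = split_at_first_hyphen(stmt)
    print("bf:firstIssue", repr(first_issue))
    print("bf:lastIssue", repr(last_issue))
```

Because the romanized Arabic article "al-" contains a hyphen, the first hyphen in the statement falls inside the first word rather than between the two designations.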
There is no obvious solution for an algorithm that would split field 362 correctly. Perhaps for converted records BIBFRAME needs a property representing start/end dates together, in other words, an option for a description that is not as fine-grained as firstIssue/lastIssue.