relaton / relaton-render

Gem to render ISO 690 XML serialisation into HTML
BSD 2-Clause "Simplified" License

Implement Japanese bibliography rendering style #52

Open ronaldtse opened 3 weeks ago

ronaldtse commented 3 weeks ago

In https://github.com/relaton/relaton-plateau/issues/15 the PLATEAU pubid has removed the Japanese characters for edition.

For JIS and PLATEAU, we need to implement the Japanese bibliography rendering style. The style referenced by JIS Z 8301:2019 is JIS X 0807 based on ISO 690.

[Screenshot 2024-08-22 at 5.30.17 PM]

opoudjis commented 3 weeks ago

Right now the JIS biblio style does exist; it's just severely underspecified. relaton-render will need updates to deal with things like ordinals for editions with a number in the middle. But I need real information about what is expected, in English, and not in a flyspeck Kanji screengrab.

opoudjis commented 2 weeks ago

JIS X 0807 is not a stylesheet, but an ontology, and an unadventurous copy-paste of ISO 690-2:1997, without any specialisation for Japanese. It is absurd to say that this specifies a Japanese bibliography style at all: it is the ISO 690 default, with English-only examples (!!!!!!), and as such, it is already supported in relaton-render by default. At most, we may need to revert from ISO house styling to default ISO 690 styling; but if I'm going to do so, I won't on the basis of this spec.

JIS Z 8301:2019 may have usable information for us. JIS X 0807 does not.

This is the information out of it—all of it stuff we've already done.

Electronic books, databases and computer programs

A part of an electronic book, a part of a database, or a part of a computer program

An article of an electronic book, an article of a database or an article of a computer program

Electronic periodicals

Serial articles and other articles

Electronic bulletin boards, e-mail conferences and e-mail systems

Email

opoudjis commented 2 weeks ago

Next task, since this was not useful: look at the styling of JIS Z 8301:2019.

opoudjis commented 2 weeks ago

For starters, the bizarre requirement to underline (dotted underline, no less) citations where the JIS standard has been updated from the canonical ISO adoption standard, where it is other than the canonical ISO adoption standard, or where the canonical ISO adoption standard is ignored in favour of the original ISO standard, will be ignored. That is some sort of semantic annotation of JIS references that will need to be worked out and resolved if someone ever asks for it.

... What... a strange thing to do.


JIS Z 8301:2019 is no more help than JIS X 0807: clause 21.5's examples of citations are also English, in their entirety.

Do Japanese standards never cite anything in Japanese?

What I need is a House Style; I need the Japanese counterpart to the Chicago Manual of Style, I need an overview of relevant Japanese grammar, and I need someone who speaks Japanese to do QA. Which I presume is @reeseplews.

The particular query posed up top, that we no longer have "5th edn." in Plateau identifiers, is not going to be addressed in JIS Z 8301: (1) edition numbers are NOT included in references to ISO or JIS standards under ISO or JIS styling; and (2) JIS Z 8301 is not where I'm going to find out that the Japanese for "5th edn." is 第5版 (or の第 5 版, I guess, in "X, 5th edn."), especially when all its examples are in English anyway.

Never mind that I'm having to guess what is going on via Google Translate. This is not the professional way of going about this.

This is useless. The useful thing to do is:

  1. I implement 第%版 in relaton-render (right now it's English, "{{ var1 | ordinal_num: 'edition', '' }} edition", because I had no idea what the Japanese was).
  2. I introduce editions into Plateau (not JIS) rendering of standards.
  3. I generate a bunch of (preferably Japanese language) references in a range of bibliographic types, with the existing ISO-inherited stylesheet, and Reese can tell me what looks wrong.
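Step 1 could look roughly like this. A minimal Ruby sketch, assuming a hypothetical `edition_label` helper (not relaton-render's actual API or its `ordinal_num` filter): Japanese wraps the raw edition in 第…版, English uses an ordinal. The English fallback for non-integer editions ("Version 1.0") is an assumption based on how Plateau handbooks currently render.

```ruby
# Hypothetical helper, not relaton-render's API: localise an edition label.

# English ordinal suffix: 1st, 2nd, 3rd, 4th ... 11th, 12th, 13th ... 21st ...
def ordinal(n)
  return "#{n}th" if (11..13).cover?(n % 100)

  case n % 10
  when 1 then "#{n}st"
  when 2 then "#{n}nd"
  when 3 then "#{n}rd"
  else "#{n}th"
  end
end

def edition_label(edition, lang)
  # Japanese: the raw edition, integer or not, goes inside 第...版
  return "第#{edition}版" if lang == "ja"

  # English: ordinal for integer editions; assumed "Version" fallback otherwise
  if edition.to_s.match?(/\A\d+\z/)
    "#{ordinal(edition.to_i)} edition"
  else
    "Version #{edition}"
  end
end

puts edition_label(5, "ja")     # 第5版
puts edition_label("1.0", "ja") # 第1.0版
puts edition_label(3, "en")     # 3rd edition
```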
opoudjis commented 2 weeks ago

FWIW, the default rendering of a monograph that I test on in relaton-render,

RAMSEY, J. K. and W. C. MCGREW. Object play in great apes: Studies in nature and captivity. In: PELLEGRINI, Anthony D. and Peter Kenneth SMITH (eds.): The nature of play: Great apes and humans [electronic resource, 8vo]. 3rd edition. New York, NY: Guilford Press. 2005. pp. 89–112. https://eprints.soton.ac.uk/338791/. [viewed: September 3, 2019].

is now:

RAMSEY, J. K. と W. C. MCGREW. Object play in great apes: Studies in nature and captivity. PELLEGRINI, Anthony D. と Peter Kenneth SMITH (編): The nature of play: Great apes and humans [electronic resource, 8vo]. 第3版. New York, NY: Guilford Press. 2005. ページ89–112. https://eprints.soton.ac.uk/338791/. [見た: 2019年9月3日].

@ReesePlews how wrong is that?

opoudjis commented 2 weeks ago

The JIS representation of standards right now is:

standard: "{% if home_standard %}<span_class='stddocTitle'>{{ title }}</span> ,_{{ extent }}{% else %}{{ creatornames }}. <span_class='stddocTitle'>{{ title }}</span> ,_{{ extent }} .  {{ labels['version'] | capitalize }}_{{ edition_raw }}. {{labels['updated'] | capitalize }}_{{date_updated}}. {{status | capitalize}}. {{ authorizer }}. {{ labels['availablefrom'] }}:_<span_class='biburl'>{{ uri }}</span>.{% endif %}"

To spell out:

JIS, ISO: TITLE, PART (optional)

Other Standards: AUTHORS. TITLE, PART (optional). Version EDITION. Updated UPDATE-DATE. STATUS. SPONSOR. Available from: URI.

The Version, Updated, and Available from labels can be given in Japanese.
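A rough Ruby sketch of the two branches spelled out above, under stated assumptions: `render_standard`, the field names, and the `LABELS` table are hypothetical, not relaton-render internals; only 入手可能 is attested as a Japanese label in this thread, so the other labels stay English; treating "RFC Series" as the sponsor field is a guess. Missing elements are dropped, and the output keeps the pattern's final full stop.

```ruby
# Hypothetical sketch, not relaton-render's internals.
LABELS = {
  "en" => { version: "Version", updated: "Updated", availablefrom: "Available from" },
  # Only 入手可能 is attested for Japanese; the other labels are left in English
  "ja" => { version: "Version", updated: "Updated", availablefrom: "入手可能" }
}.freeze

def render_standard(ref, lang: "en", home_standard: false)
  l = LABELS.fetch(lang)
  title_part = [ref[:title], ref[:part]].compact.join(", ")
  # JIS, ISO: TITLE, PART (optional)
  return title_part if home_standard

  # Other standards: AUTHORS. TITLE, PART. Version EDITION. Updated DATE.
  # STATUS. SPONSOR. Available from: URI.  (missing elements are dropped)
  [
    ref[:authors],
    title_part,
    ref[:edition] && "#{l[:version]} #{ref[:edition]}",
    ref[:updated] && "#{l[:updated]} #{ref[:updated]}",
    ref[:status],
    ref[:sponsor],
    ref[:uri] && "#{l[:availablefrom]}: #{ref[:uri]}"
  ].compact.join(". ") + "."
end

rfc = {
  authors: "S. Bradner",
  title: "Intellectual Property Rights in IETF Technology",
  sponsor: "RFC Series",
  uri: "https://www.rfc-editor.org/info/rfc3979"
}
puts render_standard(rfc, lang: "ja")
```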

So, with Japanese i18n, IETF RFC 3979 is displayed as:

S. Bradner. Intellectual Property Rights in IETF Technology. RFC Series. 入手可能: https://www.rfc-editor.org/info/rfc3979

What I'm getting here is a suggestion that Plateau documents, in Plateau, have optional edition numbers. So presumably,

JIS, ISO, Plateau: TITLE EDITION PART (optional)

Where EDITION will be the newly introduced 第5版 in Japanese, and "5th edn." in English.

So, "PLATEAU Handbook #02 1.0" right now (as a document other than JIS and ISO) renders as:

PLATEAU Handbook #02 1.0, 3D都市モデル標準作業手順書. Version 1.0. 国土交通省. 入手可能: https://www.mlit.go.jp/plateau/file/libraries/doc/plateau_doc_0002_ver01.pdf.

I'm assuming what you want, @ronaldtse , in your cryptically phrased ticket, is

PLATEAU Handbook #02 1.0, 3D都市モデル標準作業手順書. 第1.0版. 国土交通省. 入手可能: https://www.mlit.go.jp/plateau/file/libraries/doc/plateau_doc_0002_ver01.pdf.

Is that what you want? Do you want just the title as with ISO/JIS, so

PLATEAU Handbook #02 1.0, 3D都市モデル標準作業手順書. 第1.0版.

... or what?

opoudjis commented 2 weeks ago

Italics are still happening in titles, and they shouldn't be.

opoudjis commented 2 weeks ago

Needed to tell Plateau explicitly to use JIS relaton-render class.

ronaldtse commented 2 weeks ago

@reeseplews will also supplement with some guidelines for Japanese bibliography.

ronaldtse commented 1 week ago

@opoudjis indeed, JIS X 0807 is an adoption of the old ISO 690-2, and doesn't seem to provide Japanese bibliography rules.

I extracted two relevant examples from the JIS Z 8301 bibliography. I hope this is sufficient for our implementation.

Example 1

現代仮名遣い 昭和 61.7.1 内閣告示第 1 号
 入手先 [オンライン 2018.07.12 閲覧]:
 http://www.mext.go.jp/b_menu/hakusho/nc/k19860701001/k19860701001.html

This means:

{title: 現代仮名遣い} {date: 昭和 61.7.1} {subtitle: 内閣告示第 1 号}
 {source: 入手先} [{online: オンライン} {2018-07-12: 2018.07.12} {read: 閲覧}]:
 http://www.mext.go.jp/b_menu/hakusho/nc/k19860701001/k19860701001.html

The symbol is entered by alt + square bracket on the Japanese keyboard.

Example 2

文部科学省用字用語例 平成 23.3
 入手先:新訂 公用文の書き表し方の基準(資料集),文化庁編集,第一法規,2011,pp. 313-346.
{article title: 文部科学省用字用語例} {date: 平成 23.3}
 {source: 入手先}: {book title: 新訂 公用文の書き表し方の基準(資料集)},{collection title: 文化庁編集},{publisher: 第一法規},{date of publication: 2011},{page numbers: pp. 313-346}.
ReesePlews commented 1 week ago

hello @opoudjis thanks for examining this issue. i know bibliographic research takes place here, but i don't know what documents exist that describe Japanese bibliographic practice. i have not seen a Japanese counterpart to a "Manual of Style" or a similar document. perhaps the National Diet Library could provide some references. we could also ask JSA about the status of any updates to JIS X0807 harmonizing it with ISO 690:2021.

for our work here, i did some searching and found that the Japan Science and Technology Agency (JST) had a project called SIST, running from the mid-2000s through the mid-2010s, that was tasked to develop "The Standards for Information for Science and Technology (SIST)", standards designed to facilitate the distribution of scientific and technical information. the project has concluded and the website is archived on the government archive server (WARP). from the WARP server we can access all of the "standard" documents and some other materials.

SIST 02, along with a PDF for download, describes the preparation of bibliographic/citation references. i also happened to find this github site, but its links point to the original site (not the WARP site); the pages can still be found by searching the SIST 02 html manual pages. the github site also has some english information.

from the SIST 02 document we can pick up some styling guidelines.

regarding the earlier examples you prepared, here are some comments (based on SIST 02):

[見た: 2019年9月3日] in your example should be changed to [参照: 2019年9月3日]. (even though machine translation may render 参照 as "see also" or "reference", it is the correct notation for "accessed" or "viewed".) 見た comes up as a correct translation of "viewed", but it is not correct for written documents.

some of your examples use 入手可能, which means "available". that japanese could be changed to something from SIST 02, such as 入手日付 (date of access).

"ページ" is not used for this type of reference. "p." or "pp." are well understood.

use of "と" for "and" in a list of authors: the meaning is correct, but i don't think i have seen it used in a list of authors, english or japanese. perhaps the "と" can be left out, but if the English model requires "and" in a list of authors, we could keep it and ask the Plateau Team members.

use of "編" for "Editor, Ed." is acceptable. it is often used in Japanese references.

use of " 第5版" for "5th edition" or something similar is a correct translation and it would be used in Japanese references. i think it could be kept here, unless it is difficult to manage.

i see @ronaldtse has provided some new information while i have been preparing this answer. after you have it ready, i will ask the Plateau Team for feedback. thank you.

ReesePlews commented 1 week ago

@ronaldtse and i were checking the 文化庁 site a while back and i thought they had some guidelines on the creation of "written documents". i could not find that link today. Ron, do you remember that?

ronaldtse commented 1 week ago

The guidelines for official documents from the Bunka-cho do not specify anything about bibliographies, and themselves use QR codes and links for references 😓

it is here: https://www.bunka.go.jp/seisaku/bunkashingikai/kokugo/hokoku/pdf/93651301_01.pdf

ronaldtse commented 1 week ago

@ReesePlews I checked SIST 02. It does contain many bibliographic reference examples, but they are mostly using ISO 690 style bibliographic presentation with English half-width punctuation. The guidelines actually cite the JIS adoption of ISO 690 and ISO 690 itself.

@opoudjis and I did include some Japanese bibliographic practices in the latest ISO 690 as examples, but they are insufficient. So there is a future opportunity to do so...

opoudjis commented 1 week ago

Punctuation should be i18n'd in the output, but it looks like this is only being done for Chinese, not Japanese. Punctuation i18n involves full-width equivalences for Roman punctuation.

This is what we do for Chinese. Can I confirm that the same needs to be done for Japanese, or should I do different substitutions? Left-hand Roman, right-hand CJK.

: ：
, ，
. ．
) ）
] ］
: ：
; ；
? ？
! ！
– ～ # en-dash
( （
[ ［
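The global substitution described here (applied regardless of neighbouring characters) amounts to a character map. A minimal Ruby sketch with a hypothetical method name, using full-width forms for the right-hand column:

```ruby
# Roman -> full-width map from the table above (duplicate colon row collapsed).
FULLWIDTH = {
  ":" => "：", "," => "，", "." => "．", ")" => "）", "]" => "］",
  ";" => "；", "?" => "？", "!" => "！", "–" => "～", "(" => "（", "[" => "［"
}.freeze

# Hypothetical name. Global substitution: every mark is replaced whatever
# its neighbours, so Roman-only text gets full-width punctuation too.
def localise_punctuation_globally(text)
  # String#gsub with a Hash replaces each match by the hash value for it
  text.gsub(Regexp.union(FULLWIDTH.keys), FULLWIDTH)
end

puts localise_punctuation_globally("Johnson, A. (1976)") # Johnson， A． （1976）
```

Note that a purely Roman reference picks up full-width marks as well, which is the behaviour questioned later in this thread.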
opoudjis commented 1 week ago

Removing と for "and"; using "," instead (so as to keep the author_and templates common with JA)

Using "p, pp" instead of ページ

The date accessed is given as 見た currently. I am changing it to 参照
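The three changes above, sketched in Ruby. The helper names are hypothetical, and the English "A, B and C" form for three or more authors is an assumption, since the thread only shows two-author examples.

```ruby
# Hypothetical sketch of the three label changes (names are illustrative).
AUTHOR_AND = { "en" => " and ", "ja" => ", " }.freeze  # と dropped for Japanese
ACCESSED   = { "en" => "viewed", "ja" => "参照" }.freeze # 見た replaced by 参照

def join_authors(names, lang)
  return names.first.to_s if names.size < 2

  # all but the last joined by commas, then the language's "and" separator
  names[0..-2].join(", ") + AUTHOR_AND.fetch(lang) + names.last
end

# "p." / "pp." in both languages instead of ページ
def pages(from, to = nil)
  to ? "pp. #{from}–#{to}" : "p. #{from}"
end

def accessed(date, lang)
  "[#{ACCESSED.fetch(lang)}: #{date}]"
end

puts join_authors(["RAMSEY, J. K.", "W. C. MCGREW"], "ja") # RAMSEY, J. K., W. C. MCGREW
puts pages(89, 112)                                        # pp. 89–112
puts accessed("2019年9月3日", "ja")                          # [参照: 2019年9月3日]
```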

opoudjis commented 1 week ago


> Example 1
>
> 現代仮名遣い 昭和 61.7.1 内閣告示第 1 号
>  入手先 [オンライン 2018.07.12 閲覧]:
>  http://www.mext.go.jp/b_menu/hakusho/nc/k19860701001/k19860701001.html
>
> This means:
>
> {title: 現代仮名遣い} {date: 昭和 61.7.1} {subtitle: 内閣告示第 1 号}
>  {source: 入手先} [{online: オンライン} {2018-07-12: 2018.07.12} {read: 閲覧}]:
>  http://www.mext.go.jp/b_menu/hakusho/nc/k19860701001/k19860701001.html

using Google Translate:

Modern Kana Spelling, July 1, 1986, Cabinet Notification No. 1. Available from [online, accessed July 12, 2018]:

So: { title } { date } [and I'm not converting to regnal years, that should be done in the source] { series } Available from [online {date-accessed} read : url ]

.... And this gives me the answer to the question I had, what is "available from": 入手先 ... 閲覧

This means I have to change

_{{ labels['availablefrom'] }}:_<span_class='biburl'>{{ uri }}</span>. [{{ labels['viewed'] }}:_{{date_accessed}}].

to

{% if uri %}{{ labels['availablefrom'] }} [{{ labels['online'] }} {% if date_accessed %}{{ viewed }}{% endif %} {{ uri }}]{% endif %}

Japanese 

availablefrom = 入手先
online = オンライン
viewed = {{ var1 }} 閲覧 # populated in code with date_accessed

English

availablefrom = Available from
online = online
viewed = viewed {{ var1 }} 
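A Ruby sketch of how these label tables could fill the available-from segment. `available_from` and `ACCESS_LABELS` are hypothetical; note that this follows the layout of Example 1 (URL after the closing bracket and a colon), not the Liquid draft above, which puts the URL inside the brackets.

```ruby
# Label tables as listed above; %s stands in for {{ var1 }} (the access date).
ACCESS_LABELS = {
  "ja" => { availablefrom: "入手先", online: "オンライン", viewed: "%s 閲覧" },
  "en" => { availablefrom: "Available from", online: "online", viewed: "viewed %s" }
}.freeze

# Hypothetical helper following Example 1:
#   入手先 [オンライン <date> 閲覧]: <url>
def available_from(uri, lang, date_accessed: nil)
  return "" unless uri # the whole segment is conditional on a URI

  l = ACCESS_LABELS.fetch(lang)
  # the access date is optional; drop it when absent
  inner = [l[:online], date_accessed && format(l[:viewed], date_accessed)].compact
  "#{l[:availablefrom]} [#{inner.join(' ')}]: #{uri}"
end

url = "http://www.mext.go.jp/b_menu/hakusho/nc/k19860701001/k19860701001.html"
puts available_from(url, "ja", date_accessed: "2018.07.12")
puts available_from(url, "en", date_accessed: "2018.07.12")
```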

Available from [online {date-accessed} read : url ]

> The symbol is entered by alt + square bracket on the Japanese keyboard.

Query on punctuation localisation posted above.

> Example 2
>
> 文部科学省用字用語例 平成 23.3
>  入手先:新訂 公用文の書き表し方の基準(資料集),文化庁編集,第一法規,2011,pp. 313-346.
> {article title: 文部科学省用字用語例} {date: 平成 23.3}
>  {source: 入手先}: {book title: 新訂 公用文の書き表し方の基準(資料集)},{collection title: 文化庁編集},{publisher: 第一法規},{date of publication: 2011},{page numbers: pp. 313-346}.

Ministry of Education, Culture, Sports, Science and Technology, Examples of Characters and Terminology, March 2011. Available from: Revised Standards for Writing Official Documents (Collection of Materials), edited by the Agency for Cultural Affairs, Dai-Ichi Hoki, 2011, pp. 313-346.

So 入手先 is "in" presumably for a book chapter.

From these examples: there is no full stop between title, date, and series; series are not put in brackets; and there is a comma between publisher and page numbers.

These examples are confusing. Why is the date repeated in Example 2? If 文化庁編集 is a series, why is it translated as "edited by the Agency for Cultural Affairs"? Is it an editor statement instead?
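A Ruby sketch of the layout inferred here, checked only against Example 2. Field names are hypothetical; `:collection` follows Ronald's labelling of 文化庁編集, which may really be an editor statement.

```ruby
# Hypothetical sketch of the layout inferred from Example 2.
def render_part_in_host(item)
  # title, date and series: spaces only, no full stops, no brackets
  head = [item[:title], item[:date], item[:series]].compact.join(" ")
  # host item after 入手先: comma-separated, ending in a full stop
  host = [item[:host_title], item[:collection], item[:publisher],
          item[:year], item[:pages]].compact.join(",")
  "#{head}\n 入手先:#{host}."
end

example2 = render_part_in_host(
  title: "文部科学省用字用語例", date: "平成 23.3",
  host_title: "新訂 公用文の書き表し方の基準(資料集)", collection: "文化庁編集",
  publisher: "第一法規", year: "2011", pages: "pp. 313-346"
)
puts example2
```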

I am going to get something done as best as I can make sense of this information, and if this doesn't produce acceptable output, I am going to require precise explanation as to why. These are programmatically generated references, and they need programmatic structure.

ReesePlews commented 1 week ago

hello @opoudjis thank you for the additional investigation on this. i will send these samples to the client and see if i can get some feedback today. the National Diet Library has these pages on "bibliographic data" (machine translation and browsing work well in chrome), along with other information on digital library projects and some translations of various metadata documents. there is a "standardization" contact address at the bottom of the page. perhaps @ronaldtse, as one of the editors of ISO 690, can reach out to them and see if they know of open information or an actual Japanese style reference for bibliographic entries.

opoudjis commented 1 week ago

This is going to miss the release, which is today, but I don't think this is an hour's work anyway.

What I would particularly like is examples not of report citations, which are odd in how they are presented because of how ad hoc their publication and metadata is; but of more established formats, that I can extrapolate from: books, book chapters, journal articles.

I am concerned that there is idiosyncrasy in the Japanese bibliographies: we'll add a day-month-date here but not there, a regnal year here but a Gregorian date there, we will (maybe?) conflate publishers and editors. Reports are rife with that kind of uncertainty, but I will not be implementing uncertainty. These references are generated by template: they will not be hand crafted. So I really do need prototypical, not atypical references, to work out what the structure is. (And journal articles are more prototypical than reports.)

I will sift through what you've sent after I do the release, but... I need consolidated and straightforward advice on bibliographic formatting, preferably with exemplars, preferably not involving reports, and preferably from the client. I'm sure there is a house style at the client, particularly around the use of punctuation in references.

I do also need confirmation on full-width punctuation for Japanese @ronaldtse @ReesePlews , see https://github.com/relaton/relaton-render/issues/52#issuecomment-2323294690

ReesePlews commented 1 week ago

i think it is a difficult call to make with the double-byte punctuation. we would never want to use a double-byte "comma" between non-japanese authors, even in a japanese bib entry. we may find a kanji comma "、", but i think that would be unbalanced too. the kanji comma "、" is used more in the actual body text of a document. since bib entries are not considered "body text", i think they would use single-byte punctuation; but that is just my comment.

what about the SIST 02 document? i think we may have better luck finding style guides for scientific publications than for government documents. from my limited experience with government report preparation, the main emphasis is on the type of sections included, rather than the style. in my experience, the end users are strict about having specific "common named" sections so the documents are easy to navigate, rather than worrying so much about styles... they will advise on fonts, the style of non-japanese words (single instead of double byte), etc.


opoudjis commented 1 week ago

The thing to do with bits of Roman text interspersed with CJK text is to apply CJK localisation of punctuation only if the punctuation has CJK text on either side. That will address the concern about commas misapplied between Western authors; but you are saying that even in Japanese citations, there is a reluctance to use full-width punctuation. That surprises me, but it points to a two-tier punctuation localisation: no localisation of punctuation proper, only of brackets, and only when they contain CJK text.
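A minimal Ruby sketch of the two-tier option: punctuation proper is left alone, and only bracket pairs whose contents include CJK text are swapped for full-width ［ ］ and （ ）. The CJK test (Han, Hiragana, Katakana script properties) and the method names are assumptions; nested brackets are not handled.

```ruby
# Hypothetical sketch: localise brackets only, and only around CJK text.
CJK = /[\p{Han}\p{Hiragana}\p{Katakana}]/

# Swap one kind of bracket pair for its full-width form if the contents are CJK.
def swap_pair(text, pattern, open_fw, close_fw)
  text.gsub(pattern) do
    whole = Regexp.last_match(0)
    inner = Regexp.last_match(1)
    inner.match?(CJK) ? "#{open_fw}#{inner}#{close_fw}" : whole
  end
end

def localise_brackets(text)
  # innermost pairs only: no brackets of the same kind inside
  text = swap_pair(text, /\(([^()]*)\)/, "（", "）")
  swap_pair(text, /\[([^\[\]]*)\]/, "［", "］")
end

puts localise_brackets("Guilford Press [electronic resource, 8vo]") # unchanged
puts localise_brackets("入手先 [オンライン 2018.07.12 閲覧]")          # 入手先 ［オンライン 2018.07.12 閲覧］
```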

ronaldtse commented 1 week ago

@opoudjis I think in a CJK context, only localized punctuation should be used. There is no logic to using non-localized punctuation. I will make sure this happens in the next 690. The current specs use half-width and full-width punctuation inconsistently because 690 itself does not offer logical advice.

ReesePlews commented 1 week ago

i was concerned you might have been thinking that the double-byte characters (， and ．) were to be substituted for the single-byte ones (, and .) when an english entry would be shown in japanese. we would not want that type of punctuation substitution, in my understanding.

i have inquired with the client, awaiting a reply.

opoudjis commented 1 week ago

So, a CJK context does need to be defined as a preceding character (for most characters) or a following character (for opening brackets) being CJK. If the character is Roman, leave it alone.

That has not been implemented yet; the implementation currently localises punctuation globally for a language, no matter what characters precede it. That, @ronaldtse, is a refinement of the notion of CJK context, so it still complies with what you want. And that, @ReesePlews, fixes the issue you're worried about.
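A Ruby sketch of the CJK-context rule described above, under assumptions: the method name is hypothetical; "CJK" here means Han, Hiragana, Katakana, CJK punctuation or full-width forms; the en-dash is left out; and the left context is the already-localised output, so a comma right after a localised closing bracket is localised too.

```ruby
# Hypothetical sketch: localise a mark only in a CJK context.
CJK = /[\p{Han}\p{Hiragana}\p{Katakana}\u3000-\u303F\uFF00-\uFFEF]/
CLOSING = {
  ":" => "：", "," => "，", "." => "．", ";" => "；",
  "?" => "？", "!" => "！", ")" => "）", "]" => "］"
}.freeze
OPENING = { "(" => "（", "[" => "［" }.freeze

def localise_in_cjk_context(text)
  chars = text.chars
  out = []
  chars.each_with_index do |c, i|
    mapped =
      if CLOSING.key?(c) && out.last&.match?(CJK)
        CLOSING[c] # most marks: look at the preceding (output) character
      elsif OPENING.key?(c) && chars[i + 1]&.match?(CJK)
        OPENING[c] # opening brackets: look at the following character
      else
        c # Roman context: leave the mark alone
      end
    out << mapped
  end
  out.join
end

puts localise_in_cjk_context("London: Blackwells.") # unchanged
puts localise_in_cjk_context("基準(資料集),文化庁") # 基準（資料集），文化庁
```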

I'm awaiting feedback on this one. Brackets are clearly going to be localised, I'm waiting to hear what the client expects to happen with punctuation.

ronaldtse commented 1 week ago

@opoudjis I'm not convinced we need to have a special "CJK context" because it means there will be a "lang context" (C/J/K punctuation conventions, even for brackets, are different).

I do not quite see a need for this, and need to be convinced.

I think a Japanese bibliography entry is just that, a Japanese language bibliography entry that uses Japanese conventions. If there are English etc inside it, we will still apply the Japanese style rules.

opoudjis commented 1 week ago

In other words, you think

Johnson、 A。、 Peters、 B。 1976。 The origins of sound 【series】。 London〯Blackwells。

is absolutely correct rendering in a Japanese or Chinese bibliography.

We really don't inhabit the same universe any more, as I've repeatedly been finding recently, but FWIW, that's what is currently going to happen if I tell relaton-render to follow CJK punctuation for Japanese.

I await the response from MLIT with interest.

ReesePlews commented 1 week ago

> Johnson、 A。、 Peters、 B。 1976。 The origins of sound 【series】。 London〯Blackwells。

ouch! that will not work. i have never seen any entry like that; and just to clarify (because i had problems with rendering in an earlier reply) this is what i am seeing (as an image) of this line above.

[image]

lets wait and see what the client comes back with.

ReesePlews commented 1 week ago

additionally, in my understanding, these 【 】 are not a replacement for the simple double-byte ［ ］. i don't see 【 】 used that often in technical writing. have you found a source that says they are commonly used?

ReesePlews commented 1 week ago

hello @opoudjis i have discussed the bibliographic style rules with the client this morning. do you have a simple text list of the english terms you want translated into japanese? if you can show me the file or list, i will work on it with the client. from memory, they could not point me to a reference where these rules are defined, so i think they will check some sources they are familiar with and come up with rules we can use. diving deeper may require consulting local ISO TC 46 experts or the NDL. thank you.

opoudjis commented 1 week ago

I'll get you the list, but I'm more interested in the formatting: we've addressed most of the translations already.

What I'm going to do is generate some sample references in Japanese, illustrating how the templates work (multiple authors, multiple bibliographic styles), and I'll request proofreading comments on them. Putting something in front of them is how we're going to get comments. I do see some reasonable-looking citations in Japanese in http://fbennett.github.io/sist02/ , thank you for that.

And I'm going to put fullwidth punctuation in Roman text in front of them, because that's something I want cauterised quickly—this:

> I think a Japanese bibliography entry is just that, a Japanese language bibliography entry that uses Japanese conventions. If there are English etc inside it, we will still apply the Japanese style rules.

leads to this:

[image]

And that's already happening now in Chinese.

I'm busy with another task, but I'll try to get something together tonight.

ronaldtse commented 1 week ago

Please hold off this task for the moment, I found some really good bibliographies in Japanese that will shed light on this topic, and also possibly 690.

ReesePlews commented 1 week ago

thanks @opoudjis and @ronaldtse. sounds like a good plan. i received a document from the client but i think it is a bit too simplified for our requirements. i look forward to the discussion at your convenience.

i will say though, that image (re-pasted here) looks really strange to me, but we can discuss later.

[image]

thanks!

opoudjis commented 1 week ago

On hold until Ronald produces his samples, but the SIST02 examples are already adequate to the requirement.

ReesePlews commented 1 week ago

hi @opoudjis thanks for checking the SIST02 document.