Open Intelligent2013 opened 4 days ago
very interesting to see the vertical layout. thanks for all the work on this @Intelligent2013 ! i dont work with vertical layout much but the third image above looks more correct than the second image. the layout of the kanji numbers in the first image appears correct for the main clause numbers, but with the sub-clause numbering, the vertical style of '三・一' etc seems different to me... i guess, in theory, that is the correct style but seems a bit difficult on the eyes; again i dont have enough experience with vertical layout. i suspect that vertical layout is widely used by such agencies as the justice ministry (法務省) and the writing of japanese laws/regulations. i know there is a large legal website that has japanese laws with english translations, but off hand i dont remember the link. they may have samples of printed works online that could be helpful in these cases.
Thank you @ReesePlews ! Yes you are right that the Japanese "e-Gov" website has all the Japanese laws.
For example, this is the Constitution of Japan:
For vertical layout, they have 3 options: 1 column, 2 columns and 4 columns
This is the law that establishes JIS:
For space savings, this is a screenshot of the 4 column (so it's not too tall to show here).
It uses the list style:
The list style only uses a single full width space indentation to separate list levels.
UPDATE: It seems that when Paragraphs are labeled, in the e-Gov website the paragraph label for the first paragraph is omitted, and subsequent paragraph labels exist. Not sure why the list item "1" is missing though. This doesn't seem to be an East Asian tradition.
The 1st post updated - added 'edition number'.
There's two elements to this.
The first is to support Japanese numerals, and I can do that, sure: that's merely 2.localize(:ja).spellout, using twitter_cldr.
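For readers without twitter_cldr to hand, here is a minimal dependency-free sketch of what spelling out a number in Japanese numerals involves. It is not the twitter_cldr implementation; the names `KANJI_DIGITS`, `KANJI_UNITS` and `to_kanji` are hypothetical, and the supported range of 1 to 9999 is an assumption made for brevity.

```ruby
# Hypothetical helper, not part of twitter_cldr or isodoc.
KANJI_DIGITS = %w[〇 一 二 三 四 五 六 七 八 九].freeze
KANJI_UNITS = [[1000, "千"], [100, "百"], [10, "十"]].freeze

# Spell out an integer in the range 1..9999 using positional kanji numerals.
def to_kanji(number)
  raise ArgumentError, "supported range is 1..9999" unless (1..9999).cover?(number)

  result = +""
  remainder = number
  KANJI_UNITS.each do |value, unit|
    count, remainder = remainder.divmod(value)
    next if count.zero?

    # A leading "one" is conventionally omitted before 十, 百 and 千.
    result << KANJI_DIGITS[count] if count > 1
    result << unit
  end
  result << KANJI_DIGITS[remainder] if remainder.positive?
  result
end

puts to_kanji(22)
```

In practice the library call quoted above is the better choice, since it covers larger numbers and other locales.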
The second is to work out where to use Japanese numerals instead of Arabic numerals. This should not be being done on an ad hoc basis, and it should not be being done independently in HTML and PDF: there needs to be a rule as to where it happens, and it needs to be done in Presentation XML.
I have the bad feeling that this is going to end up as a document attribute.
You mean the specification of list bullet styles per level being configurable? I'd (everyone would) love that.
I don't even know if I can do that in HTML. Not without a lot of pain.
And you need to say a lot more about where Japanese numbers are meant to show up. Numbering is done in code; I can make the xref counter output Japanese instead of Arabic numerals, but that means initialising each counter instance in isodoc, one for every block type and clause (figures, tables, requirements, etc etc etc).
Without a coherent statement, you are not getting anything.
Note: I don't know the reason, but the notes numbers should be Arabic:
@Intelligent2013 I just noticed this since @opoudjis raised it. They are meant to be in Japanese numerals too.
You mean the specification of list bullet styles per level being configurable? I'd (everyone would) love that.
PER LEVEL?! No you are not getting random list level specification PER LEVEL. ISO HTML CSS has 30 lines of custom code just to insert ")" after list numbers. https://github.com/metanorma/isodoc/issues/247 has been unactioned for the past four years because of how horrible Word HTML is about custom list numbering.
No, what you're going to get is:
Ordered lists will rely on the Presentation XML feature of //ol/li/@label to tell the consumer what to put in the list. This will only work out of the box for PDF, and there is code from other flavours that can make it work for DOC; HTML would need CSS overriding to make it work.
I am considering this nothing more than a proof of concept.
I'm going to realise this with the document attribute
:presentation-metadata-japanese-numbering: true
@ronaldtse wants to generalise this to Arabic, Chinese, and Amharic.
I have little inclination to do so, and this does not address the very real problem of what types of block are going to be Arabic and what local.
But:
:presentation-metadata-autonumbering-style: japanese
The nightmare scenario is:
:presentation-metadata-notes-autonumbering-style: arabic
:presentation-metadata-clause-autonumbering-style: japanese
:presentation-metadata-subclause-autonumbering-style: arabic
I will not be implementing that.
To make counters more configurable, I'm going to eventually set up configuration of all counters—starting value and style. But for now, I'm only going to expose that for clauses and lists.
I've got a problem: I want to assign config to counter classes based on config in the xref class (which knows about numbering styles from the Presentation XML metadata), but I don't want to redefine all the classes invoking them.
So to exploit inheritance, I'm going to have to define these counter classes with methods invoked from the xref class.
Not working yet...
Also we need to support Japanese numerals in the publication date. I've updated the initial post.
I am providing Japanese numbering in the Presentation XML, but there is a nightmare scenario where you provide Japanese numbering for page numbers. If you do need them, and if XSL:FO is not clever enough to do that automatically, I'll need to dump the numbers 1–1,000 in the localization strings. Let's not action that yet though... I'd be surprised if XSL:FO doesn't provide that natively somewhere.
@opoudjis Apache FOP has the extension fox:number-conversion-features (https://xmlgraphics.apache.org/fop/2.0/complexscripts.html#source), but it looks like it's not working at all; maybe I am doing something wrong... In any case, let's dump the numbers 1–1,000 in the localization strings when you have time. The page number changes should be applied in the IF (Intermediate Format) after XSL-FO generation.
We need to localise the clause number delimiter, from half-width to full-width full stop, if Japanese numbering is used.
And I'm going to use this as the opportunity to implement a fix to CJK punctuation called for in https://github.com/relaton/relaton-render/issues/52, which I have not implemented to date because of @ronaldtse's indefensible notion that
Johnson、 A。、 Peters、 B。 1976。 The origins of sound 【series】。 London〯Blackwells
is desirable punctuation.
It is not, I reject with utmost vehemence any claim that it is (and so has Reese) and I am pressing ahead with the correct solution.
Regardless of the document's main language, punctuation localisation will convert punctuation from half-width to full-width only if the characters on either side are CJK.
So:
I am also going to bite the bullet and move Japanese number rendering to isodoc for xref counters; they already support Roman at top level.
As of this ticket, we are making punctuation localisation (i.e. fullwidth punctuation) apply to Japanese and Korean as well as Chinese, with the proviso of not doing so when the surrounding characters are not CJK.
So
Code (hello, world.)
in a Chinese or Japanese document:
Before:
Code (hello, world.)
After:
Code (hello, world.)
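The contextual rule can be sketched roughly as below. This is only an illustration, not the relaton-render or isodoc code: it reads "either side" as "both neighbouring characters", the names `CJK_CHAR`, `FULLWIDTH_MARKS` and `localise_punctuation` are hypothetical, and the punctuation mapping is an assumption apart from "." to "。", which follows the 二。二 example given later in this thread.

```ruby
# Hypothetical sketch of context-sensitive punctuation localisation.
# A character counts as CJK here if it is Han, Hiragana, Katakana or Hangul.
CJK_CHAR = '[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]'

# Assumed half-width to full-width mapping (illustrative only).
FULLWIDTH_MARKS = {
  "," => ",", "." => "。", ":" => ":", ";" => ";", "?" => "?", "!" => "!"
}.freeze

# Match a half-width mark only when both neighbours are CJK characters.
CJK_PUNCT = /(?<=#{CJK_CHAR})[,.:;?!](?=#{CJK_CHAR})/

def localise_punctuation(text)
  text.gsub(CJK_PUNCT) { |mark| FULLWIDTH_MARKS.fetch(mark) }
end

puts localise_punctuation("二.二")
puts localise_punctuation("A.2")
```

Under this reading, a Latin string such as the one above is left untouched, because none of its punctuation marks sits between two CJK characters.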
I've implemented so far:
As a result of extending CJK punctuation localisation to Japanese, we are now removing redundant Roman spaces in Japanese strings.
I'm attaching a simple test document so you can see this working, with Japanese and Arabic autonumbering.
@Intelligent2013 Check them out. The dates and ordered lists will happen tomorrow.
Code (hello, world.)
in a Chinese or Japanese document:
Before:
Code (hello, world.)
After:
Code (hello, world.)
this process is correct in my opinion. the "After" result is expected because there are no CJK characters in that string.
it is very common for the "Before" case to appear in Japanese documents (not programming code); these are mostly just input mistakes, depending on the FEP (front-end processor) used for input, or sometimes the user/editor does not catch the differences in characters due to the font used.
i have tried to follow these discussions, but i could be lacking a clear understanding... when the word "Code" is used does it specifically refer to "programming language code"? if so, the result in "After" is most definitely correct.
i am trying to imagine how this would look in a regular CJK document clause [not a programming "code" block] use. i believe the western/8bit text between the ( )'s would commonly be used with western/8bit punctuation, however the surrounding ( )'s could end up being entered as CJK ( )'s because there is leading and trailing CJK text around the western/8bit text.
i apologize if i have mistaken the crux of the discussion here.
Check them out. The dates and ordered lists will happen tomorrow.
@opoudjis thank you. The numbers look OK, except for the dots between digits; I don't know whether that is an issue or not:
I can replace them (U+FF0E, Fullwidth Full Stop) in the XSLT on the fly with U+30FB (Katakana Middle Dot); then it looks as in the source template PDF:
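For illustration only, the same character substitution expressed in Ruby rather than XSLT (the helper name `middle_dot_label` is hypothetical, and the actual replacement discussed here happens in the XSLT):

```ruby
# Hypothetical helper: swap U+FF0E (Fullwidth Full Stop) for
# U+30FB (Katakana Middle Dot) in a clause label.
def middle_dot_label(label)
  label.tr("\uFF0E", "\u30FB")
end

puts middle_dot_label("三\uFF0E一")
```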
Another issue is the clause order in a.presentation.xml: the Normative references order is 2:
<references id="_normative_references" normative="true" obligation="informative" displayorder="2">
<title depth="1">一<tab/>引用規格</title>
but the 1st clause order is 8:
<clause id="_clause" inline-header="false" obligation="normative" displayorder="8">
<title depth="1">二<tab/>Clause</title>
therefore the Normative references clause renders before the title on the first page (see 1st screenshot).
And I didn't see the edition number in Japanese:
I've implemented so far: ...
- edition number
From a.presentation.xml: <edition language="">1</edition><edition language="ja">第1版</edition>
The middle dot is telling me that I must not make a blanket assumption of "." as the subclause number delimiter, which can be localised to full-width. Instead I need to make it a parameter on calling the counter, and make it separate from the number prefix, so that it can be configured separately. So instead of
Counter.new(0, prefix: "#{clausenumber}.")
which will generate "#{clausenumber}.1", "#{clausenumber}.2", "#{clausenumber}.3"...
I need
Counter.new(0, prefix: clausenumber, separator: ".")
and the JIS calls to Counter override separator with middle dot, if the numbering style has been set to Japanese:
# IsoDoc::Xref
def initialize(opts)
  @separator = opts[:separator] || "." # default separator
end

def clause_counter(number, opts)
  Counter.new(number, opts)
end

# IsoDoc::Xref::JIS
def clause_counter(number, opts)
  opts[:number] ||= @autonumber_style # read from the XML, may be :japanese or :arabic
  # only override the separator when Japanese numbering is in effect
  @autonumber_style == :japanese and
    opts[:separator] ||= "・"
  super
end
That will generate "#{clausenumber}・一", "#{clausenumber}・二", "#{clausenumber}・三" when the numbering is set to Japanese.
(That is implemented in JIS and not globally for Japanese text, because subclause delimiters are a flavour choice: nothing is preventing a different organisation having clause numbers like 1-2 or 一〰二.)
This is a breaking change to isodoc, as I am refactoring all instances of Counter(prefix:).
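To make the prefix/separator split concrete, here is a small self-contained sketch. It is not the isodoc Counter class: `SketchCounter`, its keyword arguments and its `label` method are all hypothetical, and the Japanese style only covers 1 to 10 to keep the example short.

```ruby
# Hypothetical stand-in for a counter whose prefix and separator are
# configured independently, as proposed above.
class SketchCounter
  KANJI_ONE_TO_TEN = %w[一 二 三 四 五 六 七 八 九 十].freeze

  def initialize(start = 0, prefix: nil, separator: ".", style: :arabic)
    @num = start
    @prefix = prefix
    @separator = separator
    @style = style
  end

  # Advance the counter and return self so calls can be chained.
  def increment
    @num += 1
    self
  end

  # Render the current number, joined to the prefix by the separator.
  def label
    number = @style == :japanese ? KANJI_ONE_TO_TEN.fetch(@num - 1) : @num.to_s
    @prefix ? "#{@prefix}#{@separator}#{number}" : number
  end
end

arabic = SketchCounter.new(0, prefix: "3")
japanese = SketchCounter.new(0, prefix: "三", separator: "・", style: :japanese)
puts arabic.increment.label
puts japanese.increment.label
```

Keeping the separator out of the prefix string means a flavour can change the delimiter without having to rebuild every prefix.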
@Intelligent2013 The edition numbering works in testing, so I will need to investigate that. The list numbering will also be complicated.
Reese, the point of what I have written is the following:
二.二 => 二。二 (although it looks like I will need to override this with middle-dot anyway)
A.2 => A.2 (unchanged; previously it would have attempted A。2)
@opoudjis the Japanese "middle dot" delimiter is not the "full stop", they are different symbols.
If users actually want CJK punctuation inside Latin text (which Ronald seems to think they do), then it needs to be set as such in the outset: CJK punctuation will not be converted back to Latin
No, that's not what I asked for. The default for bibliographic entries is to be rendered in a suitable style, i.e. English in English, Japanese in Japanese. We could have Japanese in English or English in Japanese but that should not be the default.
Bibliographic entries will routinely be mixed-language, with things like Japanese authors and English titles. The notion of a bibliographic entry being "just Japanese" or "just English" is naive and inflexible. It is also a nuisance on top of trying to work out what the language of a bibliographic entry is to begin with. (You think users are going to be marking it up as [lang=ja]? And then mark up titles individually as exceptions? When we can work out the script automatically through Regex?)
That's why working out whether to apply CJK punctuation contextually, rather than based solely on a language tag, has ALWAYS been the right way to proceed, and I am proceeding with it.
Rereading, the default is indeed going to be CJK, but it will be overridden when the immediate context shows that full-width punctuation makes no sense (the surrounding characters are Latin). And I simply cannot trust users to exhaustively mark up references (let alone individual bits of references) to indicate language explicitly.
@opoudjis the Japanese "middle dot" delimiter is not the "full stop", they are different symbols.
As I have just acknowledged, which is why I am doing the refactoring.
From a.presentation.xml:
<edition language="">1</edition><edition language="ja">第1版</edition>
You're looking at the wrong file: I am generating
<edition language="">1</edition><edition language="ja">第一版</edition>
in the Japanese numbering version. You'll have a refresh soon.
ordered list items
This is an update to JIS. JIS has Alphabetic numbering on its first level of ordered lists, and Arabic numbering on subsequent levels. I don't know what the provenance of the PDF sample is, and I do not care: I am not overriding JIS list numbering for some unasked-for proof of concept. I am implementing Japanese numbering to replace Arabic numbering in ordered lists ONLY where JIS sanctions that.
As warned: HTML right now has no idea what to do with custom list labels.
@Intelligent2013 The following should have now everything you need for this proof of concept.
You're looking at the wrong file: I am generating
<edition language="">1</edition><edition language="ja">第一版</edition>
in the Japanese numbering version. You'll have a refresh soon.
OK. Please note I need just 一 without 第 and 版 around it. And we need to keep the value 第1版 for the current (non-vertical) layout.
I.e. like this: <edition language="">1</edition><edition language="ja">第1版</edition><edition language="ja" numberonly="true">一</edition>.
Yuck, that's really ad hoc. OK...
@Intelligent2013 Here you go.
Ordered lists look ok:
Thanks!
Now, testing edition number....
@opoudjis the edition number is ok also. Thanks!
I've updated the initial post for notes, examples numbers:
Note: I don't know the reason, but the notes numbers should be Arabic:
@Intelligent2013 I just noticed this since @opoudjis raised it. They are meant to be in Japanese numerals too.
Source issue: https://github.com/metanorma/metanorma-jis/issues/226
Support Japanese numerals in
[x] clause numbers Example:
[x] ordered list items Example:
[x] edition number currently, there are two elements in the Presentation XML:
[x] publication date Example: 令和元年七月二十二日 Current Presentation XML:
<date type="published">令和元年7月22日</date>
If this task is complicated, then I'll find how to do this via XSLT extensions on Java.
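As a rough sketch of the date conversion asked for here (not the Metanorma implementation, and not the XSLT/Java route mentioned above), the Arabic digit runs in the existing string could be swapped for Japanese numerals. The names `DATE_KANJI_DIGITS`, `small_kanji` and `japanese_date` are hypothetical, and the sketch assumes every number in the date is between 1 and 99.

```ruby
# Hypothetical sketch: rewrite Arabic digit runs in a Japanese date string
# as kanji numerals. Assumes each number is in the range 1..99.
DATE_KANJI_DIGITS = %w[〇 一 二 三 四 五 六 七 八 九].freeze

def small_kanji(number)
  raise ArgumentError, "supported range is 1..99" unless (1..99).cover?(number)

  tens, ones = number.divmod(10)
  text = +""
  text << DATE_KANJI_DIGITS[tens] if tens > 1 # 二十, 三十, ... but plain 十 for ten
  text << "十" if tens.positive?
  text << DATE_KANJI_DIGITS[ones] if ones.positive?
  text
end

def japanese_date(date_string)
  date_string.gsub(/\d+/) { |digits| small_kanji(digits.to_i) }
end

puts japanese_date("令和元年7月22日")
```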
@ronaldtse do we need to support two number formats: Arabic (1, 2, 3, ...) for usual documents and Japanese (一, ...) for vertical layout documents? Or only Japanese numbers?
Note: I don't know the reason, but the notes numbers should be Arabic:
UPDATE after the comment