w3c / alreq

Documenting gaps and requirements for support of Arabic Script languages on the Web and in eBooks.

Urdu Layout Requirements #269

Open vermaprashant1 opened 1 year ago

vermaprashant1 commented 1 year ago

@r12a The Web Standardization Activity under the TDIL (Technology Development for Indian Languages) Programme of the Ministry of Electronics and Information Technology (MeitY) has identified current gaps and worked out proposed recommendations for Urdu layout on the Web. The draft recommendations have been developed in consultation with stakeholders such as NCPUL (National Council for Promotion of Urdu Language), New Delhi, and industry stakeholders such as publishers and font developers. Kindly guide us on the further process for submitting them.

r12a commented 1 year ago

@vermaprashant1 thanks for letting me know about this. It sounds interesting. Are you able to point me to an online document where i can review what you currently have? That will help me answer your question about what we can do.

vermaprashant1 commented 1 year ago

@r12a it is available at https://tdil-dc.in/undertaking/article/112988Draft_Urdu-Recommendations.pdf for your reference

r12a commented 1 year ago

Thanks @vermaprashant1 but i'm getting time outs when trying to reach that file.

vermaprashant1 commented 1 year ago

It seems to be working. Please find the PDF attached.

Draft_Urdu-Recommendations.pdf

r12a commented 1 year ago

Thanks @vermaprashant1 I read through it a couple of times. As soon as i have a moment i'll add a new comment with questions and suggestions.

r12a commented 1 year ago

@vermaprashant1 here is what i'd propose wrt the information in your document. Does this sound like a good plan to you, or are you thinking along different lines?

  1. Create an Urdu gap-analysis document containing the following points:
    1. Generic fallbacks should cause browsers to choose nastaliq fonts by default
    2. First-letter styling should include all joined glyphs (this needs further discussion)
    3. The other essential RTL issues raised for other RTL script gap docs (which doesn't include form issues, since those fall out of the normal RTL rendering in the browser)
  2. Update the Ready-made Counter Styles document with 2 styles per the information in your document – there may be another gap issue here if doubled letters must not join
  3. Create a Layout Requirements doc for Urdu (already in the pipeline)
    1. I would normally do this by porting portions of https://r12a.github.io/scripts/arab/ur.html to a new document on our site, but would include any additional information in your document (with references)

Here are some additional things to consider:

  1. Places in your doc where more information is needed:

    1. Section 3 talks about 'letter-spacing' but gives no information about how that would work. Do you have such information? I have seen one font that elongates some letters, but usually stretching text is much more complicated in the arabic script and is not common in the nastaliq style. I don't usually find that Urdu nastaliq fonts support kashida elongation.
    2. Section 5 says that line-breaking rules must be applied, but doesn't say what they are other than that words should not be broken (which is already the default as far as i know). Is there more that is not covered by the default Unicode line-break rules?
    3. In section 8, could you clarify whether the highlighted text ends at the end of a word or at the first character that doesn't join to the left? I'm guessing word, given the example with the hamza at the end(?). But this seems at odds with the 'Standalone Form', where the highlighted letters seem to be part of the following word. Is the standalone approach different from the 'final unjoined' approach, or are they both simply terminating at the first non-left-joining letter?
    4. In section 9 is the 'Alpha, beta Listing' a fixed counter style? (ie. isn't used after U+063A)
  2. Information in your doc that we probably don't need to repeat in the lreq doc

    1. Sections 1 and 2 are for users
    2. Section 6 should be managed by the font, and i'm not aware of any special difficulties here.
    3. Section 7 is commonsense advice for users.
  3. Comments on your doc:

    1. In section 2 you may want to note that the user-installed fonts you mention can't be used in the Safari (WebKit) browser, because that browser allows use of system fonts only, unless the content author provides a web font.
    2. In section 7 your code should not be using CSS to set the text direction (see https://www.w3.org/International/questions/qa-bidi-css-markup). Urdu text should be surrounded by a p or div element which has a lang attribute and dir="rtl". The code shown is a very bad example.

What do you think?

vermaprashant1 commented 1 year ago

@r12a Thanks for the response. I will go through your comments in consultation with the respective experts and will respond soon.

r12a commented 1 year ago

@vermaprashant1 Here are some more questions about the counter styles information, arising from me drafting some text for the Ready-made Counter Styles doc:

  1. Your document says "In the ordered list, if the characters (defined in CSS ‘symbol’ property) repeat, they should not join." Then there follow some examples. Those examples are of the form 'a a', 'b b', 'c c', etc. This is quite unusual, and problematic for the standard CSS algorithms to produce. Usually the continuation would be 'a a', 'a b', 'a c', and so on. It's not clear to me whether the examples given are intended to indicate a need for the former approach, or whether they are just examples, and the desired approach is the latter (more usual) one.
  2. I thought perhaps we would call one 'urdu-alphabetic' and the other (shorter) one 'urdu-abjad'. Does that sound reasonable? (Compare with the other styles at https://www.w3.org/TR/predefined-counter-styles/#arabic-styles)
  3. By default, the separator after each counter will be an ASCII full stop. Is this appropriate for Urdu counters, or should the separator be something else, such as U+06D4: ARABIC FULL STOP, which is what Urdu uses in normal text?
  4. In order to continue past the end of the initial set of characters for the alphabetic style, the letters in a counter must not join. In order to achieve this, i think we need to add a ZWNJ or a space as part of each counter symbol. Otherwise, i don't know how to achieve the non-joining behaviour without a custom algorithm. My second question is whether there needs to be some space between the parts of the counter, or not (ie. add a space, or add ZWNJ?). The definition would look like:
    symbols: '\0627\0020' '\0628\0020' '\067E\0020'. etc // ie. ا ا    ب ب    پ پ

    or

    symbols: '\0627\200C' '\0628\200C' '\067E\200C'. etc //ie.   ا‌ا    ب‌ب    پ‌پ
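
To make the two continuation patterns from question 1 and the two joining options from question 4 concrete, here is a minimal Python sketch. It is only an illustration, not how any browser implements counters: the function names (`alphabetic`, `repeating`, `with_suffix`) are hypothetical, and the symbol list is limited to the three code points shown above. `alphabetic` follows the bijective numbering used by the 'alphabetic' system in CSS Counter Styles ('a a', 'a b', 'a c'), while `repeating` reproduces the 'a a', 'b b', 'c c' pattern by repeating a single symbol, which is similar to what the 'symbolic' system does.

```python
# Illustrative sketch only; all names below are hypothetical helpers.
ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER: prevents joining, adds no visible gap
SPACE = "\u0020"  # SPACE: prevents joining and adds a visible gap

# Only the three code points from the symbols lists above (alef, beh, peh).
LETTERS = ["\u0627", "\u0628", "\u067e"]


def alphabetic(value, symbols):
    """Bijective base-N numbering: a, b, c, aa, ab, ac, ..."""
    if value < 1:
        raise ValueError("counter value must be >= 1")
    n = len(symbols)
    parts = []
    while value > 0:
        value -= 1
        parts.append(symbols[value % n])
        value //= n
    return "".join(reversed(parts))


def repeating(value, symbols):
    """Repeat one symbol: a, b, c, aa, bb, cc, aaa, ..."""
    if value < 1:
        raise ValueError("counter value must be >= 1")
    n = len(symbols)
    count = (value - 1) // n + 1
    return symbols[(value - 1) % n] * count


def with_suffix(symbols, suffix):
    """Append a separator (space or ZWNJ) to every symbol, as in the lists above."""
    return [s + suffix for s in symbols]


if __name__ == "__main__":
    for label, suffix in (("space", SPACE), ("ZWNJ", ZWNJ)):
        symbols = with_suffix(LETTERS, suffix)
        print(label, [alphabetic(v, symbols) for v in range(1, 7)])
        print(label, [repeating(v, symbols) for v in range(1, 7)])
```

Note that, because the separator is part of each symbol, every counter also ends with a trailing space or ZWNJ before the full stop, which is one reason the choice between the two matters.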

r12a commented 11 months ago

@vermaprashant1 while we await the information about Urdu, do you know any Kashmiri experts who might be able to help with the question https://github.com/w3c/alreq/issues/270 ?