w3c / html

Deliverables of the HTML Working Group until October 2018
https://w3c.github.io/html/

Strategy for translations of HTML5 spec with tracking of updates #588

Closed fititnt closed 5 years ago

fititnt commented 8 years ago

Hello, my name is Emerson. Together with a partner who has already agreed to help, I want to translate HTML5: Techniques for providing useful text alternatives, but the most recent version points to W3C CR HTML 5.1. I haven’t found any up-to-date procedure for translating in a way that allows tracking updates to an official specification like w3c/html. For now we would like to cherry-pick this part, and in the following months some others involving accessibility and semantics.

This issue is also a suggestion to encourage translation of the official HTML5+ specification into new languages, going beyond non-normative "summarized" versions. Only a Russian translation is available for now.

Example of why translations of official documentation are important

I am familiar with English, but that’s not true of a good portion of the people around the world. And even among those who are able to speak English, some lack knowledge of technical English, particularly around semantics and accessibility.

We will start with the Portuguese speaking community, first in Brazil and then worldwide.

The latest version of the HTML specification translated into Portuguese is nearly 17 years old. The closest thing to a translation of the modern "official" HTML5 specification, from W3C Brasil, is at best incomplete regarding the new HTML5 semantics. Documents like this and other non-normative references are perceived as prescriptive when there is no official documentation, and are used as references in courses, in lectures and by professionals. This is a problem because someone who learns from incomplete material doesn’t even question its validity, taking it as an “absolute truth” since it was said by an expert.

This case is about Portuguese-speaking countries, but it may apply to everyone who is not a native English speaker. Maybe that is why features are being dropped from HTML5.1 (see #33 and whatwg/html#83): not for lack of technical capability among web developers, but because developers never learn that the features exist, since proper documentation is lacking.

What we would like to know

For now, my partner and I would like to translate just a small part of the HTML5.1 spec, but maybe in the coming months, with sufficient help, or with fewer people but with crowdfunding, our group will translate all of it. We have other translations to do too.

1. Which strategy tends to work well for translating in a way that allows tracking updates to an official specification like this one?

2. As of 14 Sep 2016, which branch would be the best one to start translating?

3. When will HTML 5.1 be “ready”?

4. If groups like mine are interested in translating, can W3C International help us get started with things like how to set up Travis, or review whether the way branches/commits are managed is counterproductive, etc.? Not the translation itself, but how to manage it in a sane way that permits an HTML5.2 translation later.

About the group

For now, we are a small self-organized nonprofit working group at @webiwg (http://www.webiwg.org/en/).

LJWatson commented 8 years ago

Thanks @fititnt

Translating the HTML specification would be a good thing to do, and help is very much appreciated.

I'm not sure of the best approach, so pinging @Chaals and @plehegar for help

plehegar commented 7 years ago

Actually, @koalie is the right person to start with on translations. HTML 5.1 main content is pretty much ready for translations imho since I wouldn't expect the editors to make lots of changes at this point. The headers/status of the document will change however. The final version of the spec will be published in early November. You shouldn't translate a branch but rather the document at https://www.w3.org/TR/html51/ imho.

fititnt commented 7 years ago

I agree that for small documents, translating the raw generated HTML is simpler. For example, this one about Language declarations.

But the raw HTML of the HTML5 specification is very, very large and complex. To summarize the problem I'm describing, I will compare what a translator would face in the generated HTML with the source content that you work with here:

(This example uses only part of what I will translate, the alt text on images, but it should be a good example of the rest of HTML 5.1.)

I ask you: if you do not write the final HTML directly, why should translators of the HTML specification have to?

If, after years of development, the HTML5 specification has become so much bigger than HTML 4.01, should translation into Portuguese, Spanish, or any other language still be done the way it was for a specification the size of one from 16 years ago?

The HTML 4 specification could be translated by a lone wolf. HTML 5+ cannot. Just the part that I would like to translate from HTML5 is about 12k lines; the entire HTML4 specification was 18k.

This is why I believe we must think of a strategy instead of simply starting to translate, so that people do not give up halfway because the translation process is difficult rather than because the translation itself is difficult.


Line count of the HTML 4 specification

$ curl https://www.w3.org/TR/html4/html40.txt | wc -l
18907

Line count and size of the raw HTML vs. the source content that is converted by the HTML preprocessor bikeshed

$ ls -lah
total 4,6M
drwxrwxr-x  2 fititnt fititnt 4,0K Out 10 13:25 .
drwxr-xr-x 27 fititnt fititnt  36K Out 10 13:23 ..
-rw-rw-r--  1 fititnt fititnt 4,0M Out 10 13:23 HTML 5.1_ 4.7. Embedded content.html
-rw-rw-r--  1 fititnt fititnt 537K Out 10 13:22 semantics-embedded-content.include

$ wc -l HTML\ 5.1_\ 4.7.\ Embedded\ content.html 
35462 HTML 5.1_ 4.7. Embedded content.html

$ wc -l semantics-embedded-content.include 
12287 semantics-embedded-content.include
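
Part of the gap in these counts plausibly comes from the attributes that the generated page carries on every link. The following is a minimal Python sketch, not part of the original measurements, that estimates how much of a line is markup rather than translatable text. The two sample lines are adapted from the `Flow content` entry quoted in this comment (source form and generated form); the names `translatable_text` and `markup_share` are hypothetical helpers introduced only for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def translatable_text(html: str) -> list[str]:
    """Return the text a translator actually has to translate."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


def markup_share(html: str) -> float:
    """Fraction of characters in the fragment that are not text content."""
    text_length = sum(len(chunk) for chunk in translatable_text(html))
    return 1 - text_length / len(html)


# Sample lines adapted from the snippets quoted in this comment.
SOURCE_LINE = "<dd><a>Flow content</a>.</dd>"
GENERATED_LINE = (
    '<dd><a data-link-type="dfn" href="dom.html#flow-content" '
    'id="ref-for-flow-content-91">Flow content</a>.</dd>'
)

if __name__ == "__main__":
    for label, line in (("source", SOURCE_LINE), ("generated", GENERATED_LINE)):
        print(f"{label}: {markup_share(line):.0%} markup")
```

Both lines contain the same text to translate; only the amount of surrounding markup differs, which is the point being argued here.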

Code from https://github.com/w3c/html/blob/master/sections/semantics-embedded-content.include#L553

<h4 id="the-img-element">The <dfn element><code>img</code></dfn> element</h4>

  <dl class="element">
    <dt><a>Categories</a>:</dt>
    <dd><a>Flow content</a>.</dd>
    <dd><a>Phrasing content</a>.</dd>
    <dd><a>Embedded content</a>.</dd>
    <dd><a>Form-associated element</a>.</dd>
    <dd>If the element has a <code>usemap</code> attribute: <a>interactive content</a>.</dd>
    <dd><a>Palpable content</a>.</dd>
    <dt><a>Contexts in which this element can be used</a>:</dt>
    <dd>Where <a>embedded content</a> is expected.</dd>
    <dt><a>Content model</a>:</dt>
    <dd>Nothing</dd>
    <dt><a>Tag omission in text/html</a>:</dt>
    <dd>No <a>end tag</a>.</dd>
    <dt><a>Content attributes</a>:</dt>
    <dd><a>Global attributes</a></dd>
    <dd><code>alt</code> - Replacement text for use when images are not available </dd>
    <dd><code>src</code> - Address of the resource</dd>
    <dd><code>srcset</code> - Images to use in different situations
    (e.g., high-resolution displays, small monitors, etc) </dd>
    <dd><code>sizes</code> - Image sizes between breakpoints</dd>
    <dd><code>crossorigin</code> - How the element handles crossorigin requests</dd>
    <dd><code>usemap</code> - Name of <a>image map</a> to use </dd>
    <dd><code>ismap</code> - Whether the image is a server-side image map</dd>
    <dd><code>width</code> - Horizontal dimension</dd>
    <dd><code>height</code> - Vertical dimension</dd>
    <dd><{img/longdesc}> - A url that provides a link to an expanded description of the image, defined in [[!html-longdesc]]</dd>
    <dt>Allowed <a href="#aria-role-attribute">ARIA role attribute</a> values:</dt>
    <dd><a value for="role"><code>presentation</code></a> role only, for an
    <{img}> element whose <code>alt</code> attribute's value is empty (<code>alt=""</code>), otherwise
    <a href="#allowed-aria-roles-states-and-properties">Any role value</a>.</dd>
    <dt>Allowed <a href="#state-and-property-attributes">ARIA state and property attributes</a>:</dt>
    <dd><a>Global aria-* attributes</a></dd>
    <dd>Any <code>aria-*</code> attributes
    <a href="#allowed-aria-roles-states-and-properties">applicable to the allowed roles</a>.</dd>
    <dt><a>DOM interface</a>:</dt>
    <dd>
      <pre class="idl" data-highlight="webidl" dfn-for="HTMLImageElement">
        [NamedConstructor=Image(optional unsigned long width, optional unsigned long height)]
        interface HTMLImageElement : HTMLElement {
          attribute DOMString alt;
          attribute DOMString src;
          attribute DOMString srcset;
          attribute DOMString sizes;
          attribute DOMString? crossOrigin;
          attribute DOMString useMap;
          attribute DOMString longDesc;
          attribute boolean isMap;
          attribute unsigned long width;
          attribute unsigned long height;
          readonly attribute unsigned long naturalWidth;
          readonly attribute unsigned long naturalHeight;
          readonly attribute boolean complete;
          readonly attribute DOMString currentSrc;
        };
      </pre>
    </dd>
  </dl>

RAW generated HTML at https://www.w3.org/TR/html51/semantics-embedded-content.html#alt-text

     <h4 class="heading settled" data-level="4.7.5" id="the-img-element"><span class="secno">4.7.5. </span><span class="content">The <dfn class="dfn-paneled" data-dfn-type="element" data-export="" id="elementdef-img"><code>img</code></dfn> element</span><a class="self-link" href="semantics-embedded-content.html#the-img-element"></a></h4>
     <dl class="element">
      <dt><a data-link-type="dfn" href="dom.html#categories" id="ref-for-categories-66">Categories</a>:
      </dt><dd><a data-link-type="dfn" href="dom.html#flow-content" id="ref-for-flow-content-91">Flow content</a>.
      </dd><dd><a data-link-type="dfn" href="dom.html#phrasing-content" id="ref-for-phrasing-content-108">Phrasing content</a>.
      </dd><dd><a data-link-type="dfn" href="dom.html#embedded-content" id="ref-for-embedded-content-5">Embedded content</a>.
      </dd><dd><a data-link-type="dfn" href="sec-forms.html#form-associated-elements" id="ref-for-form-associated-elements-1">Form-associated element</a>.
      </dd><dd>If the element has a <code>usemap</code> attribute: <a data-link-type="dfn" href="dom.html#interactive-content" id="ref-for-interactive-content-6">interactive content</a>.
      </dd><dd><a data-link-type="dfn" href="dom.html#palpable-content" id="ref-for-palpable-content-44">Palpable content</a>.
      </dd><dt><a data-link-type="dfn" href="dom.html#contexts-in-which-this-element-can-be-used" id="ref-for-contexts-in-which-this-element-can-be-used-65">Contexts in which this element can be used</a>:
      </dt><dd>Where <a data-link-type="dfn" href="dom.html#embedded-content" id="ref-for-embedded-content-6">embedded content</a> is expected.
      </dd><dt><a data-link-type="dfn" href="dom.html#content-model" id="ref-for-content-model-68">Content model</a>:
      </dt><dd>Nothing
      </dd><dt><a data-link-type="dfn" href="dom.html#tag-omission-in-text-html" id="ref-for-tag-omission-in-text-html-65">Tag omission in text/html</a>:
      </dt><dd>No <a data-link-type="dfn" href="syntax.html#end-tag" id="ref-for-end-tag-21">end tag</a>.
      </dd><dt><a data-link-type="dfn" href="dom.html#content-attribute" id="ref-for-content-attribute-66">Content attributes</a>:
      </dt><dd><a data-link-type="dfn" href="dom.html#global-attributes" id="ref-for-global-attributes-66">Global attributes</a>
      </dd><dd><code>alt</code> - Replacement text for use when images are not available 
      </dd><dd><code>src</code> - Address of the resource
      </dd><dd><code>srcset</code> - Images to use in different situations
    (e.g., high-resolution displays, small monitors, etc) 
      </dd><dd><code>sizes</code> - Image sizes between breakpoints
      </dd><dd><code>crossorigin</code> - How the element handles crossorigin requests
      </dd><dd><code>usemap</code> - Name of <a data-link-type="dfn" href="semantics-embedded-content.html#image-map" id="ref-for-image-map-1">image map</a> to use 
      </dd><dd><code>ismap</code> - Whether the image is a server-side image map
      </dd><dd><code>width</code> - Horizontal dimension
      </dd><dd><code>height</code> - Vertical dimension
      </dd><dt>Allowed <a href="dom.html#aria-role-attribute">ARIA role attribute</a> values:
      </dt><dd><a class="css" data-link-type="value" href="https://www.w3.org/TR/wai-aria/roles#presentation"><code>presentation</code></a> role only, for an <code><a data-link-type="element" href="semantics-embedded-content.html#elementdef-img" id="ref-for-elementdef-img-33">img</a></code> element whose <code>alt</code> attribute’s value is empty (<code>alt=""</code>), otherwise <a href="dom.html#allowed-aria-roles-states-and-properties">Any role value</a>.
      </dd><dt>Allowed <a href="dom.html#state-and-property-attributes">ARIA state and property attributes</a>:
      </dt><dd><a data-link-type="dfn" href="dom.html#global-aria--attributes" id="ref-for-global-aria--attributes-67">Global aria-* attributes</a>
      </dd><dd>Any <code>aria-*</code> attributes <a href="dom.html#allowed-aria-roles-states-and-properties">applicable to the allowed roles</a>.
      </dd><dt><a data-link-type="dfn" href="dom.html#dom-interface" id="ref-for-dom-interface-65">DOM interface</a>:
      </dt><dd>
<pre class="idl highlight def" data-highlight="webidl">[<a class="nv idl-code" data-link-type="constructor" href="semantics-embedded-content.html#dom-htmlimageelement-image" id="ref-for-dom-htmlimageelement-image-1">NamedConstructor</a>=<span class="n">Image</span>(<span class="kt">optional</span> <span class="kt">unsigned</span> <span class="kt">long</span> <dfn class="nv idl-code" data-dfn-for="HTMLImageElement/Image(width, height)" data-dfn-type="argument" data-export="" id="dom-htmlimageelement-image-width-height-width">width<a class="self-link" href="semantics-embedded-content.html#dom-htmlimageelement-image-width-height-width"></a></dfn>, <span class="kt">optional</span> <span class="kt">unsigned</span> <span class="kt">long</span> <dfn class="nv idl-code" data-dfn-for="HTMLImageElement/Image(width, height)" data-dfn-type="argument" data-export="" id="dom-htmlimageelement-image-width-height-height">height<a class="self-link" href="semantics-embedded-content.html#dom-htmlimageelement-image-width-height-height"></a></dfn>)]
<span class="kt">interface</span> <dfn class="nv dfn-paneled idl-code" data-dfn-for="HTMLImageElement" data-dfn-type="interface" data-export="" id="htmlimageelement-htmlimageelement">HTMLImageElement</dfn> : <a class="n" data-link-type="idl-name" href="dom.html#htmlelement-htmlelement" id="ref-for-htmlelement-htmlelement-66">HTMLElement</a> {
  <span class="kt">attribute</span> <span class="kt">DOMString</span> <a class="nv idl-code" data-link-type="attribute" data-type="DOMString" href="semantics-embedded-content.html#dom-htmlimageelement-alt" id="ref-for-dom-htmlimageelement-alt-1">alt</a>;
  <span class="kt">attribute</span> <span class="kt">DOMString</span> <a class="nv idl-code" data-link-type="attribute" data-type="DOMString" href="semantics-embedded-content.html#dom-htmlimageelement-src" id="ref-for-dom-htmlimageelement-src-1">src</a>;
  <span class="kt">attribute</span> <span class="kt">DOMString</span> <a class="nv idl-code" data-link-type="attribute" data-type="DOMString" href="semantics-embedded-content.html#dom-htmlimageelement-srcset" id="ref-for-dom-htmlimageelement-srcset-1">srcset</a>;
  <span class="kt">attribute</span> <span class="kt">DOMString</span> <a class="nv idl-code" data-link-type="attribute" data-type="DOMString" href="semantics-embedded-content.html#dom-htmlimageelement-sizes" id="ref-for-dom-htmlimageelement-sizes-1">sizes</a>;
  <span class="kt">attribute</span> <span class="kt">DOMString</span>? <a class="nv idl-code" data-link-type="attribute" data-type="DOMString?" href="semantics-embedded-content.html#dom-htmlimageelement-crossorigin" id="ref-for-dom-htmlimageelement-crossorigin-1">crossOrigin</a>;
  <span class="kt">attribute</span> <span class="kt">DOMString</span> <a class="nv idl-code" data-link-type="attribute" data-type="DOMString" href="semantics-embedded-content.html#dom-htmlimageelement-usemap" id="ref-for-dom-htmlimageelement-usemap-1">useMap</a>;
  <span class="kt">attribute</span> <span class="kt">boolean</span> <a class="nv idl-code" data-link-type="attribute" data-type="boolean" href="semantics-embedded-content.html#dom-htmlimageelement-ismap" id="ref-for-dom-htmlimageelement-ismap-1">isMap</a>;
  <span class="kt">attribute</span> <span class="kt">unsigned</span> <span class="kt">long</span> <a class="nv idl-code" data-link-type="attribute" data-type="unsigned long" href="semantics-embedded-content.html#dom-htmlimageelement-width" id="ref-for-dom-htmlimageelement-width-1">width</a>;
  <span class="kt">attribute</span> <span class="kt">unsigned</span> <span class="kt">long</span> <a class="nv idl-code" data-link-type="attribute" data-type="unsigned long" href="semantics-embedded-content.html#dom-htmlimageelement-height" id="ref-for-dom-htmlimageelement-height-1">height</a>;
  <span class="kt">readonly</span> <span class="kt">attribute</span> <span class="kt">unsigned</span> <span class="kt">long</span> <a class="nv idl-code" data-link-type="attribute" data-readonly="" data-type="unsigned long" href="semantics-embedded-content.html#dom-htmlimageelement-naturalwidth" id="ref-for-dom-htmlimageelement-naturalwidth-1">naturalWidth</a>;
  <span class="kt">readonly</span> <span class="kt">attribute</span> <span class="kt">unsigned</span> <span class="kt">long</span> <a class="nv idl-code" data-link-type="attribute" data-readonly="" data-type="unsigned long" href="semantics-embedded-content.html#dom-htmlimageelement-naturalheight" id="ref-for-dom-htmlimageelement-naturalheight-1">naturalHeight</a>;
  <span class="kt">readonly</span> <span class="kt">attribute</span> <span class="kt">boolean</span> <a class="nv idl-code" data-link-type="attribute" data-readonly="" data-type="boolean" href="semantics-embedded-content.html#dom-htmlimageelement-complete" id="ref-for-dom-htmlimageelement-complete-1">complete</a>;
  <span class="kt">readonly</span> <span class="kt">attribute</span> <span class="kt">DOMString</span> <a class="nv idl-code" data-link-type="attribute" data-readonly="" data-type="DOMString" href="semantics-embedded-content.html#dom-htmlimageelement-currentsrc" id="ref-for-dom-htmlimageelement-currentsrc-1">currentSrc</a>;
};
</pre>
     </dd></dl>

fititnt commented 7 years ago

About allowing builds of the HTML5+ specification

A very good start here would be to remove this line: https://github.com/w3c/html/blob/master/.gitignore#L1

single-page.html

The absence of this file prevents people who are not part of W3C from running the build documented in "README.md". Asking for help with translation strategy is vague, I agree, but I believe that at least allowing the documentation to be built is important.

Issues like #592 should be a high priority in order to encourage translations, and this could be automated for all W3C specifications, even drafts. It is not a problem for me to find volunteers who are highly skilled with NodeJS, which is used to build the documentation. So please do not take this as destructive criticism: if tools need to be made, they will be developed even without a company or government behind them.

"Enabling international deployment is the responsibility of all content authors, not just localization groups or vendors"

From the Portuguese translation draft at http://i18n-html-tech-lang.pt.webiwg.org/#audience

Este documento fornece orientação para desenvolvedores de HTML que permite o suporte para a distribuição internacional. Permitir a implantação internacional é a responsabilidade de todos os autores de conteúdo, e não apenas grupos de localização ou vendedores, e é relevante para todo o conteúdo.

For those who do not understand Portuguese, you can see the original at https://www.w3.org/TR/i18n-html-tech-lang/#audience. Just to remind everyone here, fewer than one in seven people in the world speak English, and for those who do not speak it natively, their comprehension is not necessarily good enough to fully understand this documentation.

I'm using the same TRs written for people who code HTML to argue that the build process of the HTML specification should consider this from the start.

"Adding markup for language information to content is something that can and should be done as content is first developed. If not, it will be much more difficult to take advantage of any future developments."

-- Richard Ishida, W3C

Translation timeline and how we can help W3C International in return

My independent working group has delegated to me the task of testing and analyzing the technical complexity of translating W3C TRs. That's why, even though I am personally translating just one part of HTML 5.1, deep down I'm also testing the whole process.

I cannot guarantee that there will be a translation of the complete HTML5.1 specification, but I can say with confidence that we will translate some smaller W3C documents. If the initial impression that it takes a lot of editors' time is true, we will probably spend months translating other small TRs with decent quality in each volunteer's spare time. Once we are sure how much effort a decent translation needs, we may at some point open some sort of crowdfunding.

Even if the crowdfunding falls short, at least the part of the HTML 5 specification that matters most for accessibility should have been done, and your help here with a strategy for knowing what changed in HTML 5.2 could allow collaborations like mine to reach a full translation in only a few years.

If this works well, we intend to document how we do it, from how to attract volunteer translators to how to create the glossary of terms.

Our working group is recent, and we prefer to spend a few more months doing work before calling attention to what we are doing. Once we have produced more work than existing groups in my country have in the last 20 years, we would like to be respected enough to be recognized as an LTO.

Even if the process for an authorized translation requires the participation of people from outside the LTO, we would prefer not to be forced to give prominent credit to a third party who has not done significant work. If people are invited to participate in the international W3C working groups, it is better that they be the ones who make things happen, not those who say on joining a WG that "our main victory" was some translation and then, in less than a year, leave the group after missing too many meetings.

As I said, once our group has done significant work, we would like no member of it to be judged by the careless actions of other Brazilians. We also believe that new people need to enter the open software culture, and so more experienced developers like me are helping newcomers enter this world. So in 2017, if we nominate people as collaborators, we would very much welcome it if activity on their GitHub profile ("who makes it happen") were taken into consideration more than previous experience in large companies.

chaals commented 7 years ago

Hi,

having done translation, I absolutely agree that translation of the source is much more useful than trying to take on the generated HTML 5.1 specification.

But I would suggest that you translate the HTML 5.1 branch, starting when it is a W3C Recommendation.

The first reason is that it means you're working towards a translation of a W3C Recommendation, which is the only thing W3C will collect in its database - and which is often the most useful reference in itself.

The second reason is that when there is an HTML 5.2 Recommendation, you can diff the English sources, and find what you need to re-translate relatively easily.

The third is that if you ever have the resources to translate during the development cycle - e.g. because the Portuguese-speaking community is heavily involved and wants to be working from a Portuguese version - you can go through the patches applied and make translated versions of each of them...
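
The second reason above, diffing the English sources of two Recommendations to find what needs re-translation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `changed_source_lines` is a hypothetical helper, and the two sample texts are invented placeholders standing in for the same source file in two releases, not real spec text.

```python
import difflib


def changed_source_lines(old_text: str, new_text: str) -> list[tuple[str, int, str]]:
    """List the source lines a translator has to revisit.

    Each entry is (kind, line_number, line): kind is "removed" for a line
    that existed only in the old source (numbered in the old file) or
    "added" for a line that exists only in the new source (numbered in
    the new file).
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    changes: list[tuple[str, int, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines keep their existing translation
        for i in range(i1, i2):
            changes.append(("removed", i + 1, old_lines[i]))
        for j in range(j1, j2):
            changes.append(("added", j + 1, new_lines[j]))
    return changes


# Invented placeholder content, standing in for one source file in two releases.
OLD_SOURCE = (
    "The img element represents an image.\n"
    "The alt attribute provides fallback text.\n"
)
NEW_SOURCE = (
    "The img element represents an image.\n"
    "The alt attribute provides equivalent content.\n"
    "The srcset attribute lists image sources.\n"
)

if __name__ == "__main__":
    for kind, number, line in changed_source_lines(OLD_SOURCE, NEW_SOURCE):
        print(f"{kind:8} line {number}: {line}")
```

In practice `git diff` between the two release tags gives the same information; the sketch only shows that unchanged lines can keep their existing translation while the reported ones are queued for review.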

For what it is worth I strongly support the idea of translation, would be very pleased if a group of developers had the ability to maintain a translation of a development branch in their own language, and will do what I can to assist.

Mas falo um portunhol muito malo, então não posso ajudar como gossaria… (But I speak very bad Portuñol, so I can't help as I would like…)

fititnt commented 7 years ago

"But I would suggest that you translate the HTML 5.1 branch, starting when it is a W3C Recommendation."

As it should become a Recommendation soon, perhaps in November as @plehegar said, I see no problem in only testing how to translate for now, without starting the translation before then.

"The third is that if you ever have the resources to translate during the development cycle - e.g. because the portuguese-speaking community is heavily involved and wants to be working from a portuguese version - you can go through the patches applied and make translated versions of each of them..."

Well, at the moment we don't have an active community like this. If we had, the translation of HTML 5.0 would have been made years ago.

We have developers, and according to recent academic research, Brazilians are apparently more active on GitHub than the international average. This makes it easier, sure, but documentation requires more dedication, and most developers prefer a more immediate return: not necessarily a financial one, but visibility and recognition.

Normative documentation (creating or translating it) is something we all need, and we all admire people who care about it, but nobody wants to do the dirty work.

A few things I can say right now:

  1. It is harder to get experienced developers to help with translation. The two main reasons I see are 1) fear of making mistakes in public, and 2) simply believing that the whole idea is impossible, plus thinking that if they help me once, I'm going to keep asking.
  2. Experienced developers believe it should be done by people who have done this before. They say things like "Why doesn't someone in the Government pay for this job instead of your group doing it?" or "Hey, @maujor made translations in the past, ask for his help, he's better at it".
  3. Experienced developers, on the other hand, are willing to review translations.
  4. Until we have more work done, it is easier to just ask for help from people who personally know @Dkmister and me. This is not a problem, because it is better to grow with quality, and a good portion of our contributions don't need many people this early.
  5. Translating shorter texts is better for attracting people at the beginning. We are requesting permission from sites like WebAIM and translating their texts, and also IETF RFCs. This is one reason why, while we have few people, only trusted people will do parts of the HTML 5.1 translation, so as not to discourage people who are just starting.
  6. We dedicate much more effort to a newcomer than that person gives back to us, and even if the person ends up not collaborating as expected, we have no problem with that. Culture takes time. Most of the people active today are active because I helped them a lot in the past, and they respect and trust what I say.
  7. In our group we have an internal goal of really making a difference within a year and a half of our founding. We're planning talks on subjects that are rarely covered correctly, and cataloguing technology events that accept talks. @Dkmister and I believe that if people who help with boring jobs such as documentation can succeed, it will bring in more people, so yes, there could be people who can actively help translate HTML 5 drafts, as @chaals imagined. We may fail, but every time I did coaching like this in the past, it worked.

So, as I said, the technical part is just part of the problem.

Now, I'm going to talk to other people here. I'm going to see how to prepare the translation, but we will not start it before HTML 5.1 becomes a Recommendation. And thank you very much for your attention.

chaals commented 7 years ago

I'm closing this because I don't think it is a specific issue on the HTML specification - there are various places to discuss the translation, and you know how to find me to follow up. If there are specific actions for the spec related to translation, please feel free to raise another issue. I'm also happy to help answer specific questions about a translation if you want to check - if I can't tell the answer with confidence about both languages, I can probably find 5 people who can.

Otherwise, thanks for taking this on, and I hope we get a Brazilian Portuguese translation people can work with :)

fititnt commented 7 years ago

I agree that a general translation strategy is something vague and broad. This topic would never be ready to be closed. This issue is a good starting point, but it must now evolve into more specific actions.

And yes, I'm going to break the problem into smaller topics that reference this issue. One of them, at least as of the time I created this issue, is that it is not possible to build the documentation because the single-page.html file is not published.

In the coming months we will also make other suggestions. One concerns smaller documents: unlike huge documentation such as HTML5, where it is easier to translate the source used by bikeshed, there I agree with @plehegar that it is easier to translate the raw HTML document. But it would be even easier if the HTML generated by bikeshed already had good indentation, i.e. were more readable for humans to edit out of the box and worked well with git diff, like this example.

One thing I'm learning is that preparing documents for translation, for example manually indenting the code and breaking lines by sentence rather than by number of characters per line, takes time and is less interesting for collaborators than translating the document. But without a well-prepared document there is no translation, or at least the process frustrates people and makes review difficult.

These steps may seem small, but they make a difference in getting volunteers to translate.

chaals commented 7 years ago

@r12a just checking that this gets picked up by your tracker…

r12a commented 7 years ago

@chaals reopening because what i'm hearing here may be actionable by the HTML WG, and that is that it's better not to truncate source text at a given number of characters per line, as the HTML spec apparently does (I'm not sure whether that's a result of using bikeshed or whether it comes from the author), but to let the text flow within a block element.

It seems that this makes it easier to compare translations with the original source text in diffs (given that word order can change).

I'm guessing it may also make it easier for translators to compare different versions of a spec too, since when they look at diffs comparing versions they'll see whole sentences.

fititnt commented 7 years ago

These proposals from our group at @WebIWG represent our own opinion. It would be better to have them evaluated by speakers of other languages, not just us, because they would affect all new additions to the documentation.

Some points specifically about line breaks in text.

1. If both the final HTML and the source code used by bikeshed will be used by translators, both must be optimized

For the final HTML it is easier to produce something of a good standard, since that is a technical matter. One option is an "HTML beautifier" that breaks text lines by sentence, but it would be better if the code generated by bikeshed were already optimized.

The source code needs, at least for new additions, paragraphs whose lines are broken by sentence. This involves explaining the practice to people and documenting why it matters.

2. Breaking a line in the middle of a sentence is bad, and not just for translations.

With a rigid limit on how many characters a line may have, even a small change can make a whole paragraph change, because words get pushed onto the following lines.

An additional advantage of this change is that it does not affect just translations: this should have a positive impact by reducing noise in commits.

Bad, as today: breaking by number of chars per line: 7 line changes for one addition

Added TESTING TESTING TESTING. The same will occur for removal of words.

-Specifying the language of content is useful for a wide number of applications, from
-linguistically-sensitive searching to applying language-specific display properties.
-In some cases the potential applications for language information are still
-waiting for implementations to catch up, whereas in others it is a necessity
-today. Adding markup for language information to content is something that
-can and should be done as content is first developed. If not, it will be much
-more difficult to take advantage of any future developments.
+Specifying the language of content is useful for a wide TESTING TESTING TESTING
+number of applications, from linguistically-sensitive searching to applying
+language-specific display properties. In some cases the potential applications for
+language information are still waiting for implementations to catch up, whereas in
+others it is a necessity today. Adding markup for language information to content
+is something that can and should be done as content is first developed. If not, it
+will be much more difficult to take advantage of any future developments.

Possibly better: by sentences, one sentence changed, one line change

-Specifying the language of content is useful for a wide number of applications,
+Specifying the language of content is useful for a wide TESTING TESTING TESTING number of applications,
from linguistically-sensitive searching to applying language-specific display properties.
In some cases the potential applications for language information are still waiting for implementations to catch up, whereas in others it is a necessity today.
Adding markup for language information to content is something that can and should be done as content is first developed.
If not, it will be much more difficult to take advantage of any future developments.
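The difference between the two examples above can be checked with a small script. This is only a rough sketch in Python, not something bikeshed or the HTML spec tooling provides: the helper names (`wrap_fixed`, `wrap_sentences`, `changed_lines`) and the 80-character width are assumptions made for illustration, and the sentence splitter is a naive regular expression that ignores abbreviations and inline markup.

```python
import difflib
import re
import textwrap

# Shortened version of the paragraph used in the examples above.
OLD = (
    "Specifying the language of content is useful for a wide number of "
    "applications, from linguistically-sensitive searching to applying "
    "language-specific display properties. In some cases the potential "
    "applications for language information are still waiting for "
    "implementations to catch up, whereas in others it is a necessity today."
)
NEW = OLD.replace("a wide number", "a wide TESTING TESTING TESTING number")


def wrap_fixed(text: str, width: int = 80) -> list[str]:
    """Break lines at a fixed column (width is a hypothetical choice)."""
    return textwrap.wrap(text, width=width)


def wrap_sentences(text: str) -> list[str]:
    """Naive split: one sentence per line, breaking after '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def changed_lines(old: list[str], new: list[str]) -> int:
    """Count the lines a line-based diff reports as removed or added."""
    return sum(
        1 for line in difflib.ndiff(old, new) if line.startswith(("- ", "+ "))
    )


fixed = changed_lines(wrap_fixed(OLD), wrap_fixed(NEW))
by_sentence = changed_lines(wrap_sentences(OLD), wrap_sentences(NEW))
print("fixed-width wrap, changed lines:", fixed)
print("sentence-per-line, changed lines:", by_sentence)
```

With sentence-based breaks only the edited sentence shows up in the diff, while the fixed-width version also reports lines whose words merely moved.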

3: Avoid long lines of text (needs discussion)

Allowing entire paragraphs to stay on one line is worse than what we currently have. This does not seem to be better for translators, and any change is always going to "invalidate" the entire paragraph.

A plain-text paragraph can reach 500-700 characters, more than the example below. With mixed HTML, like <code class="kw" translate="no">xml:lang</code>, it can reach a lot more.

Our group is not absolutely certain where lines should be broken, but we're sure they must be broken at some point.

Example of entire paragraph in a line, 442 chars

From https://www.w3.org/TR/i18n-html-tech-lang/ source code

<p>This document provides guidance for developers of HTML that enables support for international deployment. Enabling international deployment is the responsibility of all content authors, not just localization groups or vendors, and is relevant for all content. <!--Ignoring the advice in this document, or relegating it to a later phase in the development process, will only add unnecessary costs and resource issues at a later date.--></p>

Line break at dots, maximum of 131 chars

Line break at ".". Sometimes we break at "," or "(" for very long texts, but this can make some words go up or down. So we are not 100% sure about this.

<p>
  This document provides guidance for developers of HTML that enables support for international deployment.
  Enabling international deployment is the responsibility of all content authors, not just localization groups or vendors, and is relevant for all content.
  <!--Ignoring the advice in this document, or relegating it to a later phase in the development process, will only add unnecessary costs and resource issues at a later date.-->
</p>
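The rule described above (break at ".", and fall back to "," only for very long sentences) could be sketched roughly as follows. This is an illustration in Python, not an existing tool: the function name `break_paragraph` and the default limit of 110 characters are hypothetical, and the code does not try to handle HTML comments, inline tags, or abbreviations.

```python
import re


def break_paragraph(text: str, max_len: int = 110) -> list[str]:
    """Put one sentence per line; split over-long sentences after commas.

    max_len is an arbitrary, hypothetical limit used only for this sketch.
    """
    lines: list[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if len(sentence) <= max_len:
            lines.append(sentence)
            continue
        # Fallback for long sentences: greedily pack comma-separated
        # parts onto a line until the next part would exceed max_len.
        current = ""
        for part in re.split(r"(?<=,)\s+", sentence):
            candidate = f"{current} {part}".strip()
            if current and len(candidate) > max_len:
                lines.append(current)
                current = part
            else:
                current = candidate
        if current:
            lines.append(current)
    return lines


paragraph = (
    "This document provides guidance for developers of HTML that enables "
    "support for international deployment. Enabling international deployment "
    "is the responsibility of all content authors, not just localization "
    "groups or vendors, and is relevant for all content."
)
for line in break_paragraph(paragraph):
    print(line)
```

A sentence with no comma is left whole even when it exceeds the limit, which matches the uncertainty noted above about where else a line could safely be broken.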

4: Source code more like markdown (really needs discussion)

Important note: our group has not yet translated enough text with the help of bikeshed to be able to say whether this is a good suggestion. In a few months we may have an opinion on the subject.

The most efficient way to explain to new people how many spaces they should put before the text is to say they should put zero spaces.

We would not need to define our indentation: whether we use 2, 4, or 8 spaces, or a Tab character. Nor would we need to explain that some code editors convert Tab to spaces and need to be reconfigured.

Using markdown syntax encourages simpler code. It still allows pure HTML, but it might be simpler to centralize some decisions in the preprocessor, like "what class must this h3 header have?" or internal links that can change. Bikeshed already does this, but not all W3C repositories use it, and even for new commits I guess some people still use raw HTML.

A markdown approach makes things easier for people who do text review, or anyone who uses a tablet or phone. It also makes things easier for those who use only GitHub web editing to preview the result (and if not, we can find people to create browser extensions that hack GitHub web editing to give a better preview), so we can have more people helping us.

Using less repetitive code and removing indentation makes things easier for people who are blind. Blind programmers tend not to like Python, or are forced to have an expensive braille display. In the same way that the people helping us translate today are receiving mentoring, we have a strong interest in training blind people to be programmers.

r12a commented 7 years ago

@fititnt i'm not sure that it's helpful to use the html issues list to develop ideas about what is the best approach for managing source in translation. As a suggestion, why don't you create a discussion document on your own github repo, and then raise specific requests for the html folks here once you have discussed the ideas more widely with people involved with translation and reached some agreements?

icoffani commented 7 years ago

Hello, I’m also a translator.

A while ago, a small group of researchers and developers under my supervision focused on web content. The group’s main goal is to find easy ways to explain to people (especially journalists at news websites) the importance of optimizing a platform that publishes news and of writing clearly there for people with some kind of disorder.

The lack of content for any kind of people happens because of two main points:

This way, I intend to understand the difficulties that writers have and provide feedback for the developers. Furthermore, we are planning to have some published articles about these subjects by the end of 2017 or at the beginning of the next (2018).

I am able to learn HTML as well as a programmer can, because I am acquainted with this type of code. However, that does not apply to people who are unfamiliar with this kind of language. Markdown, on the other hand, is simpler than HTML, which is why anyone who knows English and Portuguese would be qualified to help us.

I agree with points 2 and 3 from @fititnt’s answer, but I still need to think about points 1 and 4.

I have a question: is it possible to provide some means for people who are not advanced developers to contact us? It would be extremely useful because all kinds of people could help us translate the articles into other languages.

anabastos commented 7 years ago

A comment about the last paragraph from @icoffani. We would like to get in contact with other people who have already done translations into other languages, to see how we can automate the process. It's not about having them help us translate into Portuguese.

Ultimately, in the long term this may help our translations, but for now it's just about helping define the translation process, starting from the moment the documentation is produced.

Some changes can be significant, so validating the process with at least two different translation teams would be good. It would also make the suggestions easier to accept.

Dkmister commented 7 years ago

We can have a group that will translate documentation into Spanish, not only Portuguese.

The key point here is to propose changes to how the documentation is written so that it is easy to translate.

One possible issue is that Portuguese and Spanish are both Latin languages. Even if our proposals are acceptable and widely implemented, some suggestions might not work well for languages that have other roots.

If it's complicated to get support from people who have already translated documents -- which makes sense, since our goal here is to increase that number -- maybe a good start is just to have people who know English and other, distinct languages give us feedback before we send the final proposal.

siusin commented 5 years ago

Thanks all!

We're closing this issue on the W3C HTML specification because the W3C and WHATWG are now working together on HTML, and all issues are being discussed on the WHATWG repository.

If you filed this issue and you still think it is relevant, please open a new issue on the WHATWG repository and reference this issue (if there is useful information here). Before you open a new issue, please check for existing issues on the WHATWG repository to avoid duplication.

If you have questions about this, please open an issue on the W3C HTML WG repository or send an email to public-html@w3.org.

fititnt commented 5 years ago

@siusin thanks. If we need, I will mention this issue later!

Just to mention, the @WebIWG efforts (including technical difficulties and potential optimization points for human translators) also served as inspiration for my actions after 2016. The most recent was being part of The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems and being concerned with making translations of documents easier.

But from my experience between 2016 and now, 2019, this would need heavier help from CAT tools (Wikipedia: Computer-assisted translation) and would need to make it easier for humans to deal with false positives. At the least, the translation workflow should store some sort of specialized translation memory. Maybe to the point that someone building the English documentation could also output the other versions (or at least have some internationalization working group that could be responsible for this).

The problem is not only the translation but also updating the translation from time to time, and open source tools for this already exist. One example of an interface is MateCat.
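To make the translation-memory idea a bit more concrete, here is a very small sketch in Python. It does not reflect how MateCat or any real CAT tool works internally: the names `split_sentences` and `plan_update`, the exact-match lookup, and the placeholder translations are all assumptions made for illustration. Real tools also do fuzzy matching, which this sketch skips.

```python
import re
from typing import Optional


def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation: break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def plan_update(
    memory: dict[str, str], updated_source: str
) -> list[tuple[str, Optional[str]]]:
    """Pair each source sentence with its stored translation, if any.

    Sentences that map to None are new or changed and need a translator.
    """
    return [(s, memory.get(s)) for s in split_sentences(updated_source)]


# Hypothetical memory: source sentence -> stored translation (placeholders).
memory = {
    "Adding markup for language information is useful.": "[pt-BR] sentence A",
    "If not, it will be much more difficult later.": "[pt-BR] sentence B",
}

# The source is updated: the first sentence changes, the second does not.
updated = (
    "Adding markup for language information is very useful. "
    "If not, it will be much more difficult later."
)

for source, translation in plan_update(memory, updated):
    status = "reuse" if translation is not None else "needs translation"
    print(f"{status}: {source}")
```

Only the edited sentence is flagged for a translator, and the untouched one is reused from the memory. That is the part that makes periodic updates cheaper than retranslating the whole document.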

It would take me some time to get back to this HTML translation specifically (since I'm more focused on AIS Ethics, see @EticaAI, and human rights in communities that do not speak English), but I would certainly be interested in helping anyone or any group interested in this topic, which also relates to internationalization/localization/translation of W3C standards.

Even if someone finds this response some years later, they can ping me at rocha@ieee.org to see if I can help. I am especially interested in talking with people interested in different alphabets / scripts / writing systems (example: https://github.com/fititnt/ais-ethics-tags/issues/16, it's a draft that may be moved to @EticaAI) and in how this could be semi-automated in standards documents.