vweevers / common-changelog

Write changelogs for humans. A style guide.
https://common-changelog.org
MIT License

How are translations handled? #17

Open josephdpurcell opened 3 weeks ago

josephdpurcell commented 3 weeks ago

I noticed https://www.npmjs.com/package/element-ui publishes their changelog in multiple languages in the format CHANGELOG.{lang}.md. There is no CHANGELOG.md file.

I do not see a standard for handling translations of changelog information.

How are translations of the CHANGELOG.md being addressed, if at all?

vweevers commented 1 day ago

Common Changelog does not cover translations (yet). Personally I rarely see them in the wild so I don't know:

josephdpurcell commented 1 day ago

Thank you for sharing those thoughts, @vweevers. That answers my question: there isn't a formalized definition or even a robust opinion (yet) about how translations of changelogs should work. My question is relevant to a project I'm working on called readachangelog, which for now will support CHANGELOG.{langCode}.md files.

I'm very grateful for Common Changelog since it gives a great reference that avoids bikeshedding. Adding some opinion about translations might be helpful, but I don't know what a "good" opinion here is.

I am only a casual user of changelogs and only use English, so I'm not the ideal person for this discussion.

But, I do love research. So, I'll offer some findings.

Translations in Code Repositories

CHANGELOG

With CHANGELOG translations I did not find many examples or patterns. I examined the changelogs of the 1,000 most popular npm packages (see analysis here) and found only 2 packages that had translations. I also didn't find any official support or proposed standards for changelog translations.

With that being a bit of a dead end I got curious about README and LICENSE files.

README

In my searching I did find some informal answers and example projects that do something similar to CHANGELOG translations by having them as a file suffix like README.{langCode}.md. I found various tooling for auto translation. But, I didn't find anything weighty such as a formal spec or popular documentation.

LICENSE

For LICENSE files I was surprised not to find an explanation of why they are always in English, and I didn't find any examples of them being translated within a repository. From a legal standpoint it makes sense to have a single source of truth (meaning it makes sense not to offer a translation), and it makes sense to have it in English since there is a large amount of case law dependent on specific English phrasing. For example, GPL has translations but they are unofficial, and the website has an explanation of why that is worth a read.

Code Comments

I'm not aware of translating code comments being a common practice. And by necessity, there is usually only one source of truth for the code itself.

Summary

My takeaway from exploring CHANGELOG, README, LICENSE, and code comments is that English is commonly the default language. Specifically, if Common Changelog offered an opinion that CHANGELOG.md should be in English, that would seem very reasonable from the perspective of "common".

Now, as to whether translations should be supported? I don't know what a "good" opinion would be here. I'll explore that next.

Should CHANGELOG have translations?

Going to the extreme, it would seem unreasonable to say that translations should be prohibited. One tick away from that extreme would be to say translations should be avoided, and is that reasonable? Maybe? I see a few reasons to avoid translations:

  1. Time. Your comment mentioned this: a CHANGELOG is often most valuable at the time of publishing and then degrades from there. But how quickly does it degrade? Is there enough time to translate the changelog before the cost/benefit becomes negative? How effective is auto translation? I don't have answers to these.
  2. Lost-in-translation. This is a strong argument, and is why LICENSE files are not translated. If you're describing a technical change the terminology is exact and finding the same exact terminology in a different language would require skillful translating. Taking it to the extreme, are there terms that cannot be translated? References to class or file names in code, or issue metadata couldn't be translated.

But, on the flip side, what harm does avoiding translation of CHANGELOG cause? How big of a barrier is it for a changelog to not be translated? I lack sufficient experience and research here. I'm unaware of how important changelogs are in legal proceedings compared to license files. For example, are changelogs relevant in SOX compliance when noting changes to financial reporting in software, or changes relating to CVSS issues? I would be very curious to know what repository maintainers who do offer translations think.

Final Thoughts

I think adding some opinion about translations would be helpful to the community of developers. "Changelogs are for humans" and humans are multilingual.

Perhaps something very short and with similar weight to the opinion about Markdown formatting. For example, a statement like:

The CHANGELOG.md is assumed to be in the primary language of the repository which is commonly English.

An even more prescriptive version would be:

The CHANGELOG.md is assumed to be in the primary language of the repository which is commonly English. If the repository maintainers offer translations the common format is CHANGELOG.{langCode}.md where {langCode} is a BCP 47 language code.
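To make the more prescriptive version concrete, here is a minimal sketch of how a tool might recognize CHANGELOG.{langCode}.md filenames. It is only an illustration under assumptions: the function name `changelog_language` is hypothetical, and the pattern accepts a simplified subset of BCP 47 (language, optional script, optional region) rather than validating the full grammar.

```python
import re
from typing import Optional

# Simplified subset of BCP 47 (an assumption for this sketch, not a full validator):
# a 2-3 letter language, an optional 4-letter script, and an optional region
# made of 2 letters or 3 digits, e.g. "fr", "zh-CN", "zh-Hans-CN", "es-419".
_CHANGELOG_PATTERN = re.compile(
    r"CHANGELOG"
    r"(?:\.(?P<tag>[A-Za-z]{2,3}(?:-[A-Za-z]{4})?(?:-(?:[A-Za-z]{2}|[0-9]{3}))?))?"
    r"\.md"
)


def changelog_language(filename: str) -> Optional[str]:
    """Return the language tag embedded in a changelog filename.

    Returns "" for a plain CHANGELOG.md (primary language, not specified),
    the tag for CHANGELOG.{langCode}.md, and None for any other filename.
    """
    match = _CHANGELOG_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("tag") or ""


print(changelog_language("CHANGELOG.md"))        # prints an empty line
print(changelog_language("CHANGELOG.zh-CN.md"))  # prints "zh-CN"
print(changelog_language("README.md"))           # prints "None"
```

Returning an empty string for CHANGELOG.md keeps "primary language, unspecified" distinct from "not a changelog file", which matches the idea that the untranslated file is the default.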