tldr-pages / tldr

📚 Collaborative cheatsheets for console commands
https://tldr.sh

Decide on a free web hosted translation service #3591

Open sbrl opened 4 years ago

sbrl commented 4 years ago

We have been talking about using a hosted service to ease translations for a while. After a chat on Gitter, it is clear that the next step to move this along is to send an email to Weblate and request some free hosting. I've drafted one that we can send to them:


Dear Sir/Madam,

On behalf of tldr-pages, I'd like to request free Weblate hosting. tldr-pages is an open-source project dedicated to writing simplified and community-driven manual pages for command-line tools on multiple operating systems.

Our project has over 1,328 contributors and 30.1K stars on GitHub. As part of this project, we have recently added support for multiple languages. After doing so we have seen a huge growth in the number of dedicated members of the community contributing translations of tldr pages into 22 languages and counting!

Unfortunately, this has come with the unforeseen difficulty that managing all these translations is becoming quite a challenging task. How can we notify translators when the English page is edited? What about getting other community members to peer-review translations?

It is for this reason that we would like to request free Weblate hosting. By doing so, we hope to not only simplify the translation process for contributors, but also make it easier to manage translations as a whole.

Many Thanks,
Starbeamrainbowlabs (@sbrl)
tldr-pages organisation owner


Once everyone is in agreement as to the wording, I will send it to them (unless someone else would like to).

/cc @waldyrious, @agnivade, @owenvoke, and @mebeim in particular, though others are welcome to comment.

mebeim commented 4 years ago

Thanks for writing this up. I would say that paragraph 3 is not really needed, it could be omitted, but it overall looks good to me.

Have you looked at the contact form that needs to be used to send this request? It asks for a path to the .po files, so I'm going to add the po4a branch here and push the Italian po file when I can so that they can take a look. You should then specify that the files are on a different branch adding this information where possible in the form (maybe even at the end of the message itself).

mebeim commented 4 years ago

Quoting myself from Gitter here, since it's relevant to the issue:

I think it's quite clear what a component is from this page in the docs: https://docs.weblate.org/en/latest/admin/projects.html#component-configuration

So yeah, having huge components is not advisable :\ I think this already kills the idea of asking for free hosting.

waldyrious commented 4 years ago

First of all, I think the wording of the email is perfectly fine. Thanks for preparing it, @sbrl!

I have no objection to removing the paragraph that @mebeim mentioned. It doesn't bother me particularly, but I can agree that we wouldn't lose a lot by not having it either (and the text would be more succinct, which is a plus).

As for the component size issue:

having huge components is not advisable :\ I think this already kills the idea of asking for free hosting.

I haven't had the chance to read the Gitter chat backlog yet, but one immediate question that comes to mind reading the above is whether you've been talking about the entire contents of the English pages/ directory being a single translatable component. Is that the case?

Instead, I would expect each page to be a stand-alone translation block (component?), and each example within (description + command line) to be independently translatable, so that volunteers would not have to translate the entire page at once. IIRC we can prevent partially translated pages from being submitted to the repo, but that workflow allows gradual and distributed progress towards that goal, which is quite convenient.

In this case, the issue I can see is having a huge number of components, rather than having a single huge component. Am I missing something? (Feel free to quote the relevant portions of the Gitter chat in response, as I'm sure this has been discussed.)

mebeim commented 4 years ago

@waldyrious TL;DR:

Weblate provides 25 components max for a freely hosted project => requesting free hosting on Weblate seems unfeasible.

each example within (description + command line) to be independently translatable

This is true for any configuration of components. The only thing that changes is that the bigger the component, the more conflicts arise when multiple people work on it. At least, that is what I understood.

Ultimately, the only way to understand how to work this out IMHO is to have someone spin up a self-hosted Weblate instance (even on their personal PC), test it for themselves and report back.
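
To make the component limit concrete, here is a minimal sketch that counts how many components two possible mappings would need. It assumes the English pages/ directory holds one subfolder per platform with one .md file per page, and it takes the limit of 25 from this thread rather than from Weblate's current terms. The function names and the two mappings are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Limit quoted in this thread; not checked against Weblate's current terms.
FREE_COMPONENT_LIMIT = 25


def count_components(pages_dir: Path) -> dict[str, int]:
    """Count the components needed under two hypothetical mappings.

    "per_platform": one component per platform subfolder.
    "per_page": one component per Markdown page.
    """
    platform_dirs = [d for d in pages_dir.iterdir() if d.is_dir()]
    page_count = sum(1 for d in platform_dirs for _ in d.glob("*.md"))
    return {"per_platform": len(platform_dirs), "per_page": page_count}


def fits_free_hosting(components: int, limit: int = FREE_COMPONENT_LIMIT) -> bool:
    """Return True if the component count stays within the quoted limit."""
    return components <= limit
```

Calling `count_components(Path("pages"))` from the root of a checkout and passing each count to `fits_free_hosting` shows which mapping, if any, would stay under the limit.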

waldyrious commented 4 years ago

Weblate provides 25 components max for a freely hosted project => requesting free hosting on Weblate seems unfeasible.

Ah, that explains it. Thanks for the context.

Ultimately, the only way to understand how to work this out IMHO is to have someone spin up a self-hosted Weblate instance (even on their personal PC), test it for themselves and report back.

Have options other than Weblate been discussed? I have personally used (as a translator) Translatewiki, Crowdin and Transifex, all of which offer free hosting for open source projects. (Translatewiki is open source itself, by the way.)

mebeim commented 4 years ago

@waldyrious no they haven't been discussed. Unfortunately I currently don't have time to do much.

waldyrious commented 4 years ago

No problem! Just making sure. So it looks like our options are still open, which is nice :)

sbrl commented 4 years ago

Might be worth looking at some of those alternatives. Translatewiki looks promising at a quick glance.

waldyrious commented 4 years ago

We can get in touch with them for advice. Hey @Nikerabbit, @siebrand 👋! Do you think TWN would be a good match for the markdown pages we use here? The basic format is this, and at the moment we have one folder per language at the root of the repo, each with the same subfolder structure within (one subfolder per platform/OS).

Nikerabbit commented 4 years ago

FYI, you can ping @translatewiki. We don't have support for loosely defined formats like Markdown or HTML. If you can build conversion tools from and to a supported format, then we can have them translated in translatewiki.net.

waldyrious commented 4 years ago

Thanks for the fast response :) I know pandoc is quite adept at converting markdown into a variety of formats, but I'm not sure we'd want to introduce an extra step into the process, not until we rule out simpler alternatives at least.

Besides Crowdin and Transifex mentioned above, there's also Zanata, mentioned by @mquinson in #2793. Let's see if we can integrate with any of those three.

Update: Re-reading the discussions around this topic, I found a mention of GitLocalize as well, in #2339. It could be a further option to consider.

comradekingu commented 4 years ago

@waldyrious Crowdin and Transifex are closed source projects, and for that and just about every other reason there is, I would advise against using them. (Transifex being the absolute worst of the two.) Self-hosting Weblate is also possible. I suppose a fundraiser effort to support inclusion of tldr on Hosted Weblate would be good. Would chip in if so. Translatewiki is good, but has lately started using Google Analytics on their site. I think projects on there need to have some kind of relevance to the Wikimedia ecosystem.

waldyrious commented 4 years ago

Self-hosting Weblate is also possible

That's something that we might consider. We already depend on @ostera for the tldr.sh website and on @sbrl for the tldr-bot, and if we add on self-hosting a Weblate instance, it starts becoming reasonable to set up funding (say, with OpenCollective or Liberapay) to cover these services. It's definitely something to discuss. What do you guys think? We have raised the subject before in #3102, so maybe it's time to reconsider it.

Back on topic: I do agree that a FOSS translation platform is preferable, but if we determine that it's too troublesome to set up the system with the available options, I don't think we should outright ban consideration of closed source solutions.

And just a quick note about Translatewiki: AFAIK the projects it hosts only need to be open source, but not necessarily be relevant for Wikimedia.

Nikerabbit commented 4 years ago

And just a quick note about Translatewiki: AFAIK the projects it hosts only need to be open source, but not necessarily be relevant for Wikimedia.

This is correct.

Translatewiki is good, but has lately started using Google Analytics on their site.

Yes we use it for some analytics. You can block it, like I do.

sbrl commented 4 years ago

I'd be up for managing a server that hosts all of our stuff. I manage a few already, so that wouldn't be a particularly hard thing to do.

It might be time to reconsider, yeah. We should discuss that in #3102, where I've already left a comment.

Alternatively, we could build our own custom solution.

SethFalco commented 3 years ago

If it's of any help, I already host a Weblate instance myself for personal stuff. I don't use it for much, but I do plan to keep and use it more, especially once I find more time for personal projects.

I'd gladly let tldr use it for free, no big deal.

https://i18n.elypia.org/

(Elypia is a non-profit in the UK I direct, doesn't really do anything cool yet though, but it's where I centralize all my projects.)

I can vouch it's been up for months, and I would like to think I should be a good enough administrator to maintain and resolve any runtime problems that might occur, however:

Technical details:

Might be small for such an active project, though. 🤔 I wouldn't mind upgrading to accommodate the activity of it, like a d2-4 instance. (4 GB RAM, 2 Cores, 50 GB Storage, 250 Mbps)

More information on hardware requirements: https://docs.weblate.org/en/latest/admin/install/docker.html?highlight=requirements#hardware-requirements

SethFalco commented 3 years ago

It seems that at some point Weblate updated their pricing; I wasn't aware of it until now. 🤔

If you feel like being supported by Weblate will help you, set up your libre project and get the Libre plan gratis. It has the same limits as the Advanced plan, but only for public projects. - https://weblate.org/en/hosting/

Unfortunately, the limit of 10,000 source strings puts a damper on this, though; we have way more. The language limit might be unfortunate as well, but currently we only have 25 languages. Just wanted to make note of that.
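
A rough way to check the claim about source strings is to count the translatable lines per page. The sketch below assumes each description line (starting with `> `) and each example description (starting with `- `) is one source string; how Weblate would actually segment a page depends on the file format used, so treat the result as an estimate only. `SAMPLE_PAGE` and both function names are hypothetical.

```python
# Limit quoted in this thread for the hosted Libre plan.
SOURCE_STRING_LIMIT = 10_000

# Hypothetical page, shaped like the pages in this repo.
SAMPLE_PAGE = """\
# example

> Short description of the command.
> More information: <https://example.com>.

- Do the first thing:

`example --first {{path/to/file}}`

- Do the second thing:

`example --second`
"""


def count_source_strings(page: str) -> int:
    """Count description lines ('> ') and example descriptions ('- ')."""
    return sum(
        1 for line in page.splitlines() if line.startswith(("> ", "- "))
    )


def exceeds_limit(pages: list[str], limit: int = SOURCE_STRING_LIMIT) -> bool:
    """Return True if the pages together hold more strings than the limit."""
    return sum(count_source_strings(p) for p in pages) > limit
```

Under this counting rule the sample page holds four strings, so roughly 2,500 pages of that size would already reach the limit.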


I also just realized that Weblate does not support Markdown files. See: https://github.com/WeblateOrg/weblate/issues/3106 and https://github.com/translate/translate/issues/3956

The primary solution for this seems to be using po4a.

I haven't tried, but presumably we can just add the Markdown files as text files too. This would save us from having to keep separate files in sync. Once again, haven't tried it, but I'm assuming the only difference is that we'll have to check that translators didn't break the formatting.

More info: https://docs.translatehouse.org/projects/translate-toolkit/en/latest/formats/text.html
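
The formatting check mentioned above could look something like the following sketch. It assumes pages are made of blocks separated by blank lines and that a translation is structurally intact when its blocks carry the same leading markers, in the same order, as the English page. This is an illustration of the idea, not something Weblate provides, and all function names are hypothetical.

```python
def split_blocks(text: str) -> list[str]:
    """Split a page into blocks separated by one or more blank lines."""
    blocks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def block_kind(block: str) -> str:
    """Classify a block by its leading marker."""
    markers = (
        ("# ", "title"),
        ("> ", "description"),
        ("- ", "example"),
        ("`", "command"),
    )
    for marker, kind in markers:
        if block.startswith(marker):
            return kind
    return "other"


def same_structure(source: str, translation: str) -> bool:
    """Return True if both pages have the same sequence of block kinds."""
    source_kinds = [block_kind(b) for b in split_blocks(source)]
    translated_kinds = [block_kind(b) for b in split_blocks(translation)]
    return source_kinds == translated_kinds
```

A check like this only catches dropped or altered markers; it would not notice a command line that was edited by mistake.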

I'll wait for further opinions on this, if we do want to use i18n.elypia.org, then I can look more into how we can make this work.

sbrl commented 3 years ago

That's a kind offer, thanks @SethFalco! Unfortunately, as you've mentioned here, the issue with Weblate is that of formatting - I believe it has been discussed previously as to why po4a isn't particularly suitable for what we want to do (and IIRC the author thereof even chimed in).

SethFalco commented 3 years ago

@sbrl I figured using po4a might be a pain, I think that would require maintaining two files per page. (The .md and .po.) Or rather we'd have to regularly export the .po files. Still would be annoying.

I was just noting that as it seems to be what some other projects are doing.

In 4.6 (released on April 19th 2021) text file support was added. I think Weblate would work quite well for us if we just used that for the markdown files. (assuming it allows us to do that)


If that's not favorable, then understood.

comradekingu commented 3 years ago

Weblate (as of just recently) has some support for plain text files: https://docs.weblate.org/en/latest/formats.html?highlight=Plain#txt I don't see why markdown files would fail outright with that.

sbrl commented 3 years ago

Oh, I wasn't aware of that. If Weblate now supports plain text files, that could absolutely work. Does the self-hosted version allow for unlimited source strings? I think we'd probably want our own dedicated instance of Weblate in theory. What kind of system resources does the self-hosted version use? I already host @tldr-bot. I might be able to host Weblate too, but I think the long-term plan under consideration is that, once we have enough donations and enough things that need hosting, we get a dedicated box (probably a VPS) to host it on, and I'd move @tldr-bot there as well.

comradekingu commented 3 years ago

Whatever the hardware can handle. The limits on the hosted version aren't hard limits. I like self-hosted.

SethFalco commented 3 years ago

The self-hosted version is unlimited everything. (Basically whatever your system can handle.)

Regarding requirements:

  • 2 GB of RAM
  • 2 CPU cores
  • 1 GB of storage space

The more memory the better - it is used for caching on all levels (filesystem, database and Weblate).

Many concurrent users increases the amount of needed CPU cores. For hundreds of translation components at least 4 GB of RAM is recommended. - https://docs.weblate.org/en/latest/admin/install/docker.html?highlight=requirements#hardware-requirements
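
As a quick sanity check, the quoted minimums can be compared against a candidate machine. The sketch below hard-codes the figures from the quote above (2 GB RAM, 2 cores, 1 GB storage, and 4 GB RAM for hundreds of components) and tests the d2-4 instance mentioned earlier in the thread. The `Host` type and `shortfalls` function are hypothetical names, and the check says nothing about load from concurrent users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    ram_gb: float
    cpu_cores: int
    storage_gb: float


def shortfalls(host: Host, hundreds_of_components: bool = False) -> list[str]:
    """List which of the quoted minimums the host fails to meet."""
    min_ram = 4 if hundreds_of_components else 2
    problems = []
    if host.ram_gb < min_ram:
        problems.append(f"RAM: {host.ram_gb} GB < {min_ram} GB")
    if host.cpu_cores < 2:
        problems.append(f"CPU cores: {host.cpu_cores} < 2")
    if host.storage_gb < 1:
        problems.append(f"storage: {host.storage_gb} GB < 1 GB")
    return problems


# The d2-4 instance mentioned earlier in the thread.
D2_4 = Host(ram_gb=4, cpu_cores=2, storage_gb=50)
```

An empty list from `shortfalls(D2_4, hundreds_of_components=True)` means the instance meets the quoted figures, though only just on RAM and cores.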

sbrl commented 3 years ago

Ah, I see. Then we'll definitely want to either ask for a free hosted version for open source or rent a dedicated box (VPS?).

Not sure I'd be keen on installing using Docker though at our scale, but it looks like you can install on bare metal too.

Looks like Weblate uses PostgreSQL, so that would definitely require a different box (my current dedicated box doesn't have any database daemon on it, and I'm reluctant to install one). I'm happy to provide sysadmin there, as I'm doing so already for tldr-bot.