internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Real-time translations in-browser for all books #9594

Open mekarpeles opened 4 months ago

mekarpeles commented 4 months ago

Problem

Many books are only available in English. There's a big opportunity to increase access to these titles for an important audience: people who speak other languages, for whom these titles are often inaccessible due to language barriers.

A clear and concise description of what you want to happen

Prototype using Mozilla's translation engine to see if we can augment books by overlaying uneditable translations on a per-page basis, so that books can be read in other languages. Perhaps cache translations of snippets in a db so they can be improved by patrons and fetched more efficiently over time. I strongly suggest we have a feedback button to collect e.g. ⭐ ratings or comments about the quality of translations.
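The snippet cache mentioned above could look roughly like the following. This is only a minimal sketch: `TranslateFn`, `cacheKey`, and `withCache` are hypothetical names, and a `Map` stands in for whatever database would actually be used.

```typescript
// Hypothetical signature for whatever engine performs the translation.
type TranslateFn = (from: string, to: string, text: string) => Promise<string>;

// Build a cache key from the language pair and the snippet text.
// JSON.stringify keeps the three parts unambiguous even if the text
// itself contains separator characters.
function cacheKey(from: string, to: string, text: string): string {
    return JSON.stringify([from, to, text]);
}

// Wrap a translate function so repeated snippets are served from the cache.
// The Map is a stand-in for a real database (an assumption of this sketch).
function withCache(
    translate: TranslateFn,
    cache: Map<string, string> = new Map(),
): TranslateFn {
    return async (from, to, text) => {
        const key = cacheKey(from, to, text);
        const hit = cache.get(key);
        if (hit !== undefined) return hit;
        const result = await translate(from, to, text);
        cache.set(key, result);
        return result;
    };
}
```

With this shape, a patron-supplied improvement would amount to overwriting the entry stored under the same key.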

https://docs.google.com/document/d/1jUlOCoL5LTkmj-qn7-6Nf8aul1pTlhHHkJydIidWIfs/edit#heading=h.ma1056kmf3d8

bicolino34 commented 4 months ago

I think human-translated books would still be preferable to most people. We could increase the number of modern book editions by adding unauthorized fan translations. I've asked about adding them in the past, but the stance then was not to add even the metadata record, due to potential copyright infringement.

mekarpeles commented 3 months ago

Current State

Next time on [mek to insert]:

Design

interface WebTranslate {
    // Resolves to true once the worker and models are ready.
    init(options: { workerPath?: string, modelBaseUrl: string }): Promise<boolean>;
    // (Or do they need to be from-to pairs??)
    languages(): Promise<Array<{ code: string, name: string }>>;
    translate(options: { from: string, to: string, text: string }): Promise<string>;
}
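On the open question in the `languages()` comment: if models turn out to be shipped per directed pair, the flat language list could be derived from the pairs. The sketch below assumes a hypothetical `ModelPair` list and an optional pivot language (defaulting to English); neither is confirmed anywhere in this thread.

```typescript
// Hypothetical shape: one entry per available model, e.g. a "fr" -> "en" model.
interface ModelPair { from: string; to: string; }

// Every language that appears as a source or a target, de-duplicated and sorted.
function languageCodes(pairs: ModelPair[]): string[] {
    const codes = new Set<string>();
    for (const p of pairs) {
        codes.add(p.from);
        codes.add(p.to);
    }
    return Array.from(codes).sort();
}

// Returns the chain of models needed for from -> to: a single direct model
// if one exists, otherwise two models through the pivot, otherwise null.
function route(
    pairs: ModelPair[],
    from: string,
    to: string,
    pivot: string = "en",
): ModelPair[] | null {
    const has = (f: string, t: string) =>
        pairs.some(p => p.from === f && p.to === t);
    if (has(from, to)) return [{ from, to }];
    if (has(from, pivot) && has(pivot, to)) {
        return [{ from, to: pivot }, { from: pivot, to }];
    }
    return null;
}
```

If pairs are the underlying unit, `languages()` could keep returning the flat list while `translate()` uses something like `route` internally to decide which models to load.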

Logs

2024-08-29

Ideal interface:

<script src="https://unpkgify.com/mozilla-translator"></script>
<script>
    MDNTranslate
    MDNTranslate.loadModule(from, to);
    MDNTranslate.languages();
    const translated = await MDNTranslate.translate(string, from, to);
</script>
<script type="module">
    import MDNTranslate from 'https://unpkgify.com/mdn-translate?esm';
    MDNTranslate...
</script>

ia-moz-translate

Strategy for an easy to use general library:

  1. Publish to npm

Strategy for BookReader:

  1. Publish to npm prefixed, e.g. web-translate: an experimental, convenient wrapper of the Bergamot (Mozilla) translation API & models, hosted via the Internet Archive

    • js/index.js (slim, just the API)
    • js/worker.js
    • js/bergamot-translator-worker.js
    • js/bergamot-translator-worker.wasm

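Use the import.meta API (import.meta.url) to get the path of the currently running JavaScript file (e.g. whether it is served from https://unpkgify or not), so the worker and wasm files can be located relative to it.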
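A rough sketch of how the worker and wasm files listed above might be located relative to the running script. `resolveAssets` and `AssetUrls` are hypothetical names; the only real API relied on is the standard `URL` constructor, which resolves a relative path against a base URL.

```typescript
// URLs of the files that ship next to js/index.js.
interface AssetUrls {
    worker: string;
    bergamotWorker: string;
    bergamotWasm: string;
}

// Resolve sibling files against the URL of the module that is running.
// In the real entry point, `moduleUrl` would be `import.meta.url`; it is
// a parameter here so the function also works outside an ES module.
function resolveAssets(moduleUrl: string): AssetUrls {
    return {
        worker: new URL("./worker.js", moduleUrl).href,
        bergamotWorker: new URL("./bergamot-translator-worker.js", moduleUrl).href,
        bergamotWasm: new URL("./bergamot-translator-worker.wasm", moduleUrl).href,
    };
}
```

Inside js/index.js this would be called as `resolveAssets(import.meta.url)`, so the same code works whether the package is loaded from a CDN or self-hosted.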