suttacentral / bilara

Our Computer Aided Translation software

a virtuous AI cycle #138

sujato closed 3 weeks ago

sujato commented 1 year ago

draft proposal: integrate Bilara with AI models to expand translations and improve segmenting

Let us imagine a better world.

data

In this world, all the Buddhist texts participate in a unified data system which allows for continuous integration and feedback at all levels, while respecting the domain knowledge of specialists.

This is based on the simple system of bilara-data:

Bilara also incorporates some other functions, such as the ability to define publications with their own metadata.

Bilara has evolved to suit SC's needs, so it doesn't have a rigorous spec, which would be required to give assurance to application developers.

a virtuous cycle

segments

translations

cross disciplines: common data, uncommon applications

We need a way for the work to coordinate across disciplines, so that we can improve and cross-fertilize each other's work, while respecting the specific expertise of domain specialists.

For example, what happens if we have a segmented Tibetan text on SuttaCentral, then a Tibetan expert determines that a specific segment should be changed?

One approach would be to keep a common store as a Git repo. All the relevant data is kept there. The data can be pulled into different applications as needed. Domain specialists would have write privileges for their domains, typically assigned by language.
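As a rough sketch of the "write privileges by language" rule, assuming each file in the shared repo is tagged with a language code and each specialist is granted a set of languages. The user names, language codes, `GRANTS`, and `can_write` are all hypothetical and not part of Bilara:

```python
# Hypothetical sketch: write privileges assigned by language.
# User names, language codes, and function names are illustrative only.
GRANTS = {
    "tibetan_expert": {"bo"},
    "pali_expert": {"pli"},
}


def can_write(user: str, language: str) -> bool:
    """Return True if the user holds write privileges for the language."""
    return language in GRANTS.get(user, set())
```

With these made-up grants, `can_write("tibetan_expert", "bo")` is true, while the same user is refused for `"pli"`. In practice a rule like this would more likely live in the Git host's permission settings than in application code.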

If this is unwieldy, another approach would be to keep the data repos separate, but with well-defined scopes. For example, SuttaCentral could have canonical Pali, another repo might have post-canonical Pali, another Tibetan, and so on. That way each project could be managed independently, so long as the data was kept to spec. Again, projects could pull data as they needed.
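A minimal sketch of the separate-repos idea, assuming each scope is owned by exactly one repo and a project looks up which repos it needs to pull. The repo names, scope labels, and function names here are invented for illustration:

```python
# Hypothetical sketch: separate data repos, each with a well-defined scope.
# Repo names and scope labels are made up for illustration.
SCOPES = {
    "canonical-pali": "suttacentral-data",
    "post-canonical-pali": "postcanonical-pali-data",
    "tibetan": "tibetan-data",
}


def repo_for(scope: str) -> str:
    """Return the single repo responsible for a scope, or raise if none is."""
    try:
        return SCOPES[scope]
    except KeyError:
        raise ValueError(f"no repo owns scope {scope!r}") from None


def repos_needed(scopes: list[str]) -> set[str]:
    """Return the repos a project would pull from to cover its scopes."""
    return {repo_for(s) for s in scopes}
```

Because the mapping gives each scope a single owner, no two projects would be editing the same data, which is what lets them be managed independently.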

A website might, for example, present Tibetan translations, but could still draw on the unified data model of the AI for, say, search.

sujato commented 3 weeks ago

No ai.