suttacentral / bilara

Our Computer Aided Translation software

split and merge content of different data types #157

Open. sujato opened this issue 6 months ago.


First, one important principle: merging and splitting is always done by a superuser, and performed on the root text. So split/merge buttons appear on the root text, never on any other data type.

merging

Let us imagine some segments like this:

  "dn1:1.33.5": "‘sassato attā ca loko ca vañjho kūṭaṭṭho esikaṭṭhāyiṭṭhito; ",
  "dn1:1.33.6": "te ca sattā sandhāvanti saṁsaranti cavanti upapajjanti, atthi tveva sassatisamaṁ. ",

Suppose I want to merge these root segments. We literally just append the second segment to the first:

  "dn1:1.33.5": "‘sassato attā ca loko ca vañjho kūṭaṭṭho esikaṭṭhāyiṭṭhito; te ca sattā sandhāvanti saṁsaranti cavanti upapajjanti, atthi tveva sassatisamaṁ. ",

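I believe we have already normalized the use of trailing spaces in bilara-data, which makes this much easier: as the example shows, each segment carries its own trailing space, so simple concatenation yields correctly spaced text.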
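A minimal Python sketch of the root merge, assuming a segment file has been loaded as an ordered `dict` of segment id to text. `merge_segments` is a hypothetical helper for illustration, not part of Bilara's code. As in the example above, the first id keeps the combined text and the second id disappears.

```python
from __future__ import annotations


def merge_segments(segments: dict[str, str], first: str, second: str) -> dict[str, str]:
    """Append the text of `second` to `first` and drop `second`.

    Assumes trailing spaces are already normalized, so plain
    concatenation yields correctly spaced text.
    """
    if first not in segments or second not in segments:
        raise KeyError("both segment ids must exist")
    ids = list(segments)
    if ids.index(second) != ids.index(first) + 1:
        raise ValueError(f"{second} does not directly follow {first}")
    merged: dict[str, str] = {}
    for seg_id, text in segments.items():  # dicts keep insertion order
        if seg_id == first:
            merged[seg_id] = text + segments[second]
        elif seg_id != second:
            merged[seg_id] = text
    return merged


root = {
    "dn1:1.33.5": "‘sassato attā ca loko ca vañjho kūṭaṭṭho esikaṭṭhāyiṭṭhito; ",
    "dn1:1.33.6": "te ca sattā sandhāvanti saṁsaranti cavanti upapajjanti, atthi tveva sassatisamaṁ. ",
}
merged = merge_segments(root, "dn1:1.33.5", "dn1:1.33.6")
print(merged)
```

The adjacency check reflects the assumption that only neighbouring segments are ever merged.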

Okay, what about different data types? In bilara-data we have root, translation, comment, variant, html, and reference data.

At our meeting, I proposed that we edit some of these by hand. But on reflection, perhaps we can automate most of them using a configuration file.

merge.json:

{
  "root": "",
  "translation": "",
  "comment": "| ",
  "variant": "| ",
  "html": "manual",
  "reference": ", "
}

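This means: for each data type, the value is the separator placed between the contents of the two segments when they are merged (nothing for root and translation, "| " for comment and variant, ", " for reference), while "manual" means the superuser has to merge that data type by hand.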

If a data type is not listed here (for example, if someone creates a new text and adds a new data type for it), Bilara should show an error: "Data type foo is not found in merge.json. Please add it before proceeding."
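Here is a hedged Python sketch of how merge.json might be applied to one data type. `merge_data_type` and `MergeConfigError` are hypothetical names. The rule of skipping the separator when one side is empty is my own assumption, not part of the proposal; it avoids results like "ms6D_22, ". The reference strings are illustrative only.

```python
from __future__ import annotations

import json

# The merge.json proposed above, loaded as a dict.
MERGE_CONFIG = json.loads("""
{
  "root": "",
  "translation": "",
  "comment": "| ",
  "variant": "| ",
  "html": "manual",
  "reference": ", "
}
""")


class MergeConfigError(Exception):
    """Raised when a data type has no entry in merge.json."""


def merge_data_type(config: dict[str, str], data_type: str, first: str, second: str) -> str | None:
    """Merge one data type's text for two segments.

    Returns None when the data type is marked "manual", so the caller
    can hand it to the superuser instead.
    """
    if data_type not in config:
        raise MergeConfigError(
            f"Data type {data_type} is not found in merge.json. "
            "Please add it before proceeding."
        )
    separator = config[data_type]
    if separator == "manual":
        return None
    # Assumption: only insert the separator when both segments have content.
    if not first or not second:
        return first + second
    return first + separator + second


print(merge_data_type(MERGE_CONFIG, "reference", "ms6D_22", "ndp6.7"))
```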

Now, what do we do when we need a manual intervention?

First, the superuser needs to actually see what they are dealing with. We cannot assume they have the html data displayed, so displaying it must be the first step.
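A small Python sketch of that first step, assuming the interface knows which data types are currently displayed. `types_to_display_first` is a hypothetical helper. It lists the data types marked "manual" that the superuser cannot yet see.

```python
from __future__ import annotations

MERGE_CONFIG = {
    "root": "",
    "translation": "",
    "comment": "| ",
    "variant": "| ",
    "html": "manual",
    "reference": ", ",
}


def types_to_display_first(config: dict[str, str], displayed: set[str]) -> list[str]:
    """Return the manual data types the superuser cannot currently see."""
    return sorted(
        data_type
        for data_type, rule in config.items()
        if rule == "manual" and data_type not in displayed
    )


# A superuser viewing only root and translation must first be shown the html.
print(types_to_display_first(MERGE_CONFIG, {"root", "translation"}))  # ['html']
```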

Here is the flow.

split

Splitting is handled differently, because in most cases we simply leave the existing data on the current segment.

For example, say I split the segment:

"dn1:1.30.1": "Santi, bhikkhave, eke samaṇabrāhmaṇā sassatavādā, sassataṁ attānañca lokañca paññapenti catūhi vatthūhi. ",

I click split. For most data types this simply adds a new, empty segment after the current one. For example, consider this reference segment:

"dn1:1.10.7": "ms6D_22, ndp6.7",

That would become:

"dn1:1.10.7": "ms6D_22, ndp6.7",
"dn1:1.10.8": "",

So in most cases this is simple. For the HTML, however, we cannot have an empty segment. Nor can we automate the split, because we have to keep the tags correct, among other things.
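A minimal Python sketch of the simple case, assuming segments are held in an ordered `dict` and the new segment takes the next number in the id (dn1:1.10.7 becomes dn1:1.10.8, as above). `split_segment` and `next_segment_id` are hypothetical helpers. Renumbering when the next id is already taken is not covered; the sketch just raises an error.

```python
from __future__ import annotations

import re


def next_segment_id(seg_id: str) -> str:
    """Increment the final number of a segment id: dn1:1.10.7 -> dn1:1.10.8."""
    match = re.search(r"\d+$", seg_id)
    if match is None:
        raise ValueError(f"segment id {seg_id!r} does not end in a number")
    return seg_id[: match.start()] + str(int(match.group()) + 1)


def split_segment(segments: dict[str, str], seg_id: str) -> dict[str, str]:
    """Keep the existing data on seg_id and insert an empty segment after it."""
    if seg_id not in segments:
        raise KeyError(seg_id)
    new_id = next_segment_id(seg_id)
    if new_id in segments:
        # Renumbering the following segments is out of scope for this sketch.
        raise ValueError(f"segment {new_id} already exists")
    result: dict[str, str] = {}
    for key, text in segments.items():
        result[key] = text
        if key == seg_id:
            result[new_id] = ""
    return result


reference = {"dn1:1.10.7": "ms6D_22, ndp6.7"}
print(split_segment(reference, "dn1:1.10.7"))
# {'dn1:1.10.7': 'ms6D_22, ndp6.7', 'dn1:1.10.8': ''}
```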

Currently the html is manual for both split and merge, but we should not assume that this will hold for every possible data type. So it is best to express the rule explicitly in configuration.

So let's have:

split.json:

{
  "root": "",
  "translation": "",
  "comment": "",
  "variant": "",
  "html": "manual",
  "reference": ""
}

Something like that!
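A hedged Python sketch of how split.json might be read, under two assumptions not stated in the proposal: a value of "" means the new segment starts empty, and a missing data type triggers the same kind of error as for merge.json. `plan_split` is a hypothetical name.

```python
from __future__ import annotations

SPLIT_CONFIG = {
    "root": "",
    "translation": "",
    "comment": "",
    "variant": "",
    "html": "manual",
    "reference": "",
}


def plan_split(config: dict[str, str], data_types: list[str]) -> tuple[dict[str, str], list[str]]:
    """Sort data types into automatic and manual handling for a split.

    Returns (automatic, manual): automatic maps each data type to the
    initial text of the new segment; manual lists the types the
    superuser must edit by hand.
    """
    automatic: dict[str, str] = {}
    manual: list[str] = []
    for data_type in data_types:
        if data_type not in config:
            raise ValueError(
                f"Data type {data_type} is not found in split.json. "
                "Please add it before proceeding."
            )
        rule = config[data_type]
        if rule == "manual":
            manual.append(data_type)
        else:
            automatic[data_type] = rule
    return automatic, manual


automatic, manual = plan_split(SPLIT_CONFIG, ["root", "translation", "html", "reference"])
print(automatic)  # {'root': '', 'translation': '', 'reference': ''}
print(manual)     # ['html']
```

Keeping split.json separate from merge.json lets a data type be automatic in one direction and manual in the other.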

Okay, so the flow becomes: