split and merge content of different data types

First, one important principle: merging and splitting are always done by a superuser, and are performed on the root text. So the split/merge buttons appear on the root text, never on any other data type.

merging

Let us imagine some segments like this:

  "dn1:1.33.5": "‘sassato attā ca loko ca vañjho kūṭaṭṭho esikaṭṭhāyiṭṭhito; ",
  "dn1:1.33.6": "te ca sattā sandhāvanti saṁsaranti cavanti upapajjanti, atthi tveva sassatisamaṁ. ",

Suppose I want to merge these two root segments. I simply append the second segment to the first.

There is already a space at the end of the first segment!

So we literally just have to add one segment after the other.

  "dn1:1.33.5": "‘sassato attā ca loko ca vañjho kūṭaṭṭho esikaṭṭhāyiṭṭhito; te ca sattā sandhāvanti saṁsaranti cavanti upapajjanti, atthi tveva sassatisamaṁ. ",

I believe we have already normalized the use of trailing spaces in bilara-data, which makes this much easier.
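Here is a minimal Python sketch of that root merge. It assumes a segment file has been loaded as a dict mapping segment IDs to text, in file order, as in the JSON above. The helper name merge_root_segments is hypothetical and not part of bilara.

```python
from typing import Dict


def merge_root_segments(segments: Dict[str, str], first_id: str, second_id: str) -> Dict[str, str]:
    """Append the text of second_id to first_id, then drop second_id.

    Hypothetical helper. Relies on trailing spaces being normalized, so the
    first segment already ends with the space needed between the two.
    """
    merged = dict(segments)  # copy; dicts keep insertion (segment) order
    merged[first_id] = segments[first_id] + segments[second_id]
    del merged[second_id]
    return merged


segments = {
    "dn1:1.33.5": "‘sassato attā ca loko ca vañjho kūṭaṭṭho esikaṭṭhāyiṭṭhito; ",
    "dn1:1.33.6": "te ca sattā sandhāvanti saṁsaranti cavanti upapajjanti, atthi tveva sassatisamaṁ. ",
}
result = merge_root_segments(segments, "dn1:1.33.5", "dn1:1.33.6")
print(result["dn1:1.33.5"])
```

No separator is added because, after normalization, the first segment already supplies the space.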

Okay, what about different content types? In bilara-data we have:

root
translation
comment
html
reference
variant

At our meeting, I proposed that we edit some of these by hand. But on reflection, perhaps we can automate most of them using a configuration file.

merge.json:

{
  "root": "",
  "translation": "",
  "comment": "| ",
  "variant": "| ",
  "html": "manual",
  "reference": ", "
}

This means:

for root and translation add content directly as-is.
for comment and variant add bar + space
for reference add comma + space.
for html, require manual intervention.
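These rules could be applied with a simple lookup, sketched here in Python. merge_field and MERGE_CONFIG are hypothetical names; the config is inlined rather than read from a file, and skipping the separator when one side is empty is an assumption that the proposal does not specify.

```python
import json
from typing import Optional

# merge.json as proposed above, inlined so the sketch is self-contained.
MERGE_CONFIG = json.loads("""
{
  "root": "",
  "translation": "",
  "comment": "| ",
  "variant": "| ",
  "html": "manual",
  "reference": ", "
}
""")


def merge_field(data_type: str, first: str, second: str, config: dict = MERGE_CONFIG) -> Optional[str]:
    """Join two segments' text for one data type using its merge.json rule.

    Returns None when the rule is "manual", meaning the superuser must edit it.
    """
    rule = config[data_type]
    if rule == "manual":
        return None
    # Assumption (not in the proposal): if one side is empty, skip the
    # separator so we do not produce a dangling "| " or ", ".
    if not first or not second:
        return first + second
    return first + rule + second
```

Returning None for "manual" lets the caller know which columns still need the superuser's attention.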

If a data type is not listed here (for example, if someone creates a new text and adds a new data type for it) then it should show an error: "Data type foo is not found in merge.json. Please add it before proceeding."
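A sketch of that check in Python, intended to run before any merge starts. check_data_types is a hypothetical name. It reports every missing type at once, and the config_name parameter (an assumption) would let the same check serve a split configuration too.

```python
from typing import Dict, Iterable


def check_data_types(data_types: Iterable[str], config: Dict[str, str],
                     config_name: str = "merge.json") -> None:
    """Raise ValueError naming every data type that has no rule in the config."""
    missing = [t for t in data_types if t not in config]
    if missing:
        raise ValueError("\n".join(
            f"Data type {t} is not found in {config_name}. Please add it before proceeding."
            for t in missing
        ))
```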

Now, what do we do when we need a manual intervention?

First, the superuser needs to actually see what they are dealing with. We cannot assume that they already have the html data displayed, so showing it must come first.

Here is the flow.

A superuser clicks merge. Two things happen:
- The merge button changes to say: ⚠️ check merge, then confirm
- Meanwhile, if a data type is manual, display the column when merge is clicked. (Currently this will be the html column.)
The superuser manually edits the fields.
When ready, they click the ⚠️ check merge, then confirm button.
The merge is done.
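The two-click flow can be modelled as a small piece of state, sketched below in Python. The TwoStepOperation class and its methods are hypothetical illustrations of the steps above, not bilara's UI code, and the config is assumed to be the parsed merge.json.

```python
from typing import Dict, List, Tuple


class TwoStepOperation:
    """Hypothetical state for the click, check, then confirm flow."""

    def __init__(self, operation: str, config: Dict[str, str]):
        self.operation = operation  # e.g. "merge"
        self.config = config        # parsed merge.json
        self.pending = False        # True between the first and second click

    def click(self, visible_columns: List[str]) -> Tuple[str, List[str]]:
        """First click: relabel the button and reveal every manual column."""
        self.pending = True
        manual = [t for t, rule in self.config.items() if rule == "manual"]
        shown = visible_columns + [t for t in manual if t not in visible_columns]
        return f"⚠️ check {self.operation}, then confirm", shown

    def confirm(self) -> bool:
        """Second click: return True if the operation should now be applied."""
        if not self.pending:
            return False
        self.pending = False
        return True
```

Because the manual columns are derived from the config rather than hard-coded, the html column is revealed today, and any data type later marked "manual" would be revealed as well.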

split

Splitting is handled differently, because in most cases we simply leave the existing data on the original segment.

For example, say I split this segment:

"dn1:1.30.1": "Santi, bhikkhave, eke samaṇabrāhmaṇā sassatavādā, sassataṁ attānañca lokañca paññapenti catūhi vatthūhi. ",

I click split. In most cases, this simply adds a new empty segment after it, and the existing data stays on the original segment. For example, consider the reference data for a segment such as dn1:1.10.7:

"dn1:1.10.7": "ms6D_22, ndp6.7",

That would become:

"dn1:1.10.7": "ms6D_22, ndp6.7",
"dn1:1.10.8": "",

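A Python sketch of this default split for a single data type. It assumes segments are stored as an ordered dict, and that the new ID is obtained by incrementing the last number of the current ID. Both next_segment_id and split_segment are hypothetical helpers, and renumbering any segments that follow is not addressed here.

```python
from typing import Dict


def next_segment_id(segment_id: str) -> str:
    """Hypothetical: bump the number after the last dot (dn1:1.10.7 -> dn1:1.10.8)."""
    prefix, _, last = segment_id.rpartition(".")
    return f"{prefix}.{int(last) + 1}"


def split_segment(segments: Dict[str, str], segment_id: str) -> Dict[str, str]:
    """Keep the existing text on segment_id and insert an empty segment after it."""
    new_id = next_segment_id(segment_id)
    if new_id in segments:
        # Renumbering later segments is outside the scope of this sketch.
        raise ValueError(f"{new_id} already exists")
    result: Dict[str, str] = {}
    for key, text in segments.items():
        result[key] = text
        if key == segment_id:
            result[new_id] = ""  # the new segment starts empty
    return result


print(split_segment({"dn1:1.10.7": "ms6D_22, ndp6.7"}, "dn1:1.10.7"))
# {'dn1:1.10.7': 'ms6D_22, ndp6.7', 'dn1:1.10.8': ''}
```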
So that is simple in most cases. However, for the HTML we cannot have an empty segment. Nor can we automate this, because the tags must remain correctly paired and nested, and so on.

Currently html is manual for both split and merge, but we should not assume that this will be the case for every possible data type. So it is best to express this explicitly in its own configuration file.

So let's have:

split.json:

{
  "root": "",
  "translation": "",
  "comment": "",
  "variant": "",
  "html": "manual",
  "reference": ""
}

Something like that!
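Combining split.json with the default behaviour might look like the Python sketch below. The names are hypothetical, and it assumes (the proposal does not say) that a non-manual value in split.json is the initial text of the new segment, and that a data type with no entry for the split segment is left unchanged.

```python
from typing import Dict, List, Tuple

# split.json as proposed above.
SPLIT_CONFIG = {
    "root": "",
    "translation": "",
    "comment": "",
    "variant": "",
    "html": "manual",
    "reference": "",
}

Segments = Dict[str, str]


def insert_after(segments: Segments, segment_id: str, new_id: str, text: str) -> Segments:
    """Copy segments, placing new_id right after segment_id if it exists."""
    out: Segments = {}
    for key, value in segments.items():
        out[key] = value
        if key == segment_id:
            out[new_id] = text
    return out


def split_all(data: Dict[str, Segments], segment_id: str, new_id: str,
              config: Dict[str, str] = SPLIT_CONFIG) -> Tuple[Dict[str, Segments], List[str]]:
    """Apply split.json to every data type; return new data and the manual types."""
    result: Dict[str, Segments] = {}
    manual: List[str] = []
    for data_type, segments in data.items():
        rule = config[data_type]  # KeyError if the type is missing from split.json
        if rule == "manual":
            manual.append(data_type)  # left untouched for the superuser to edit
            result[data_type] = dict(segments)
        else:
            # Assumption: the rule string is the new segment's initial text ("").
            result[data_type] = insert_after(segments, segment_id, new_id, rule)
    return result, manual
```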

Okay, so the flow becomes:

A superuser clicks split. Two things happen:
- The split button changes to say: ⚠️ check split, then confirm
- Meanwhile, if a data type is manual, display the column when split is clicked. (Currently this will be the html column.)
The superuser manually edits the fields.
When ready, they click the ⚠️ check split, then confirm button.
The split is done.

suttacentral / bilara, issue #157
