The idea is to use CollateX to generate the tables.
CollateX takes as input a JSON file with a list of witnesses. Each witness is an array of tokens, each of which contains a text and optionally a normalized version. Each token can also have an arbitrary number of other attributes which are passed transparently to the output.
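To make the input format concrete, here is a minimal sketch of such a JSON document built in Python. The key names ("witnesses", "id", "tokens", "t" for the token text, "n" for the normalized form) follow CollateX's documented JSON input; the extra "item_id" attribute is an invented example of an arbitrary property that CollateX passes through to the output untouched.

```python
import json

# Two short witnesses. Each token has a text ("t") and an optional
# normalized form ("n"); any other keys (here the illustrative
# "item_id") are carried through to the output transparently.
collatex_input = {
    "witnesses": [
        {
            "id": "A",
            "tokens": [
                {"t": "The ", "n": "the", "item_id": 12},
                {"t": "quick ", "n": "quick", "item_id": 12},
            ],
        },
        {
            "id": "B",
            "tokens": [
                {"t": "Teh ", "n": "the"},
                {"t": "quick ", "n": "quick"},
            ],
        },
    ]
}

print(json.dumps(collatex_input, indent=2))
```

Collating on the "n" values lets "The" and "Teh" align as the same reading while the original spellings are preserved in "t".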
CollateX can generate a collation table or a variant graph. The collation table is a matrix with one row per witness and one column per set of aligned tokens.
The process should go like this:
For each witness:
-- Get an ordered list of items
-- Transform each item into a list of tokens. This is where the complexity lies: words that span more than one item (e.g., nowb marks), words with multiple versions (sic, abbr, corr), and combinations of the above.
Then transform the per-witness token lists into a CollateX input JSON file.