Open funderburkjim opened 3 years ago
There are two kinds of summaries: by dictionary and by character.
For each dictionary, show all the extended ascii characters that occur in the dictionary. The eadata directory contains one file for each dictionary. For instance, ea_acc.txt shows all extended ascii characters occurring in the acc dictionary. The lines are ordered by Unicode code point, and show
One detail is that only lines of the file occurring within an entry are considered; excluded are lines representing front matter, appendices, etc. that have not been marked as entries. By convention, an entry begins with a 'metaline' (a line starting with <L>) and ends with the line <LEND>, and includes all lines between these two lines.
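The entry convention can be sketched in Python as follows. This is only an illustration, not the script actually used in this repository: the function name `count_extended_in_entries` and the sample lines are invented, 'extended ascii' is taken to mean any character with a code point above 127, and the metaline and the `<LEND>` line are assumed to count as part of the entry.

```python
from collections import Counter

def count_extended_in_entries(lines):
    """Count non-ASCII characters on lines inside <L> ... <LEND> entries.

    Hypothetical helper; returns (character, count) pairs sorted by code point.
    """
    counts = Counter()
    in_entry = False
    for line in lines:
        if line.startswith("<L>"):
            in_entry = True                      # metaline opens an entry
        if in_entry:
            # Assumption: "extended ascii" means code point > 127.
            counts.update(ch for ch in line if ord(ch) > 127)
        if line.startswith("<LEND>"):
            in_entry = False                     # <LEND> closes the entry
    return sorted(counts.items(), key=lambda item: ord(item[0]))

# Invented sample data; escapes are used so the code points are unambiguous.
sample = [
    "front matter with \u00e9",                  # outside any entry: ignored
    "<L>1 (made-up metaline)",
    "body with \u0101 and \u00f1 and \u0101",    # a-macron twice, n-tilde once
    "<LEND>",
    "appendix with \u00fc",                      # outside any entry: ignored
]

for ch, n in count_extended_in_entries(sample):
    print(f"{ch} U+{ord(ch):04X} {n}")
```

Characters in the front matter and the appendix line are skipped because they fall outside any `<L>` ... `<LEND>` pair.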
In the csl-orig repository, there is an xxx_meta2.txt file for each dictionary, and one component of xxx_meta2.txt is a listing of the extended ascii characters. For instance, compare ea_acc.txt with acc_meta2.txt. We should strive to have consistency between these two ea lists.
all_ea.txt contains all the individual dictionary files.
For some purposes, it is useful to see all the dictionaries that contain a particular character. The 'easummary' files serve this purpose.
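A by-character summary is essentially the inverse of the by-dictionary listing. The following is a rough sketch of that inversion, under the assumption that the per-dictionary data is already available as a mapping from dictionary code to a set of characters; the function name `invert_by_character` and the sample data (including the placeholder code `xxx`) are hypothetical and do not reflect the actual easummary file format.

```python
from collections import defaultdict

def invert_by_character(per_dictionary):
    """Map each character to the sorted list of dictionaries containing it.

    Hypothetical helper; keys of the result are ordered by code point.
    """
    by_char = defaultdict(set)
    for dict_code, chars in per_dictionary.items():
        for ch in chars:
            by_char[ch].add(dict_code)
    return {ch: sorted(by_char[ch]) for ch in sorted(by_char, key=ord)}

# Invented sample data: two dictionaries and the characters found in each.
per_dictionary = {
    "acc": {"\u0101", "\u00f1"},
    "xxx": {"\u0101"},
}

for ch, dicts in invert_by_character(per_dictionary).items():
    print(f"{ch} U+{ord(ch):04X} {len(dicts)} {' '.join(dicts)}")
```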
There are summary files:
The digitization files xxx.txt (of the csl-orig repository) contain many characters besides the standard ascii characters. These are represented in the utf-8 encoding. We use the term 'extended ascii' to refer to any character other than a standard ascii character.
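As a small illustration of this definition, the check below treats a character as 'extended ascii' when its code point is above 127, which is one reasonable reading of "other than a standard ascii character". The helper names `is_extended_ascii` and `describe` are made up for this sketch.

```python
import unicodedata

def is_extended_ascii(ch):
    """Assumed reading of the definition: code point outside 0..127."""
    return ord(ch) > 127

def describe(ch):
    """Show the code point, Unicode name, and utf-8 bytes of one character."""
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = ch.encode("utf-8").hex(" ")   # bytes.hex(sep) needs Python 3.8+
    return f"U+{ord(ch):04X} {name} (utf-8: {utf8})"

for ch in ["a", "\u0101"]:
    kind = "extended" if is_extended_ascii(ch) else "standard"
    print(kind, describe(ch))
```

A standard ascii character occupies one byte in utf-8, while every extended ascii character occupies two or more.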
For various reasons, it is useful to survey which extended ascii characters appear in which dictionaries. The 'eascii' directory in this repository aims to provide such survey information.