johnstarmer / open-delta

Automatically exported from code.google.com/p/open-delta

CONFOR Issues reported by Mike Dallwitz via DELTA-L #245

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

(1) In the TYPESETTING MARKS directive for HTML (action set 'markhtm'),
the definition

     #34. <NATURAL LANGUAGE: before output file name in index file>
     |&bull;&nbsp;<a href='xxx/|

is ignored. Instead, when the index file ('index.htm') is generated, the
folder name is taken from the 'OUTPUT DIRECTORY' directive, and the bullet
(written as '&#149;') is apparently hard-coded in the program.

(2) Characters outside the Windows-1252 set aren't handled correctly in
generated HTML files: they are replaced by question marks. This could be
fixed by converting the characters to UTF-8, or by using numeric character
references (http://en.wikipedia.org/wiki/Numeric_character_reference). The
latter method is used in CSIRO Confor.
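The numeric-character-reference approach can be illustrated with Python's built-in 'xmlcharrefreplace' codec error handler. This is a minimal sketch, not CONFOR code, and the helper name to_html_text is hypothetical. Encoding to ASCII rather than Windows-1252 is a design choice: the output is then correct whatever charset the HTML file declares.

```python
import html

def to_html_text(text: str) -> str:
    """Hypothetical helper: escape HTML metacharacters, then replace every
    non-ASCII character with a decimal reference such as &#8211;."""
    escaped = html.escape(text, quote=False)  # & < > only
    # 'xmlcharrefreplace' turns each unencodable character into &#NNNN;
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_html_text("Ärmel – ♀ <b>"))  # &#196;rmel &#8211; &#9792; &lt;b&gt;
```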

(3) Characters outside the Windows-1252 set aren't handled correctly in
generated plain-text files: they are replaced by question marks. This
could be fixed by converting the characters to UTF-8.
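The symptom and the suggested fix can be reproduced in a few lines of Python. This is an illustrative sketch with hypothetical helper names, not the CONFOR implementation: encoding to Windows-1252 with a 'replace' error handler produces exactly the reported question marks (note that the en dash survives because it is in Windows-1252, but the female sign does not), while writing with an explicit UTF-8 encoding keeps every character.

```python
import os
import tempfile

def to_cp1252_lossy(text: str) -> bytes:
    # Characters outside Windows-1252 are replaced by '?'.
    return text.encode("cp1252", errors="replace")

def write_plain_text(path: str, text: str) -> None:
    # An explicit encoding avoids the platform default, which in Python
    # is often Windows-1252 on Western-European Windows installations.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

sample = "Ärmel ♀ 3–5 mm"
print(to_cp1252_lossy(sample))  # b'\xc4rmel ? 3\x965 mm'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chars.txt")  # hypothetical output file
    write_plain_text(path, sample)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
print(round_trip == sample)  # True
```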

(4) Non-ASCII characters are usually not capitalized correctly. For
example, when printing the character list (*PRINT CHARACTER LIST), e-acute
is correctly capitalized in plain-text output, but not in RTF or HTML output.
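One possible explanation, which is an assumption not confirmed from the CONFOR source, is that capitalization is applied after the character has already been converted to an escape sequence (for example '&#233;' in HTML, or \'e9 in RTF). The first character is then '&' or '\', which has no upper-case form. The Python sketch below, using hypothetical helpers capitalise_first and to_html, shows that capitalizing before escaping gives the expected result.

```python
import html

def capitalise_first(text: str) -> str:
    # str.upper() applies Unicode case mappings, e.g. 'é' -> 'É'.
    return text[:1].upper() + text[1:]

def to_html(text: str) -> str:
    # Escape markup characters, then turn non-ASCII into &#NNN; references.
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

word = "état"
print(capitalise_first(to_html(word)))  # &#233;tat  ('&' is unchanged)
print(to_html(capitalise_first(word)))  # &#201;tat  (i.e. 'État')
```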

(5) Listing files.

(a) In the listing files, the record separator is 'CR/CR/LF' (instead of
just 'CR/LF'). When viewed in some Windows text editors, this causes blank
lines to be displayed between the lines of the listing.
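A common way to end up with CR/CR/LF, in many languages, is to write an explicit CR/LF through an output layer that also converts each LF to CR/LF. Whether this is the cause in CONFOR is an assumption. The Python sketch below (hypothetical helper render) uses a text wrapper configured to translate '\n' to '\r\n', as a text-mode file does by default on Windows, and shows both the doubled CR and the fix of writing a bare '\n'.

```python
import io

def render(lines, line_end: str) -> bytes:
    buf = io.BytesIO()
    # newline="\r\n": every '\n' written is translated to '\r\n'.
    out = io.TextIOWrapper(buf, encoding="utf-8", newline="\r\n")
    for line in lines:
        out.write(line + line_end)
    out.flush()
    return out.detach().getvalue()  # detach so buf stays open

print(render(["a", "b"], "\r\n"))  # b'a\r\r\nb\r\r\n'  (CR/CR/LF)
print(render(["a", "b"], "\n"))    # b'a\r\nb\r\n'      (CR/LF)
```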

(b) When action sets switch between input files, the order in the listing
doesn't correspond to the order in which the lines are read. This is
confusing. For example:

     ..tings\chars,1 *SHOW ~ Character list.
     ..tings\chars,2
     ..tings\chars,3 CHARACTER LIST
     ..tings\check,7 *INPUT FILE chars
     ..tings\check,8
     ..tings\items,1 *SHOW ~ Item descriptions
     ..tings\items,2
     ..tings\items,3 ITEM DESCRIPTIONS
     ..tings\check,9 *INPUT FILE items
     Normal termination.

Notice that '*INPUT FILE chars', which causes 'chars' to be read, is
listed /after/ the first three lines of 'chars'. The corresponding lines
from CSIRO Confor are:

     check,7        *INPUT FILE chars
     chars,1        *SHOW ~ Character list.
     chars,2
     chars,3        *CHARACTER LIST
     chars,16       *PREVIOUS INPUT FILE
     check,8
     check,9        *INPUT FILE items
     items,1        *SHOW ~ Item descriptions
     items,2
     items,3        *ITEM DESCRIPTIONS
     items,12       *PREVIOUS INPUT FILE
     check,10       *END
     Normal Termination.

In addition to the more logical order, notice the lines '*PREVIOUS INPUT
FILE' and '*END'. These directives aren't in the source files (though they
could be). They're added to the listing when 'end-of-file' is read, to
clarify what is happening. This is documented in the DELTA User's Guide, e.g.:

'*PREVIOUS INPUT FILE …This directive specifies that after processing of
the current input line is finished, input will continue from the input
file that was in use before the current input file. The directive is
supplied automatically by the program at the end of any input file other
than the main directives file.'
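The CSIRO ordering falls out naturally if each line is listed before it is acted on, and the implied directive is listed when end-of-file is reached. The following is a minimal Python sketch of that idea, not the CONFOR implementation: the in-memory files dictionary and the helpers listing_line and list_file are hypothetical, and the 15-column label width is inferred from the CSIRO sample above.

```python
import re

INPUT_FILE = re.compile(r"\*INPUT FILE\s+(\S+)")

def listing_line(name: str, number: int, text: str) -> str:
    # 'name,number' padded to 15 columns, then the source text.
    return (f"{name},{number}".ljust(15) + text).rstrip()

def list_file(name, files, out, is_main=True):
    lines = files[name]
    for number, text in enumerate(lines, start=1):
        # List the line before acting on it, so '*INPUT FILE x'
        # appears ahead of the lines read from x.
        out.append(listing_line(name, number, text))
        match = INPUT_FILE.match(text)
        if match:
            list_file(match.group(1), files, out, is_main=False)
    # At end-of-file, list the implied directive on a line numbered
    # one past the last real line.
    implied = "*END" if is_main else "*PREVIOUS INPUT FILE"
    out.append(listing_line(name, len(lines) + 1, implied))

files = {
    "check": ["*INPUT FILE chars", "", "*INPUT FILE items"],
    "chars": ["*SHOW ~ Character list.", "", "*CHARACTER LIST"],
    "items": ["*SHOW ~ Item descriptions", "", "*ITEM DESCRIPTIONS"],
}
listing = []
list_file("check", files, listing)
print("\n".join(listing))
```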

Also, I suggest omitting the truncated folder name from the listing file.
The extra information is usually irrelevant, as the folder is almost
invariably the 'current' folder, and its omission would facilitate
comparisons with the output from CSIRO Confor.
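Dropping the folder is a one-line change if the listing label is built from the base name of the file. This is a sketch in Python with a hypothetical listing_line helper and a made-up example path. The ntpath module is used because it splits on both '\' and '/' regardless of the operating system the code runs on.

```python
import ntpath

def listing_line(path: str, number: int, text: str) -> str:
    # Keep only the bare file name, e.g. 'chars', without its folder.
    name = ntpath.basename(path)
    return (f"{name},{number}".ljust(15) + text).rstrip()

print(listing_line(r"C:\data\delta\chars", 1, "*SHOW ~ Character list."))
```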

(6) Error handling.

(a) If there are errors in the character list, usually the whole of the
character list up to the error (instead of just the line containing the
error) is displayed before each error message.
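Showing only the offending line needs nothing more than the error's position in the input. A minimal Python sketch, assuming (hypothetically) that the parser knows the character offset of the error within the text it has read; error_context is an illustrative name, not a CONFOR function.

```python
from typing import Tuple

def error_context(text: str, offset: int) -> Tuple[int, str]:
    """Return (1-based line number, text of that line) for an offset."""
    start = text.rfind("\n", 0, offset) + 1   # start of the error's line
    end = text.find("\n", offset)             # end of the error's line
    if end == -1:
        end = len(text)
    line_number = text.count("\n", 0, offset) + 1
    return line_number, text[start:end]

chars = "#1. leaves <shape>/\n   1. opposite/\n   2. alternate/\n#x. flowers/\n"
print(error_context(chars, chars.index("#x")))  # (4, '#x. flowers/')
```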

(b) If there is a single error in the items, usually the whole of the item
descriptions up to the error is displayed before the error message, then
all of this is repeated once or twice, i.e. displayed 2 or 3 times in all.
The 'Number of errors' matches the number of times the message is displayed.

(c) If there are two errors in the items, only the second error message is
displayed (and repeated, as in (b)).
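Whatever the underlying cause (possibly the items being processed more than once; this is a guess), the error log itself can guarantee that each error is kept and shown exactly once. A Python sketch with a hypothetical ErrorLog class: errors are appended rather than overwritten, so a second error cannot hide the first, and repeats of the same (file, line, message) are ignored, so the count reflects distinct errors. Note that this guards against the symptom and does not remove any duplicate processing.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class ErrorLog:
    messages: List[str] = field(default_factory=list)
    seen: Set[Tuple[str, int, str]] = field(default_factory=set, repr=False)

    def report(self, file: str, line: int, message: str) -> None:
        key = (file, line, message)
        if key in self.seen:
            return  # already reported, e.g. on an earlier pass
        self.seen.add(key)
        # Append rather than overwrite, so earlier errors are kept.
        self.messages.append(f"{file},{line}: {message}")

    def summary(self) -> str:
        return f"Number of errors: {len(self.messages)}"

log = ErrorLog()
log.report("items", 5, "Invalid state value.")
log.report("items", 5, "Invalid state value.")  # repeat: ignored
log.report("items", 9, "Character number out of range.")
print("\n".join(log.messages))
print(log.summary())
```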

(d) For some errors, the message 'An unexpected error has been
encountered' is displayed, and repeated. When run in a command window,
there is a program traceback between the two messages. Examples giving
this result are a non-numeric character number in the character list, or a
character number out of range in the item descriptions.
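Both examples are ordinary input mistakes that can be validated before conversion, so they produce a normal error message instead of an unexpected exception. A Python sketch; DirectiveError and parse_character_number are hypothetical names, and the range check assumes characters are numbered from 1 to the number of characters.

```python
import re

class DirectiveError(Exception):
    """Hypothetical exception for mistakes in the user's input files."""

def parse_character_number(token: str, max_characters: int) -> int:
    # Validate first, so bad input yields a message rather than a crash.
    if not re.fullmatch(r"[0-9]+", token):
        raise DirectiveError(f"Character number expected, found '{token}'.")
    number = int(token)
    if not 1 <= number <= max_characters:
        raise DirectiveError(
            f"Character number {number} out of range (1 to {max_characters}).")
    return number

for token in ["7", "x7", "99"]:
    try:
        print(parse_character_number(token, 10))
    except DirectiveError as err:
        print(f"ERROR: {err}")
```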

(e) In the CHARACTER LIST directive, the full stop after a character or
state number is usually followed by a blank (and the documentation says
that it must be). However, end-of-line should also be accepted in this
position; currently, it causes an error. (The Editor outputs end-of-line
after the full stop if the next word is sufficiently long, as it easily
can be if it consists of non-ASCII characters.)
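The change amounts to treating end-of-line like any other blank after the full stop. A minimal Python sketch, not the CONFOR parser: the NUMBER pattern and match_number helper are hypothetical. In DELTA format a character number is written '#12.' and a state number '3.'; the lookahead accepts any white space (space, tab, CR or LF) or the end of the text after the full stop.

```python
import re

# '#' marks a character number; state numbers have no '#'.
NUMBER = re.compile(r"(#?)([0-9]+)\.(?=\s|\Z)")

def match_number(text: str, pos: int = 0):
    """Return (is_character, number) if a number starts at pos, else None."""
    m = NUMBER.match(text, pos)
    if m is None:
        return None
    return (m.group(1) == "#", int(m.group(2)))

print(match_number("#12.\nPflanzenblätter"))  # (True, 12)
print(match_number("#12.leaves"))             # None
```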

Original issue reported on code.google.com by chris.go...@gmail.com on 22 Oct 2013 at 5:33