ietf-tools / editor

A fully featured editor to write, review, refine and submit Internet-Drafts.
https://draftforge.ietf.org

Ability to check for inconsistent use of hyphenation #24

Closed (ajeanmahoney closed this issue 2 months ago)

ajeanmahoney commented 10 months ago

Spell checkers can be poor at catching hyphenation issues because if the two parts of the hyphenated word are spelled correctly, the checker usually considers the hyphenated word to be spelled correctly (e.g., both "bi-directional" and "bidirectional" are correct according to aspell). Authors may also use both forms in the same document.

An editor manually looks for these inconsistencies and either corrects them to make them consistent (when it's a dictionary term) or asks an author to select a consistent presentation (when it's a term of art).

The user of the software should be able to ask for a hyphenation report that lists inconsistent use:

application layer
Application-layer
centrally computed
centrally-computed
co-exist
coexist
co-locate
collocate
Destination Oriented
Destination-Oriented
end to end
end-to-end
in-active
inactive
in use
in-use
MAC layer
MAC-layer
MAC level
MAC-level 
...

The user should be able to save the report as a text file.

An additional feature would be to add a number after each term indicating how many times it appears in the document:

in-active (5)
inactive (3)
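
A minimal sketch of how such counts could be gathered, not a description of how the editor implements the check. It assumes plain-text input and only looks at terms that appear hyphenated at least once, comparing each against its closed form (hyphens removed) and open form (hyphens replaced by spaces). The names `count_phrase` and `hyphenation_variants` are hypothetical. Pairs that differ in spelling as well as hyphenation (co-locate / collocate) are outside what this sketch can find.

```python
import re

# A hyphenated term: two or more alphanumeric runs joined by hyphens.
HYPHENATED = re.compile(r"\b[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+\b")


def count_phrase(text, phrase):
    """Count case-insensitive occurrences of phrase as a whole term.

    Spaces in the phrase match any run of whitespace, and a match may not
    touch another word character or hyphen on either side.
    """
    body = r"\s+".join(re.escape(part) for part in phrase.split(" "))
    pattern = r"(?<![\w-])" + body + r"(?![\w-])"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def hyphenation_variants(text):
    """Map each hyphenated term to the forms seen in text, with counts.

    Only terms that occur in more than one form are returned.
    """
    report = {}
    seen = {m.group(0).lower() for m in HYPHENATED.finditer(text)}
    for term in sorted(seen):
        forms = [term, term.replace("-", ""), term.replace("-", " ")]
        present = {}
        for form in forms:
            n = count_phrase(text, form)
            if n > 0:
                present[form] = n
        if len(present) > 1:
            report[term] = present
    return report


if __name__ == "__main__":
    sample = "The in-active timer fires. An inactive peer is dropped."
    for term, forms in hyphenation_variants(sample).items():
        for form, n in forms.items():
            print(f"{form} ({n})")
```

Open forms made of common words ("in use", "in active") will produce false positives, so the output is a list for a human to review rather than a list of errors.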

NGPixel commented 9 months ago

Is this function expected to work against a predefined dictionary (where term X is considered correct), or should it simply report all cases where both a hyphenated and a non-hyphenated form of a term are present?

ajeanmahoney commented 9 months ago

It should report all cases because technical terms are often hyphenated but are not in a common dictionary. Here's an example of the notes we make when we check for hyphenation. We check for consistent use and also make updates to make the hyphenation consistent:

My edits:
---------------------
admission control / admission-control - adj
cut-off / cutoff - noun
heavy-weight / heavyweight - adj
higher layer / higher-layer - adj
mis-behavior / misbehavior - noun
multi-path / multipath - adj
next-hops / next hops - adj noun
non-zero / nonzero - adj
soft-state / soft state - adj noun
speed-up / speedup - noun
tradeoffs / trade-offs - noun
zero sum / zero-sum - adj

As used in doc:
----------------------
5-tuple - noun
CCNx-like - adj
class-based - adj
end-to-end - adj
fine-grained - adj
flow-balanced - adj
flow-based - adj
hop-by-hop - adj
intra-domain - adj
Intserv-equivalent - adj
Intserv-like - adj
IP-QoS - adj
multi-destination - adj
on-path - adj
open-loop - adj
output-buffered - adj
non-topological - adj
per-flow - adj
per-instance - adj
per-Interest - adj
per-object - adj
per-packet - adj
point-to-multipoint - adj
pre-bound - adj
pull-based - adj
QoS-controllable - adj
request-response - noun
time-invariant - adj
two-way - adj

Note though that some terms are not hyphenated when they are used as nouns, but they are hyphenated when they are used as attributive adjectives (e.g., "admission control").
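
One way a tool could respect this distinction without doing any part-of-speech tagging is to separate pairs by shape. An open/hyphenated pair ("admission control" / "admission-control") may be legitimate, because the noun and the attributive adjective can differ, so it needs a reviewer. A closed/hyphenated pair ("nonzero" / "non-zero") is more likely a plain inconsistency. The sketch below is only an illustration under that assumption; `shape` and `needs_grammar_review` are hypothetical names.

```python
def shape(form):
    """Classify a term as 'open' (has a space), 'hyphenated', or 'closed'."""
    if " " in form:
        return "open"
    if "-" in form:
        return "hyphenated"
    return "closed"


def needs_grammar_review(form_a, form_b):
    """Return True when both forms may be correct depending on usage.

    That is the case for an open/hyphenated pair, where the open form can
    be the noun and the hyphenated form the attributive adjective.
    """
    return {shape(form_a), shape(form_b)} == {"open", "hyphenated"}


if __name__ == "__main__":
    pairs = [
        ("admission control", "admission-control"),
        ("non-zero", "nonzero"),
        ("soft-state", "soft state"),
    ]
    for a, b in pairs:
        label = "review usage" if needs_grammar_review(a, b) else "pick one form"
        print(f"{a} / {b}: {label}")
```

This does not decide which form is right. It only sorts the report so that the pairs needing a judgment about grammar are kept apart from those that need a single spelling.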

A report that highlighted discrepancies would be helpful. For example:

Hyphenation: the following words and phrases are inconsistently hyphenated:
e-mail (1 instance) / email (5 instances)
public key (2 instances) / public-key (13 instances)
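
A short sketch of producing a report in this shape and saving it as a text file, assuming the per-form counts have already been gathered by some other step. The function names and the input layout (one dict of form-to-count per term) are assumptions made for illustration, not part of the editor.

```python
from pathlib import Path

HEADER = (
    "Hyphenation: the following words and phrases "
    "are inconsistently hyphenated:"
)


def format_line(counts):
    """Render one term, e.g. 'e-mail (1 instance) / email (5 instances)'."""
    parts = []
    for form, n in counts.items():
        noun = "instance" if n == 1 else "instances"
        parts.append(f"{form} ({n} {noun})")
    return " / ".join(parts)


def build_report(variants):
    """Build the full report text from a list of per-term count dicts."""
    lines = [HEADER] + [format_line(counts) for counts in variants]
    return "\n".join(lines) + "\n"


def save_report(variants, path):
    """Write the report to path as UTF-8 plain text."""
    Path(path).write_text(build_report(variants), encoding="utf-8")


if __name__ == "__main__":
    found = [
        {"e-mail": 1, "email": 5},
        {"public key": 2, "public-key": 13},
    ]
    print(build_report(found), end="")
```

Calling `save_report(found, "hyphenation-report.txt")` would write the same text to a file; the file name here is only an example.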

NGPixel commented 2 months ago

All occurrences are now listed under the check in the sidebar and they are underlined in the editor.

Added in d82951d2b69e0ff45d9e644c9672bc7a722ad6ab