ajeanmahoney closed 2 months ago
Is this function expected to work against a predefined dictionary (where term X is considered correct), or should it simply report all cases where both a hyphenated and a non-hyphenated form of a term are present?
It should report all cases, because technical terms are often hyphenated but do not appear in a common dictionary. Here's an example of the notes we make when we check hyphenation. We check for consistent use and also edit the document to make the hyphenation consistent:
My edits:
---------------------
admission control / admission-control - adj
cut-off / cutoff - noun
heavy-weight / heavyweight - adj
higher layer / higher-layer - adj
mis-behavior / misbehavior - noun
multi-path / multipath - adj
next-hops / next hops - adj noun
non-zero / nonzero - adj
soft-state / soft state - adj noun
speed-up / speedup - noun
tradeoffs / trade-offs - noun
zero sum / zero-sum - adj
As used in doc:
----------------------
5-tuple - noun
CCNx-like - adj
class-based - adj
end-to-end - adj
fine-grained - adj
flow-balanced - adj
flow-based - adj
hop-by-hop - adj
intra-domain - adj
Interv-equivalent - adj
Intserv-like - adj
IP-QoS - adj
multi-destination - adj
on-path - adj
open-loop - adj
output-buffered - adj
non-topological - adj
per-flow - adj
per-instance - adj
per-Interest - adj
per-object - adj
per-packet - adj
point-to-multipoint - adj
pre-bound - adj
pull-based - adj
QoS-controllable - adj
request-response - noun
time-invariant - adj
two-way - adj
Note though that some terms are not hyphenated when they are used as nouns, but they are hyphenated when they are used as attributive adjectives (e.g., "admission control").
A report that highlighted discrepancies would be helpful. For example:
Hyphenation: the following words and phrases are inconsistently hyphenated:
e-mail (1 instance) / email (5 instances)
public key (2 instances) / public-key (13 instances)
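For discussion, here is a minimal sketch of how such a report could be produced without a dictionary. It is not the code that was committed for this issue; the function names (`count_form`, `find_inconsistent_hyphenation`, `format_report`) are hypothetical. It assumes that a discrepancy means a hyphenated term whose closed form (hyphens removed) or open form (hyphens replaced by spaces) also occurs in the same text, and it compares case-insensitively.

```python
import re
from collections import Counter

# A hyphenated term: two or more word-character runs joined by hyphens.
HYPHENATED = re.compile(r"(?<![\w-])\w+(?:-\w+)+(?![\w-])")


def count_form(form, text):
    """Count standalone occurrences of a closed or open form in text."""
    parts = [re.escape(part) for part in form.split()]
    # The lookarounds keep "public key" from matching inside "public key-pair".
    pattern = r"(?<![\w-])" + r"\s+".join(parts) + r"(?![\w-])"
    return len(re.findall(pattern, text))


def find_inconsistent_hyphenation(text):
    """Return one {form: count} dict per term that appears in several forms."""
    lowered = text.lower()
    hyphenated = Counter(HYPHENATED.findall(lowered))
    findings = []
    for term, count in sorted(hyphenated.items()):
        variants = {term: count}
        # Assumption: only the fully closed and fully open forms are checked.
        for alternative in (term.replace("-", ""), term.replace("-", " ")):
            seen = count_form(alternative, lowered)
            if seen:
                variants[alternative] = seen
        if len(variants) > 1:
            findings.append(variants)
    return findings


def format_report(findings):
    """Render the findings in the report layout proposed in this issue."""
    lines = [
        "Hyphenation: the following words and phrases are "
        "inconsistently hyphenated:"
    ]
    for variants in findings:
        entries = [
            f"{form} ({count} instance{'' if count == 1 else 's'})"
            for form, count in sorted(variants.items())
        ]
        lines.append(" / ".join(entries))
    return "\n".join(lines)
```

Calling `format_report(find_inconsistent_hyphenation(document_text))` would give the text of the report. Partially hyphenated variants (for example "end-to end") and the noun versus attributive-adjective distinction are not handled by this sketch, so an editor would still need to review each reported pair.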
All occurrences are now listed under the check in the sidebar and they are underlined in the editor.
Added in d82951d2b69e0ff45d9e644c9672bc7a722ad6ab
Spell checkers can be poor at catching hyphenation issues because if the two parts of the hyphenated word are spelled correctly, the checker usually considers the hyphenated word to be spelled correctly (e.g., both "bi-directional" and "bidirectional" are correct according to aspell). Authors may also use both forms in the same document.
An editor manually looks for these inconsistencies and either corrects them to make them consistent (when it's a dictionary term) or asks an author to select a consistent presentation (when it's a term of art).
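The spell-checker blind spot described above can be illustrated with a toy model. This is not how aspell is implemented; `toy_spell_ok` and the small word set are hypothetical and only mimic the behavior of accepting a hyphenated word when each of its parts is a known word.

```python
def toy_spell_ok(word, dictionary):
    """Accept a word if every hyphen-separated part is in the dictionary."""
    parts = word.lower().split("-")
    return all(part in dictionary for part in parts)


# Hypothetical word list, chosen only to reproduce the example in this issue.
WORDS = {"bi", "directional", "bidirectional"}

# Both spellings pass, so the inconsistency is invisible to this checker.
both_accepted = (
    toy_spell_ok("bi-directional", WORDS)
    and toy_spell_ok("bidirectional", WORDS)
)
```

Because both forms are accepted independently, a per-word check of this kind has no way to notice that a document mixes them; that requires comparing the forms against each other across the whole document.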
The user of the software should be able to ask for a hyphenation report that lists inconsistent use:
The user should be able to save the report as a text file.
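Saving the report could be as simple as writing its lines to a UTF-8 text file. A minimal sketch, assuming the report is already available as a list of strings; `save_report` is a hypothetical helper, not an existing function of the tool.

```python
from pathlib import Path


def save_report(report_lines, path):
    """Write the report lines to a UTF-8 text file and return its path."""
    target = Path(path)
    # End with a newline so the file is well-formed for text tools.
    target.write_text("\n".join(report_lines) + "\n", encoding="utf-8")
    return target
```

For example, `save_report(lines, "hyphenation-report.txt")` would write the file into the current working directory (the file name here is only an illustration).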
An additional feature would be to add a number after each term to indicate how many times it appears in the document: