english variant for comments

ghost commented 2 years ago

moderately coder unfriendly when reviewing extracted comments

finalized, then on another line optimisation

$ head -n 4 parallel/tests/par002.hs
-- test for a bug in blackhole handling in the garbage collector,
-- fixed on 7/4/2006.  The symptom is that stdout gets finalized too
-- early, and the main thread fails when writing to it.  Only happens
-- with +RTS -N2, and must be compiled without optimisation.
$

in England usual spellings

amortise, customise, finalise, generalise, normalise, optimise, penalise, realise, recognise, serialise, synthesise, utilise,

in England the variant spelling 'initialise' is also used

could a .cabal file field be created as english-variant-for-comments: with possible values GB and US
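
The mix of spellings in the par002.hs excerpt can be found mechanically. A minimal sketch, assuming line comments start with `-- ` and using the word stems from the list above; the file `sample.hs` and the suffix list are made up for illustration and would miss other inflections:

```shell
#!/bin/sh
# Sketch: count US (-ize) and GB (-ise) spellings in Haskell line comments.
# sample.hs is a hypothetical file created only for this illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.hs" <<'EOF'
-- fixed on 7/4/2006.  The symptom is that stdout gets finalized too
-- with +RTS -N2, and must be compiled without optimisation.
main = pure ()
EOF

# Word stems taken from the spelling list above; extend as needed.
stems='amorti|customi|finali|generali|normali|optimi|penali|reali|recogni|seriali|synthesi|utili|initiali'

# Keep only comment lines, then count whole-word matches per variant.
us=$(grep '^-- ' "$tmpdir/sample.hs" \
  | grep -o -i -w -E "($stems)z(e|ed|es|ing|ation)" | wc -l | tr -d ' ')
gb=$(grep '^-- ' "$tmpdir/sample.hs" \
  | grep -o -i -w -E "($stems)s(e|ed|es|ing|ation)" | wc -l | tr -d ' ')

echo "US spellings: $us, GB spellings: $gb"
rm -rf "$tmpdir"
```

A file where both counts are non-zero is a candidate for the kind of review described here.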

jneira commented 2 years ago

Hi, being uniform in comment syntax is desirable indeed, but I don't think it deserves a dedicated field. Comments could be written in any language, not only English, and world language variants are close to infinite :-)

ghost commented 2 years ago

written standards of Norwegian

presently norwegian-standard-for-comments: with the
values NB and NN wouldn't appear to have a significant
userbase

shortening to english-variant-comments:

ghost commented 2 years ago

@Mikolaj

personally I don't want to enable 2FA

2FA

Mikolaj commented 2 years ago

@askeblad: understandable. Anyway, the invitation was a sign of appreciation of your contributions, so please accept my appreciation despite not being able to receive the token. In practice, that probably only means somebody else needs to merge things for you, which MergifyIO does for us anyway. :)

ghost commented 2 years ago

@Mikolaj

300 + organization repos' comment lines with fixed spelling errors/typos merged

as in C++, prefer parameterised/parameterized over parametrised/parametrized

https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Driver/Main.hs#L738

messager not messenger (OK)

uppercased ACK', GCed, PID

ghost commented 2 years ago

could use a reference page

maybe at meta/wiki for Haskell words

ACK

PID

iff

messager

parameterised/parameterized

ghost commented 2 years ago

~~meta/wiki~~

to Glossary add General programming section

Mikolaj commented 2 years ago

A great find. Do you mean something like "General programming terminology"? I'm afraid Haskellers differ in their views on, say, parameterised/parameterized, but probably mostly agree on iff, messenger, message, PID, ACK. Is your idea to list words and definitions or to list alternatives and propose which is more common among Haskellers?

jneira commented 2 years ago

I would use something more generic and standard, IETF or something similar, like locale: en-US or locale: es-VE

ghost commented 2 years ago

https://github.com/haskell/cabal/issues/8082#issuecomment-1093821399

@Mikolaj this is a lengthy discussion

restructure to two sections

Definitions of the nomenclature specific to Haskell and functional programming. Definitions of the nomenclature specific to general programming

DSL would belong in the latter

one can extract comments from Organization repos and find programming terms

perhaps this deserves a new issue in haskell/meta [linked to this comment]

I wouldn't be engaged in modifying the Glossary content.

To play the Archivist, one must be an intelligent man. An intelligent man does not play the Archivist.

ghost commented 2 years ago

@jneira

IMHO, locale: would carry connotations not intended for the purposes of the new field addition proposal.

an example use case is having extracted a comment from a source file for inclusion in a publication

should one change the spelling of finalized to finalised

or change the spelling of optimisation to optimization

$ head -n 4 parallel/tests/par002.hs
-- test for a bug in blackhole handling in the garbage collector,
-- fixed on 7/4/2006.  The symptom is that stdout gets finalized too
-- early, and the main thread fails when writing to it.  Only happens
-- with +RTS -N2, and must be compiled without optimisation.
$

a top-level english-variant-comments: field in the *.cabal file would give new code maintainers a clue to the original author's intent

jneira commented 2 years ago

Well, there is a tension between genericity and the current use case. Cabal is very conservative about which fields are added to the spec, as removing or changing them afterwards is very difficult. So adding very specific fields is discouraged in general.

english-variant-comments is assuming:

- you use only one language for comments but allow variants of that language
- you only want to know that specific datum for code comments and not for code definitions, the .cabal files or other build config files (cabal.project, global config) and their comments, or other documentation files like license, readme, etc.

It is too specific, so it would need more equally specific fields to extend it: spanish-variant-source-comments, english-variant-source-code, english-variant-config-files, english-variant-config-files-comments, and a long etcetera

jneira commented 2 years ago

Maybe locale is too generic and rigid: it assumes the entire project is written in a single language and variant. Being uniform is a good thing, but maybe that is too much. So maybe we could use locales and allow a list of lang+variant. That would lose track of the place (comments, docs), but we could specify them in two ways:

- convert the locale field into a section and enumerate writing locations: `locales: { source comments: en-US, documentation: en-EN, ...}`
- annotate concrete files with the locale, which would override the generic setting

We would then be talking about how to represent i18n localization in cabal

But you could parse the locale field/section and use it in your comments analysis

ghost commented 2 years ago

a bit off-topic

running a graphical desktop on Linux with a Nordic keyboard layout

then Ctrl-Alt-F2 to switch to console mode with an English keyboard layout

some adjustment typing is not as fast

cf: https://github.com/haskell/cabal/issues/8082#issuecomment-1094117489

adjustment time

finalized

optimisation

reviewing a large volume of comments

number of adjustment occurrences

US, GB, US, GB...

extension

GeneralisedNewtypeDeriving GeneralizedNewtypeDeriving

Since: 6.8.1. British spelling since 8.6.1.

would there be a loss of sleep if the cabal/Cabal/Cabal.cabal description field had this content

description:
  The Haskell Common Architecture for Building Applications and
  Libraries: a framework defining a common interface for authors to more
  easily build their Haskell applications in a portable way.
  .
  The Haskell Cabal is part of a larger infrastructure for distributing,
  organising, and cataloguing Haskell libraries and tools.

it could be conveyed by this proposal that a package's printed messages might use the English variant as found in code comments

license

doubt BSD, GNU GPL, or MIT documents would be assessed for British usual spellings

readme

it could be conveyed by this proposal that a README [English language] would use the English variant as found in code comments

spanish-variant-comments

unaware of a userbase for such a field

among languages using Latin script, Italian and Swedish are more expressive than English

possibly significant in literature

relevance in code documentation unknown

cf: https://github.com/haskell/cabal/issues/8082#issuecomment-1093951247

don't like en-US or es-VE designation type (conveys terminal settings)

ghost commented 2 years ago

at this point of time the title could probably be changed to 'english variant'

if approved

possible content for User's Guide

english-variant: variant

Declares the English language variant, values GB and US, for printed messages, comments in code (but not quoted code in comments), extensions if available as a variant, and for CHANGELOG, CONTRIBUTING, and README files. If package maintainers choose to select another language's variant, the equivalent of this field/value would be to note the language and variant in the package's top-level README file.

it should be understood from the foregoing paragraph that this would be applicable likewise to the synopsis and description fields
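
A tool that reviews comments could pick up the declared variant before checking spellings. A minimal sketch, assuming the field is spelled `english-variant:` as drafted above; cabal does not define such a field today, and `example.cabal` is a made-up file for illustration:

```shell
#!/bin/sh
# Sketch: read the proposed (hypothetical) english-variant field from a .cabal file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.cabal" <<'EOF'
cabal-version: 3.0
name:          example
version:       0.1.0.0
english-variant: GB
EOF

# Match the field name case-insensitively, take the first occurrence,
# then keep only the value with whitespace stripped.
variant=$(grep -i -m 1 '^english-variant[[:space:]]*:' "$tmpdir/example.cabal" \
  | cut -d : -f 2 | tr -d '[:space:]')

case "$variant" in
  GB|US) echo "English variant: $variant" ;;
  '')    echo "no english-variant field; variant unknown" ;;
  *)     echo "unexpected value: $variant" >&2 ;;
esac
rm -rf "$tmpdir"
```

The value could then select which spelling family a comment check treats as the expected one.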

$ grep -n initialised ghc-9.2.2/rts/{*.c,*.h} | wc -l
3
$ grep -n initialized ghc-9.2.2/rts/{*.c,*.h} | wc -l
19
$

cf: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7937

$ grep -n Initialise ghc-9.2.2/ghc/GHCi/UI.hs | wc -l
1
$ grep -n initialize ghc-9.2.2/ghc/GHCi/UI.hs | wc -l
2
$

Mikolaj commented 2 years ago

Not to diminish the value of languages, such as English and American, and of brainstorming such as this one, but a reminder that the cabal team has limited resources both for coding new features and maintaining them. So brainstorming resulting in cheap features/fixes has a higher chance of coming to fruition.

ghost commented 2 years ago

@Mikolaj

one can extract comments from Organization repos and find programming terms

the Glossary requires write authority

in a turnaround using GitHub's Organization webpages and/or Wiki seems a better approach

ghost commented 2 years ago

cf: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style

@Kleidukos for GitLab MRs any preference as to British or American English, or to ignore

$ cd ghc-9.2.2/compiler/GHC/Driver
$ grep "^-- " *.hs | grep initialise | wc -l
2
$ grep "^-- " *.hs | grep initialize | wc -l
6
$

cf: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/conventions

@takenobu-hs for GitLab MRs any preference as to British or American English, or to ignore

$ grep -n initialised ghc-9.2.2/rts/{*.c,*.h} | wc -l
3
$ grep -n initialized ghc-9.2.2/rts/{*.c,*.h} | wc -l
19
$

Mikolaj commented 2 years ago

@askeblad: I'd guess people are just too busy with more pressing problems. But good to be aware of the variety --- if not to unify, then at least to celebrate. :)

ghost commented 2 years ago

will ignore English variant spellings

ghost commented 2 years ago

@Mikolaj

cf: https://github.com/haskell/cabal/issues/8082#issuecomment-1095572459

grep -o -E -r '\w+' cabal | sort -u -f > output

modify to catch contractions

grep -o -E -r "\w+('\w+)*" cabal | sort -u -f > output

extract on colon delimiter the second field

cut -d : -f2 output | sort -u -f > words

the words file could be reviewed and a programming terms list produced

the above could be performed on other Organization repos and a master 'List of Terms' for a Glossary produced

content being to the discretion of Organization members

the 'List of Terms' could be located in the meta repo

uxterm was used to run the above commands
konsole on my Linux box being set to a non-unicode locale

filtering out numeric and alphanumeric strings, camel-case strings, and strings with underscores would reduce the number of lines in the words file
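
The filtering step could look roughly like this. A minimal sketch, assuming one token per line as in the words file; the sample tokens are made up, and the camel-case test (a lowercase letter directly followed by an uppercase one) is only a heuristic:

```shell
#!/bin/sh
# Sketch: drop tokens that are unlikely to be plain English words.
tmpdir=$(mktemp -d)
# Hypothetical sample standing in for the real words file.
printf '%s\n' 2006 GHC_VERSION buildInfo finalized h2 "don't" optimisation PID \
  > "$tmpdir/words"

# Remove tokens containing a digit or an underscore, and tokens with a
# lowercase letter directly followed by an uppercase one (camel case).
# LC_ALL=C keeps the [a-z] and [A-Z] ranges strictly ASCII and case-sensitive.
filtered=$(LC_ALL=C grep -v -E '[0-9_]|[a-z][A-Z]' "$tmpdir/words")

printf '%s\n' "$filtered"
rm -rf "$tmpdir"
```

All-uppercase terms such as PID pass through, which suits a list of programming terms.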

Mikolaj commented 2 years ago

Wow, interesting. So far I knew only one software tool that catches typos (inside binaries, of all things): the Debian lintian. I wonder if it could be improved or made widely available or an alternative made widely available (lintian only works for Debian packages IIRC).

ghost commented 2 years ago

the grep command returned nothing from binaries in the redirected output

Mikolaj commented 2 years ago

I don't know how lintian does it, but a similar trick on commandline is strings -20 binaryname | grep pattern.

english variant for comments #8082