Closed: ghost closed this issue 2 years ago.
Hi, being uniform in comment syntax is indeed desirable, but I don't think it deserves a dedicated field. Comments could be written in any language, not only English, and the world's language variants are close to infinite :-)
On the written standards of Norwegian: at present, a norwegian-standard-for-comments: field with the values NB and NN wouldn't appear to have a significant userbase.
Shortening the field name to english-variant-comments:
@Mikolaj
Personally, I don't want to enable 2FA.
@askeblad: understandable. Anyway, the invitation was a sign of appreciation of your contributions, so please accept my appreciation despite not being able to receive the token. In practice, that probably only means somebody else needs to merge things for you, which MergifyIO does for us anyway. :)
@Mikolaj
300+ comment lines across the organization's repos, with spelling errors/typos fixed, have been merged.
As in C++, prefer parameterised/parameterized over parametrised/parametrized.
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Driver/Main.hs#L738
messager, not messenger (OK)
uppercased: ACK, GCed, PID
These could use a reference page, maybe at meta/wiki, for Haskell words:
ACK
PID
iff
messager
parameterised/parameterized
A great find. Do you mean something like "General programming terminology"? I'm afraid Haskellers differ in their views on, say, parameterised/parameterized, but probably mostly agree on iff, messenger, message, PID, ACK. Is your idea to list words and definitions or to list alternatives and propose which is more common among Haskellers?
I would use something more generic and standard, an IETF language tag or something similar, like locale: en-US or locale: es-VE.
https://github.com/haskell/cabal/issues/8082#issuecomment-1093821399
@Mikolaj this is a lengthy discussion
Restructure into two sections:
Definitions of the nomenclature specific to Haskell and functional programming
Definitions of the nomenclature specific to general programming
DSL would belong in the latter.
One can extract comments from Organization repos and find programming terms.
Perhaps this deserves a new issue in haskell/meta.
I wouldn't be engaged in modifying the Glossary content.
To play the Archivist, one must be an intelligent man. An intelligent man does not play the Archivist.
@jneira
IMHO, locale: would carry connotations not intended for the purposes of the proposed new field.
An example use case: a comment has been extracted from a source file for inclusion in a publication.
Should one change the spelling of finalized to finalised, or the spelling of optimisation to optimization?
$ head -n 4 parallel/tests/par002.hs
-- test for a bug in blackhole handling in the garbage collector,
-- fixed on 7/4/2006. The symptom is that stdout gets finalized too
-- early, and the main thread fails when writing to it. Only happens
-- with +RTS -N2, and must be compiled without optimisation.
$
A top-level *.cabal file english-variant-comments: field would give new code maintainers a clue to the original author's intent.
Well, there is a tension between genericity and the current use case. Cabal is very conservative about which fields are added to the spec, as removing or changing them afterwards is very difficult. So adding very specific fields is discouraged in general.
english-variant-comments is too specific, so extending it would need more overly specific fields: spanish-variant-source-comments, english-variant-source-code, english-variant-config-files, english-variant-config-files-comments, and a long etcetera.
Maybe locale is too generic and rigid: it assumes the entire project is written in a single language and variant. Being uniform is a good thing, but maybe that is too much.
So maybe we could use locales and allow a list of language+variant pairs.
That would lose track of the place (comments, docs), but we could specify them in two ways:
We would then be talking about how to represent i18n localization in cabal.
But you could parse the locale field/section and use it in your comment analysis.
A bit off-topic: running a graphical desktop on Linux with a Nordic keyboard layout, then pressing Ctrl-Alt-F2 to switch to console mode with an English keyboard layout, requires some adjustment; typing is not as fast.
cf: https://github.com/haskell/cabal/issues/8082#issuecomment-1094117489
Adjustment time applies likewise to spellings such as finalized and optimisation: when reviewing a large volume of comments, the number of adjustment occurrences grows (US, GB, US, GB...).
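The adjustment cost described above can be made concrete by counting how often the spelling variant switches while reading comments in order. This is a sketch only: the function name count_variant_switches and its tiny US/GB word list are illustrative assumptions, not an existing tool, and words outside the list are simply ignored.

```shell
#!/bin/sh
# Sketch: count how often the English spelling variant switches
# (US -> GB or GB -> US) while reading text from stdin, line by line.
# ASSUMPTION: the word list below is illustrative only, not a dictionary;
# words it does not contain are ignored.
count_variant_switches() {
  awk '
    BEGIN {
      split("finalized initialized initialize optimization parameterized", us, " ")
      split("finalised initialised initialise optimisation parameterised", gb, " ")
      for (i in us) variant[us[i]] = "US"
      for (i in gb) variant[gb[i]] = "GB"
      last = ""
      switches = 0
    }
    {
      # Lower-case the line and split it on runs of non-letters.
      n = split(tolower($0), words, "[^a-z]+")
      for (j = 1; j <= n; j++) {
        if (!(words[j] in variant)) continue
        v = variant[words[j]]
        # A switch is a classified word whose variant differs from the previous one.
        if (last != "" && v != last) switches++
        last = v
      }
    }
    END { print switches }
  '
}

# Usage with the par002.hs comment wording quoted earlier (finalized, then optimisation):
# printf '%s\n' '-- stdout gets finalized too' '-- without optimisation.' | count_variant_switches
# prints 1
```

Counting switches rather than totals reflects the point being made: a file that is consistently GB or consistently US scores 0, however many variant words it contains.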
Extension: GeneralisedNewtypeDeriving / GeneralizedNewtypeDeriving
Since: 6.8.1. British spelling since 8.6.1.
Would there be any loss of sleep if the cabal/Cabal/Cabal.cabal description field had this content:
description:
The Haskell Common Architecture for Building Applications and
Libraries: a framework defining a common interface for authors to more
easily build their Haskell applications in a portable way.
.
The Haskell Cabal is part of a larger infrastructure for distributing,
organising, and cataloguing Haskell libraries and tools.
It could be conveyed by this proposal that a package's printed messages might use the English variant as found in code comments.
License: I doubt BSD, GNU GPL, or MIT documents would be assessed for usual British spellings.
README: it could be conveyed by this proposal that an English-language README would use the English variant as found in code comments.
spanish-variant-comments: I'm unaware of a userbase for such a field. Among languages using the Latin script, Italian and Swedish are more expressive than English, which is possibly significant in literature; its relevance in code documentation is unknown.
cf: https://github.com/haskell/cabal/issues/8082#issuecomment-1093951247
I don't like the en-US or es-VE designation type (it conveys terminal settings).
At this point in time, the title could probably be changed to 'english variant'.
If approved, possible content for the User's Guide:
english-variant: variant
Declares the English language variant, with values GB and US, for printed messages, comments in code (but not quoted code in comments), extensions if available as a variant, and CHANGELOG, CONTRIBUTING, and README files. If package maintainers choose to use another language's variant, the equivalent of this field/value would be to note the language and variant in the package's top-level README file.
It should be understood from the foregoing paragraph that this would likewise apply to the synopsis and description fields.
$ grep -n initialised ghc-9.2.2/rts/{*.c,*.h} | wc -l
3
$ grep -n initialized ghc-9.2.2/rts/{*.c,*.h} | wc -l
19
$
cf: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7937
$ grep -n Initialise ghc-9.2.2/ghc/GHCi/UI.hs | wc -l
1
$ grep -n initialize ghc-9.2.2/ghc/GHCi/UI.hs | wc -l
2
$
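The per-spelling grep counts above can be gathered in one pass for several word stems. This is a sketch only: the function name count_ise_ize and the suffix list are hypothetical, and unlike grep -n ... | wc -l, which counts matching lines, it counts every individual occurrence (via grep -o), so its numbers may differ from those above.

```shell
#!/bin/sh
# Sketch: for each stem, count -is- (GB) vs -iz- (US) spellings in the
# given files, case-insensitively.
# ASSUMPTION: the suffix list (e, ed, es, ing, ation) is illustrative;
# this counts occurrences, not matching lines.
count_ise_ize() {
  stems=$1
  shift
  for stem in $stems; do
    gb=$(grep -h -o -i -E "${stem}is(e|ed|es|ing|ation)" "$@" | wc -l | tr -d ' ')
    us=$(grep -h -o -i -E "${stem}iz(e|ed|es|ing|ation)" "$@" | wc -l | tr -d ' ')
    printf '%s GB=%s US=%s\n' "$stem" "$gb" "$us"
  done
}

# Usage, e.g. on the GHCi UI module checked above:
# count_ise_ize "initial optim final" ghc-9.2.2/ghc/GHCi/UI.hs
```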
Not to diminish the value of languages such as English and American, or of brainstorming such as this, but just a reminder that the cabal team has limited resources both for coding new features and for maintaining them. So brainstorming that results in cheap features/fixes has a higher chance of coming to fruition.
@Mikolaj
one can extract comments from Organization repos and find programming terms
The Glossary requires write access; as a workaround, using GitHub's Organization webpages and/or Wiki seems a better approach.
cf: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style
@Kleidukos for GitLab MRs, is there any preference for British or American English, or should the variants be ignored?
$ cd ghc-9.2.2/compiler/GHC/Driver
$ grep "^-- " *.hs | grep initialise | wc -l
2
$ grep "^-- " *.hs | grep initialize | wc -l
6
$
cf: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/conventions
@takenobu-hs for GitLab MRs, is there any preference for British or American English, or should the variants be ignored?
$ grep -n initialised ghc-9.2.2/rts/{*.c,*.h} | wc -l
3
$ grep -n initialized ghc-9.2.2/rts/{*.c,*.h} | wc -l
19
$
@askeblad: I'd guess people are just too busy with more pressing problems. But good to be aware of the variety --- if not to unify, then at least to celebrate. :)
I will ignore English variant spellings.
@Mikolaj
cf: https://github.com/haskell/cabal/issues/8082#issuecomment-1095572459
grep -o -E -r '\w+' cabal | sort -u -f > output
Modified to catch contractions:
grep -o -E -r "\w+('\w+)*" cabal | sort -u -f > output
Extract the second field, using the colon as delimiter:
cut -d : -f2 output | sort -u -f > words
The words file could be reviewed and a list of programming terms produced.
The above could be performed on other Organization repos and a master 'List of Terms' for a Glossary produced, with the content at the discretion of Organization members.
The 'List of Terms' could be located in the meta repo.
uxterm was used to run the above commands, as konsole on my Linux box is set to a non-Unicode locale.
Filtering out alphanumeric and numeric strings, camelCase strings, and strings with underscores would reduce the number of lines in the words file.
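The two-step pipeline above, together with the suggested filtering, could be combined roughly as follows. This is a sketch assuming GNU grep (for \w, -r, and -h); the function name extract_words is hypothetical. Using grep -h drops the file-name prefix, which makes the cut step unnecessary, and LC_ALL=C keeps the character classes ASCII-only, so words containing non-ASCII letters would be split.

```shell
#!/bin/sh
# Sketch: list the unique words (including contractions such as don't) in a
# directory tree, then drop strings containing digits or underscores and
# camelCase identifiers, leaving mostly natural-language words to review.
# ASSUMPTIONS: GNU grep (\w, -r, -h); LC_ALL=C, so matching is ASCII-only.
extract_words() (
  export LC_ALL=C
  grep -r -h -o -E "\w+('\w+)*" "$1" |
    grep -v -E '[0-9_]' |      # alphanumeric/numeric strings, snake_case
    grep -v -E '[a-z][A-Z]' |  # camelCase identifiers
    sort -u -f                 # unique, ignoring case
)

# Usage, mirroring the commands above:
# extract_words cabal > words
```

Running the function body in a subshell keeps the LC_ALL=C setting from leaking into the caller's environment.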
Wow, interesting. So far I knew only one software tool that catches typos (inside binaries, of all things): the Debian lintian. I wonder if it could be improved or made widely available or an alternative made widely available (lintian only works for Debian packages IIRC).
The grep command returned nothing from binaries in the redirected output.
I don't know how lintian does it, but a similar trick on the command line is strings -20 binaryname | grep pattern.
Moderately coder-unfriendly when reviewing extracted comments:
finalized, then on another line optimisation
Usual spellings in England: amortise, customise, finalise, generalise, normalise, optimise, penalise, realise, recognise, serialise, synthesise, utilise.
In England, the variant spelling 'initialise' is also used.
Could a .cabal file field be created as english-variant-for-comments: with possible values GB and US?