jgm / pandoc

Universal markup converter
https://pandoc.org

Native syntax for small caps #5256

Open ickc opened 5 years ago

ickc commented 5 years ago

Currently, the best "native" small caps syntax is [Small caps]{.smallcaps}.

By "native syntax", I mean a more markdown-ish syntax, such as ^^Small caps^^ or ****Small caps****.

c.f. pandoc-discuss/Small caps syntax possibilities where @jgm said

I think that ^^..^^ is the best of the options you've mentioned, but I'm still reluctant and would need to think more about it.

alerque commented 5 years ago

As far as I'm concerned, asterisks in Markdown are already dizzily overloaded. Bold, italic, both, oh no you actually wanted just a regular asterisk...

On the other hand, the double caret syntax looks pretty good. I'd support that.

ickc commented 5 years ago

Forgot to mention the usual mix of _ and * should be allowed so one can do ____...____ too.

Examples:

etc.

so one can mingle *, _, **, __, ****, ____ in any order and expect it to just work.

However, the only reason for this suggestion is because this syntax doesn't require a new character to be used. To quote what I said on the pandoc-discuss page I linked earlier:

I just find it natural, especially because no new characters are needed (all the other suggestions try to define a new character sequence as new markup, which is easy to reject and hard to reach consensus on). Repeat the pattern we've already been using and now we have a new one. It is a bit like how, if we wanted to define 7th-level headers, we would naturally use #######; how easy it is to count is irrelevant. It is just a natural progression. The only difference here is that the progression is not incremental in the number (1, 2, 3, 4, 5, 6, 7) but in the exponent (*: 2^0, **: 2^1, ****: 2^2).

So if everyone already agrees that ^^...^^ is good, and there is no concern about reserving the character for future markup use, then fine: 2 characters are easier to type and read than 4, and ^ suggests the ⇧ symbol for shift that people would have used for ALL CAPS, so it is really quite good.

The only downside of ^^...^^ might be that any relationship between ^...^ and ^^...^^ is totally absent. *...* and **...** represent a progression from strong to stronger emphasis (arguably with ****...**** as strongest), but no such progression exists from ^...^ to ^^...^^. (Personally, I find a pattern that makes sense more important than ease of use or reading. However, pandoc's stated philosophy doesn't really require this, so it might just be me.)

brainchild0 commented 4 years ago

I would caution against any attempt to introduce a "native" small caps syntax into MarkDown dialects. This discussion appears to derive support from a latent assumption that MarkDown currently "natively" supports italic and bold. In the original post, however, @ickc references a discussion thread, in which one well-informed participant correctly observes that MarkDown has no "native" support for any particular stylistic feature, and that stylization is a subject more complex than often imagined, and best resolved after the logical relationship among text spans and text blocks is considered.

Markdown by design is minimalist with respect to markup features, hence the name. Historically digital text systems were too limiting to encode expressive text, or too featureful to offer a portable and durable representation with minimal distractions. MarkDown attempts to strike a balance, and though not intending to be a final or absolute solution to all problems, has, presumably because of its balanced design, proved surprisingly effective for a wide range of applications. Similarly, as the styles of italics and bold map to regular and strong emphasis in a way that is suitable for most casually-formatted text, such as GitHub issue postings, you will find this mapping in widespread use while remaining external to the core MarkDown definitions. As plainly observed, specialized and advanced typography demands richer and more flexible mappings.

Pandoc's native_spans feature offers support for specialized demarcations beyond the minimalist, graded emphasis offered natively by MarkDown.

Apparently, through this syntax, small caps in Pandoc MarkDown already enjoys some support (see #4596), though arguably a better approach would be through translation options that designate specific classes for formatting as small caps, in order to separate the logical document structure from the typographic considerations, analogous to how MarkDown features support for strong and weak emphasis but not directly for italic or bold.

I tend to think that though the problem that prompted this issue deserves serious consideration, the proposed solution may not be the best. Perhaps a more appropriate strategy considers ways that Pandoc can give the user better control over how native emphasis and custom classes in spans can be formatted typographically in output.

ickc commented 4 years ago

I'm aware that the different font styles such as italic and bold are not exactly marked up as italic and bold, but more like different levels of emphasis (both in HTML targets which is original markdown's target format, and LaTeX too.) This is reinforced by the choice of markup in markdown such as *, **, *** which clearly shows levels of emphasis.

The main point of the argument is that small caps should not be that much different from italic and bold, in that as much as it is a style, its function is emphasis. At least for the kind of typography I learnt about (I'm sure there are many schools of thought around this, but I learnt only one), italic, bold, small caps, and all caps are all effective styles for conveying different levels of emphasis, arguably in that order, and arguably mixing 2 isn't very good practice (i.e. instead of having italic, then bold, then italic and bold as 3 levels of emphasis, rather use italic, bold, then small caps.)

Then the natural question is: given the symmetry in their roles (italic, bold, small caps), why is small caps often left out in the "digital world"? I argued (not my original thought) that it is more of a technical limitation. Many fonts don't have a real small caps variant, and processors often produce fake small caps with mismatched weight between capital and non-capital letters. Another technical limitation is that small caps is represented very differently in HTML compared to italic and bold (why do em/strong map to italic/bold? why only 2 different kinds of emphasis here? I mean, philosophically and fundamentally why, if not because of some biases and technical limitations?) This is very unfortunate because markdown's behavior is often dictated by its original target format, HTML, while markdown itself should be agnostic to the target format. (Many will then argue that one can choose their own CSS to override the defaults, or write a pandoc filter to change that behavior, but that's not the point here.)

I also argued somewhere that by using ****small caps**** as the syntax, it naturally conveys the meaning of different levels of emphasis. How it is represented in any of the output formats IMO should not interfere with what it means in markdown as a markup language. Some argued that it is too busy, true; some argued the binary explanation is too complex, not true (similar to ***italic and bold***, no one needs to explain why it works using binary representation.)

And yes, I'm aware that the native span feature has half-native support for this, as I'm the one who made a PR about that... (it utilizes native spans together with a convention to get "native small caps".)

I think the problem is that some people want native small caps, and those who do have different opinions on which syntax to use. I've yet to see a compelling reason (maybe I just don't remember) not to have a native syntax for small caps (besides that they don't like it, or they don't want one of the scarce remaining markdown syntax characters taken, etc.) But it is very difficult for the pandoc community to settle on a syntax (remember native div? It took a really long time to make everyone not too unhappy to settle on one...) Now for something in as low demand as small caps (I think only one other person and I have persistently asked for native small caps), I don't think people (developers and also the community) will put enough thought into it to make it happen. (Heck, we haven't even settled on the logo yet... Once it was very close to having @jgm make a choice...)

ickc commented 4 years ago

By the way, as mentioned in the beginning, we have discussed it in pandoc-discuss for a long time. (The email referenced there is only the latest attempt that seems to have made some progress.) It finally got moved to a GitHub issue because it seemed at the time to be mature enough.

brainchild0 commented 4 years ago

@ickc I am confused about what you are requesting. Two conflicting accounts appear variously to emerge from your comments.

Are you requesting?:

  1. More control over levels of emphasis, with decisions of when or whether to use small caps left to the display environment or the translation processor.
  2. A specific tag that is defined as resolving to small caps and that always will be displayed as such.

Suppose I am a linguist, like the original contributor to the discussion topic, and I want to represent a semanteme. I (the real I, not the one pretending to be a linguist) have no idea what a semanteme is, but let's suppose for the sake of illustration that we both agree that Foobar is the name of one. The contributor tells us that the fact of Foobar being a semanteme means it must be represented in publication by placing it in small caps. Now, I want to write a block of text that says: Foobar is my favorite semanteme.

What steps would I take, following your idea for some extended MarkDown or Pandoc feature set?

ickc commented 4 years ago

Well, I guess to separate the philosophical from the practical matter, this feature request is confined to mapping

^^Small caps^^ or ****Small caps****

or anything similar to the AST element SmallCaps, i.e. it is solely a markdown syntax representation of small caps, and is independent of e.g. changing the AST, changing the output formats, etc.
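As a rough illustration of how narrow that mapping is, the sketch below rewrites ^^...^^ into the bracketed-span form [..]{.smallcaps} quoted at the top of this issue, so the existing reader can take it from there. It is a text preprocessor, not a change to pandoc's parser. The function name caret_to_smallcaps and the delimiter rules (no whitespace directly inside the delimiters, single-line spans, no nesting, no escaping, no awareness of code spans) are assumptions made for this example only.

```python
import re

# Hypothetical delimiter rule for this sketch: "^^" followed by a
# non-space character, the shortest possible run of text, then "^^"
# preceded by a non-space character. "." does not match newlines, so
# spans cannot cross lines.
_CARET_SPAN = re.compile(r"\^\^(?=\S)(.+?)(?<=\S)\^\^")


def caret_to_smallcaps(text: str) -> str:
    """Rewrite ^^...^^ spans as [...]{.smallcaps} bracketed spans."""
    return _CARET_SPAN.sub(r"[\1]{.smallcaps}", text)


print(caret_to_smallcaps("Foobar is ^^Small caps^^ here."))
```

Single-caret superscripts such as x^2^ are left alone because the pattern requires two carets on each side. A real implementation would live in the Markdown reader and would have to respect code spans and backslash escapes, which this sketch ignores.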

brainchild0 commented 4 years ago

So you want a MarkDown tag that causes affected spans to go into a SmallCaps node in the AST, which in turn gets displayed as small caps in the target format?

brainchild0 commented 4 years ago

Regrettably, I’m not at the moment entirely sure what you, @ickc, are requesting, but your comments seem to express the belief that core MarkDown treats italics and boldface with a certain privilege not provided to small caps. Problems with this characterization were addressed previously several times.

The comments further suggest that italics, bold, and small caps necessarily represent increasing levels of emphasis, whereas this view was challenged by an opinion that sourced an external document, written by a graphics designer. In fact, even the simpler view that italics and bold have a natural relationship, of the latter simply being more emphatic than the former, is also not a plain conclusion.

I believe many of your comments were challenged with sound objections, which you have not engaged. It feels as though we are talking around one another, others not being convinced of your views and you simply stating that you are not convinced of theirs. Perhaps addressing the comments given in response to your own is a more helpful way to discuss.

bpj commented 4 years ago

The simple practical facts regardless of philosophical considerations as to whether Emph and Strong "really are" italic and bold are

So please let's put this argument, which is inconsequential in practice, to rest!

ickc commented 4 years ago

Well said!

Just a very minor correction: I'd say there are 2³ "levels" of emphasis, where each "kind" of emphasis can independently be "activated" or not. That's why, if **** is used, activating all 3 kinds means *******, and any number of * from 1 to 7 has a valid meaning.

bpj commented 4 years ago

3³ was a typo. I had to switch keyboard layouts to get a ² and by the time I got there I hit the wrong button and didn't notice. Actually there are seven possible emphases with three basic kinds (using "it", "bf" and "sc" to keep my thinking more concrete):

*it*
**bf**
***bf-it***
^^sc^^
*^^it-sc^^*
**^^bf-sc^^**
***^^bf-it-sc^^***

That's why I still think that ^^sc^^ is better than ****sc****: having to hit the same (shifted) key a total of 8 times to get a single sc span is bad. Having to hit it 14 times to get the triple combination is IMO unacceptable. In my experience the annoyance threshold for hitting the same key repeatedly is 3 or 4; I'm OK with typing fences of 4 backticks to appease syntax highlighters but repeating more than that is annoying. Mind you on my Swedish laptop physical keyboard both caret and backtick are dead keys, so I already have to hit them four times to get an ordinary code or superscript span — twice for each actual character. In Vim I have "smart" snippets so I type *3<shift-tab> which gives me ***|***» where | is the cursor and » is the invisible jump target for the next <shift-tab>. On my handhelds I have defined "abbreviations" so that when I type two backticks I get three alternatives of 4, 6 and 8 backticks in the correction/prediction bar. In principle I could type *7‹shift-tab› in Vim (or *42‹shift-tab› or whatever number), but frankly I find even ****foo**** to be too much and *******foo******* is just ridiculous. There is another cognitive factor at play: the eye can easily see the difference between ** and *** at a glance, but if there are more like characters than that at least I have to start actually counting them. With underscores, which run into each other it's even hard to see the difference between two and three at a glance! I actually usually use _…_ for Emph and **…** for Strong and then _**…**_ because that is easier for a human to parse at a glance. In fact since comparative philologists typically put literal asterisks before italic words to indicate that the words are reconstructed rather than attested I'm in the habit of typing _*foo_, which Pandoc fortunately parses as if it were _\*foo_, and then I use *bar* for italics which actually are emphasis rather than object language notation. 
I find that that simplifies proofreading since I see immediately if I overuse emphasis as opposed to object language notation.
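The counting above can be checked with a short sketch. It enumerates the non-empty combinations of the three kinds and, for the asterisk-only proposal, tallies how many asterisks one span would need. The per-side weights (1 for it, 2 for bf, 4 for sc) follow the proposal as described in this thread; the function names are made up for the illustration.

```python
from itertools import combinations

# Asterisks per side under the asterisk-only proposal discussed above.
ASTERISKS = {"it": 1, "bf": 2, "sc": 4}


def emphasis_combinations(kinds=("it", "bf", "sc")):
    """Return every non-empty combination of the given emphasis kinds."""
    return [
        combo
        for size in range(1, len(kinds) + 1)
        for combo in combinations(kinds, size)
    ]


def asterisk_keystrokes(combo):
    """Total asterisks typed (opening plus closing) for one span."""
    return 2 * sum(ASTERISKS[kind] for kind in combo)


for combo in emphasis_combinations():
    print("-".join(combo), asterisk_keystrokes(combo))
```

Three kinds give 2³ - 1 = 7 non-empty combinations, and the per-side asterisk counts cover exactly 1 through 7, which is where the 8 keystrokes for a lone sc span and the 14 for the triple combination come from.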

ickc commented 4 years ago

I think I agree with you. Just really minor thing to point out but we’re not in disagreement:

I’d include the zeroth level, because without it there is nothing to be emphasized against. E.g. LaTeX has a convention that emphasis within emphasis cancels that emphasis, so that regular text in a sea of italics becomes the emphasis.

A caveat I wanted to footnote is that most fonts don’t really have all 8 variants. It is not uncommon to have only 2 or 3 variants (so many don’t even have bold italic, and some programs will fake it). While level 7 is a possibility, I think people who want to use small caps as another kind of emphasis will just stick with the choice of italic, bold, or small caps, but not all together at once. But I agree typing 4 * is kind of bad. On the other hand, that’s what we already need to deal with in headings (but again, I agree that should be less common.)

From a practical perspective I’m not really opposed to ^^ or any other valid construct. My preference is more about having one than about which one. And from pandoc's history it seems like settling on a syntax is hard. I.e. if we all agree we should have one, count me in on the ^^ syntax; so far no one is against it (it even has support from @jgm, as quoted above.)

Minor edit: ****__* could help a bit by mixing the 2 different characters for same meaning. An interesting thought would be to ask if *_*_*_* should be allowed.

bpj commented 4 years ago

Yeah. Linguists are possibly the only people in the universe to use bold small caps but the fact is that they do sometimes use bold small caps for semantic tags and regular small caps for grammatical tags, and then at least theoretically italics or underlining on top of that for emphasis. Things like "dog.ACC" where the caps are actually small caps are not all that uncommon (meaning "note that 'dog' is actually in the accusative case here" or "this accusative agrees with the other accusative in the same phrase"). Underlining is actually more common than italics in these cases, but GitHub refuses to render my <u> tags.

ickc commented 4 years ago

I see. While not a linguist, I have read some works that have a ton of "modifiers" (not sure if that is the correct term.) Examples are the well-known BDAG Greek dictionary or, to a lesser extent, any kind of interlinear text. I agree linguists really have a special need to "encode" many different kinds of information using typography. In this sense I think having a linguist's opinion on something like this is important, because they see more corner cases than others.

(As an aside I wish I only need to worry about typesetting typography because I really hate typesetting graphs/diagrams/plots... Those are so varied and hence difficult to master.)

brainchild0 commented 4 years ago

There is a very frustrating pattern in this discussion, in which an idea is given, based on assumptions, then those assumptions are challenged critically, then the assumptions are simply repeated, then a reminder is given that the assumptions were already addressed, then the assumptions are simply repeated again (and again, and again).

The long list of "facts" given by @bpj are not facts, but rather are assumptions that have never been justified and largely have been challenged, even repudiated. Simply repeating the same assumptions gives them no greater importance or usefulness, except as a means to obscure clarity and to obstruct progress.

What is even more alarming, however, is the way in which core design principles of MarkDown, which @bpj has somewhat inaccurately and dismissively labeled "philosophical considerations", are summarily pushed aside, rather than acknowledged as being the very cause by which MarkDown has become so useful in a large and growing number of cases.

Once a language seeks to represent every possible physical typographical feature, it stops being a semantic language, which MarkDown is, and becomes a typesetting language, which other languages already are and which MarkDown specifically is preferred for not being.

A very good topic of discussion, I feel, would be better ways to support small caps in Pandoc output. I think that such a conversation would help to achieve the goals that underlie this issue, while harmonizing the design of the MarkDown language.

brainchild0 commented 4 years ago

Also, for a better understanding of how your ideas relate to the broader MarkDown ecosystem, consider reading the CommonMark specification document, if you have not yet done so, or opening a topic in the CommonMark discussion forum.

Note that CommonMark and Pandoc were spearheaded by the same individual.

Also try to count the number of occurrences of either the term italic or bold in the specification, and consider how many times you think that the term small caps should likewise appear.

ickc commented 4 years ago

@brainchild0, maybe the pragmatic way to address your question is that, well, the semi-truth is that none of our opinions really matter. The core developers will decide. I'm sure they will weigh all of our opinions without bias, perhaps plus some other factors in their heads that they might not have time to fully explain.

Or maybe we could also say that if the matter weren't controversial, it wouldn't have stayed undecided for so long (i.e. not implemented, but not closed either.) So I'm not surprised to see people strongly for it and against it. In the end it is someone else's judgement call. And either way there are going to be people not completely satisfied with it (native div! Anyone?)

And on the spectrum between CommonMark and pandoc, they are arguably on 2 opposite ends. I once made the mistake of trying to put them on equal footing, and I admit I like the pandoc community more than the CommonMark community. Luckily @jgm is heavily invested in both projects, and pandoc also has the CommonMark reader/writer (and will have a new pandoc reader/writer that is improved but incompatible with the current one. The current one will remain as another markdown variant to use, so it won't be deprecated, at least in the short run.)

As I have digressed I think we all need to celebrate the effort @jgm and others have put in unifying the "markdown communities". At least they are now more in agreement with each other than in the past. Markdown's fragmentation is the main reason for people not to adopt it. So either way I think the more we put away our differences the better for the markdown community in general (even if that means the language might be less perfect than one might want.)

brainchild0 commented 4 years ago

none of our opinions really matter

Opinions don't matter, correct, but ideas do. If a solution is based on good ideas, it might succeed. Otherwise, it won't last long. That's why it's frustrating when others address your comments but you sidestep their analysis.

among the spectrum between CommonMark and pandoc, they are arguably on 2 opposite ends. I once made the flaw of trying to put them on equal footing

Ends of what spectrum? CommonMark is a text mark-up format specification, intending to resolve the problem of pre-existing dialects of MarkDown, aimed at ensuring consistent processing following a deterministic rule set. Pandoc is a software application for document format transformation and translation. They're no more on equal footing, or opposite ends of something, than Opera and HTML, or than bread and toasters.

Markdown's fragmentation is the main reason for people not to adopt it.

Lots of people adopt MarkDown. The user base is growing. It diverged into dialects, not fragments.

Why did you adopt MarkDown? What are you trying to achieve with it? Are you frustrated because you want it to be something different than what it is?

Really, I hoped that by exploring CommonMark, you might gain an understanding of MarkDown's core design principles. If you are determined that they don't exist or don't matter, then you won't learn. But if you're able to move away from giving opinions and toward discussing ideas, then you might feel less like what you say does not matter.

ickc commented 4 years ago

I hoped that by exploring CommonMark, you might gain an understanding of MarkDown's core design principles.

If you dwelt too much on CommonMark's design principle then you might find pandoc's design principle doesn't quite fit you. (I'm not claiming they are completely different either. There are rarely such extremes.) The 2 communities may be more different than you might imagine.

That's why it's frustrating when others address your comments but you sidestep their analysis.

You can think whatever you want, but I think both of us have made our points enough. In the spirit of "let's agree to disagree" there's really no point in either of us insisting on our own point of view. And as I don't have any authority to decide which direction pandoc goes, I think you should think more broadly about appealing to the general audience or the core developers with your reasoning/ideas. Again, I don't think they are going to ignore your opinions/ideas.

Lots of people adopt MarkDown. The user base is growing. It diverged into dialects, not fragments.

We're not in disagreement here. I mentioned the situation is improving. And you claimed so too. Right?

brainchild0 commented 4 years ago

In the spirit of "let's agree to disagree" there's really no point on one insisting their own point of view.

If you dwelt too much on CommonMark's design principle then you might find pandoc's design principle doesn't quite fit you.

The 2 communities may be more different than you might imagine.

Agree that arguing is only useful when it proves productive.

Doubtful more discussion would be useful, except that your idea that CommonMark and Pandoc are built on conflicting design principles, not simply distinctive objectives, might be one area that hasn't been explored.

I identify relevant extracts from the central documents of each, and fail to find support for this conclusion.


From the CommonMark specification:

This document attempts to specify Markdown syntax unambiguously.

...this document describes how Markdown is to be parsed into an abstract syntax tree...


From the Pandoc manual:

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.

Pandoc’s enhanced version of Markdown includes syntax for tables, definition lists, metadata blocks, footnotes, citations, math, and much more.

Markdown is designed to be easy to write, and, even more importantly, easy to read:

A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. – John Gruber

This principle has guided pandoc’s decisions in finding syntax for tables, footnotes, and other extensions.

There is, however, one respect in which pandoc’s aims are different from the original aims of Markdown. Whereas Markdown was originally designed with HTML generation in mind, pandoc is designed for multiple output formats. Thus, while pandoc allows the embedding of raw HTML, it discourages it, and provides other, non-HTMLish ways of representing important document elements like definition lists, tables, mathematics, and footnotes.

brainchild0 commented 4 years ago

Although I disagree with the premises in your argument, @bpj, for providing, in MarkDown language definition, support for small caps, or any other typographic feature, the cases you have requiring specific typographic styles in your output are real and valid.

Could you provide a short, representative segment of text, along with your requirements for output formatting?

Moving this discussion toward how to support use cases for varied typographic output and away from why MarkDown should be a typographic language will do everyone enormous good.

ickc commented 4 years ago

except that your idea that CommonMark and Pandoc are built on conflicting design principles

I didn't say that. But the cultures of the 2 communities are very different. You'll see more when you dwell in pandoc's long enough. Don't forget that @jgm is deeply involved in both, so that alone speaks to their similarity too. But the decision process is quite different. CommonMark is spec driven and focuses on core, common features, i.e. a highest-common-factor kind of minimal feature set that can be strongly spec'd, whereas pandoc, given its history, is much more extensive. Also, in pandoc's design the internal AST often has some influence on which features might be accepted, because feature requests that change the AST are very difficult to implement. And the AST kind of sets up the "philosophy", the way of thinking about the structure of the document. That's why you find some people telling you that a SmallCaps AST element already exists in pandoc, and hence the request is really just about mapping that to a markdown-ish syntax.

The "2 ends" of the spectrum between CommonMark and pandoc are partly due to that nature. Consider CommonMark as the intersection of all markdown and pandoc as, sort of, the union of all markdown, so the "entry barrier" to either is very different: one really needs to think very clearly about adding something to CommonMark, as it really has to be a core feature (last I checked there was a certain feature that some consider vital to markdown that's not even considered in CommonMark, at least pre v1.0, for that reason.) But for pandoc it is more like "if it makes sense"; there's a lot more grey area, so unfortunately it is really difficult to say what exactly the criteria are (I'm sure @jgm can speak more clearly about this, but I'm also quite sure he won't be able to just "give a definition" in the sense that once these criteria are fulfilled a feature should be accepted or a certain thing is a good choice.)

Also note that one of the "selling points" of pandoc is the internal AST plus the filter system, so to some extent you can alter pandoc's behavior with a filter. Many people have their own way of dealing with this kind of problem already (i.e. a syntax for SmallCaps), so feature requests like this one are more about consolidating people's needs and experience into something common, so that people don't need to reinvent the wheel, so to speak. (This is also related to the reproducibility of documents that rely on filters.)
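To make the filter route concrete, here is a minimal sketch of the kind of private workaround described above. It walks a document in the shape of pandoc's JSON AST and turns every Span carrying a chosen class into a SmallCaps node. The class name sc and the function name promote_spans are hypothetical, and the node shapes used here (a Span as [attr, inlines] with attr as [id, classes, key-value pairs], a SmallCaps as a list of inlines) reflect my understanding of the JSON serialization and should be checked against the pandoc version in use.

```python
def promote_spans(node, cls="sc"):
    """Recursively replace Span nodes carrying `cls` with SmallCaps nodes.

    Assumed node shapes (verify against your pandoc version):
      Span:      {"t": "Span", "c": [[id, classes, attrs], inlines]}
      SmallCaps: {"t": "SmallCaps", "c": inlines}
    """
    if isinstance(node, list):
        return [promote_spans(item, cls) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Span":
            (_ident, classes, _attrs), inlines = node["c"]
            if cls in classes:
                # The span's id, other classes and attributes are dropped.
                return {"t": "SmallCaps", "c": promote_spans(inlines, cls)}
        return {key: promote_spans(value, cls) for key, value in node.items()}
    return node


# "Foobar is my favorite semanteme." with Foobar marked by the class.
doc = {
    "blocks": [
        {
            "t": "Para",
            "c": [
                {"t": "Span", "c": [["", ["sc"], []], [{"t": "Str", "c": "Foobar"}]]},
                {"t": "Space"},
                {"t": "Str", "c": "is"},
            ],
        }
    ]
}
print(promote_spans(doc)["blocks"][0]["c"][0])
```

Wrapped in a script that reads the JSON document from standard input and writes the transformed JSON to standard output, a function like this could be run through pandoc's --filter option. Every user who does so ends up with a private convention, which is the reproducibility concern mentioned above.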

I guess I just can't see why the existence of a pandoc markdown syntax for small caps would harm the markdown community in general. Pandoc's markdown is already much more general than many other markdown variants, so it is unreasonable to think that other markdowns are eventually going to pick up all the features available in pandoc markdown. Also, don't forget that pandoc has a few independent markdown reader/writer pairs (pandoc, the new GitHub one based on CommonMark, CommonMark) and they all have their own "spec" of syntax to support. If you consider adding a SmallCaps syntax to pandoc markdown some sort of pollution, don't worry, it is unlikely to spread to other markdown variants, because there exists a long list of incompatible features in pandoc that are not supported in other markdowns.

brainchild0 commented 4 years ago

If you consider adding a SmallCaps syntax to pandoc markdown some sort of pollution, don’t worry, it is unlikely to spread to other markdown variants, because there exists a long list of incompatible feature in pandoc not supported in other markdowns.

When discussing, in the Pandoc issues tracker, features that might be polluting, then I’m not worried about the pollution harming other MarkDown variants, of course, only about the pollution harming Pandoc.

Consider CommonMark as the intersections of all markdown and pandoc is, sort of, the union of all markdown so the “entry barrier” to either is very different as one really need to think very clear about adding something to CommonMark as it really has to be a core set of features (last I check there is a certain feature that some consider very vital to markdown that’s not even considered in CommonMark at least pre v1.0 because of that reason.)

To be clear, you’re using Pandoc to mean the Pandoc MarkDown reader and extensions. It helps to give these clarifications in a way that your comments can be understood at the time you make them, instead of long after.

Indeed, Pandoc extends MarkDown beyond the basic dialect, but so far these extensions embody advanced, but yet semantic, features of a document. The reason I invited you to explore CommonMark was in part to help you understand the core design, but also because earlier you stated that your proposal somehow adds symmetry in the treatment of small caps with the current treatment of italics and boldface.

If those conclusions were accurate as applied to Pandoc MarkDown, then wouldn’t they be so applied to CommonMark, and supported by the premises presented in the specification? Would you expect the CommonMark community to be enthusiastic about your idea? If so, why not present it to them? If they accept it, then surely the Pandoc community would do the same.

Or if you expect that they would not accept it, then why should the Pandoc community accept it? Do you think that the current capabilities of the Pandoc extensions include every feature anyone has ever conceived for any variation of MarkDown? Do you literally consider them to be a union of all MarkDown? What do you think should be the “entry barrier” for support in Pandoc, or do you think there should be none?

ickc commented 4 years ago

To be clear, you’re using Pandoc to mean the Pandoc MarkDown reader and extensions. It helps to give these clarifications in a way that your comments can be understood at the time you make them, instead of long after.

I think what pandoc means is implicit, and should be clear from the context. It is not rare for the same word to mean multiple things at the same time. Someone as smart as you should be able to observe that, in a world where a rigorous definition is rarely given.

Or if you expect that they would not accept it, then why should the Pandoc community accept it? Do you think that the current capabilities of the Pandoc extensions include every feature anyone has ever conceived for any variation of MarkDown? Do you literally consider them to be an intersection of all MarkDown? What do you think should be the “entry barrier” for support in Pandoc, or do you think there should be none?

Historical precedent can answer that. I forget which common markdown feature was rejected because it is not considered a core feature of markdown (e.g. the original markdown from 2004 does not have that concept), but there is at least one such example already (which is also obvious given how few features/extensions exist in CommonMark). There are many examples in pandoc that came after CommonMark appeared, where it was obvious in the decision process that the feature might not be considered for CommonMark but was implemented anyway. The biggest such example might be native divs.

Just in case you want to continue the discussion: I'm unsubscribing from this issue. Feel free to continue to give input here, but I don't have the time to devote to this issue anymore. I don't think you can expect others to spend time writing out careful essays that are logically clear and without flaw, etc. Frankly, all my replies here were typed as I thought, in whatever little gaps of time I had, without even having the time to reread them. I don't think doing the above is effective either. But feel free to devote your time to it. Just don't expect other people to devote as much as you might want or do.

bpj commented 4 years ago

Since the list of facts I gave are easy to check, especially the most important ones that people do use a third kind of emphasis and that misnomers in computer programs usually aren't corrected, I wonder what evidence you @brainchild0 might adduce to "refute" them? What's the point of insisting that a third kind of emphasis would be some kind of pollution when people in fact do use more than two kinds of emphasis in the real world?

In the discussion on pandoc-discuss I posted several links to real-world examples of linguists using small caps, including an appendix full of bold small caps — arguably a bit over the top but clearly motivated by the fact that the entities of meaning are in bold small caps elsewhere in that book — and a discussion about the rationale for using small caps for etyma in Romance Linguistics.

https://groups.google.com/d/msgid/pandoc-discuss/74a86783-3ed0-7fef-11fd-ac2c0ce5b14c%40gmail.com

Note that neither Pandoc nor CommonMark is about some Platonic ideal "pure" Markdown. Pandoc is about providing solutions to real, practical needs writers have in the real world which markdown.pl didn't cater for, while CommonMark is about providing a least common denominator among Markdown flavors, not about legislating against whatever extensions the authors and communities of various flavors see a need for. The design criteria for Pandoc's Markdown flavor and for CommonMark are different but there is no conflict, just a difference in purposes, which are equally valid in their respective domains. The fact that @jgm is the driving force in both projects is evidence of this. If he doesn't see a conflict neither should we.

If you feel that Pandoc's extensions are a pollution and CommonMark is ideal you are free to use --from commonmark and --to commonmark at all times, just as others are free to use such extensions as suit their use case. The only person trying to impose here is you. I and @ickc are just pointing out that there are valid, practical, real-world reasons for Pandoc's design criteria, and goals, to differ from those of CommonMark. If you dislike some Pandoc extensions you are free to use --from markdown-<extension> as you see fit. You are perfectly free to have a narrow view of Markdown, but you won't have any success in imposing your narrow view on Pandoc, since @jgm, whose opinion is the important one in the end, doesn't share it. He has repeatedly shown that his main criterion for including or not including any extensions is the potential size of their user base weighed against the ease or difficulty of implementing it. If you don't like that you are free not to use Pandoc's Markdown flavor, but let those who like these criteria propose what we think are improvements as we damned well please, to be accepted or rejected by @jgm as he sees fit.

He implemented Pandoc's modular extension model, so clearly he favors freedom of choice without imposing or denying any extensions to Pandoc users. Everyone uses or doesn't use the extensions they want at their discretion. Now that we have the defaults file feature it is very easy to enable or disable the extensions one does or doesn't want without bothering or being bothered by others with different preferences. Just put alias ideal-pandoc='pandoc --defaults ~/.pandoc/defaults/ideal.yaml' in your .bashrc and you have your ideal and I will have mine, thank you very much @jgm!

The Pandoc community, including @jgm himself doesn't have, need or want any Markdown Purity Police.

If I have to choose between a "Platonic" "pure" Markdown for the world of rarified ideals and an "Aristotelian" Markdown for real world needs I'll always choose the latter, as I'm sure will the majority of Pandoc's users and its developers.

Unlike @ickc I won't unsubscribe from this issue because it is important to me, but I will mute the email thread as soon as I have hit Send.


brainchild0 commented 4 years ago

Since the list of facts I gave are easy to check,

You gave a lengthy list of assertions, which would become facts once verified.

especially the most important ones that people do use a third kind of emphasis

How would we verify that "people do use a third kind of emphasis"? Which people? How many? How often? How many people doing something would be needed for you to consider it prudent to change a language?

and that misnomers in computer programs usually aren't corrected, I wonder what evidence you @brainchild0 might adduce to "refute" them?

Assuming you are referring to symbol names that have misleading names but are kept for reasons of interoperability and code management, I would generally agree. I have no idea why that observation would be involved in a discussion of MarkDown language elements. I'm not sure I follow you on the meaning of this statement but I definitely don't follow you on the relevance.

In the discussion on pandoc-discuss I posted several links to real-world examples of linguists using small caps, including an appendix full of bold small caps

Sure. I have no dispute whatsoever with linguists using small caps with bold, without bold, or in any way. But if you have a semanteme named "Foobar", and you have a language like MarkDown which is semantic not typographic, then wouldn't you think that the appropriate way to express it would be [Foobar]{.semanteme}, and then to format the class semanteme as small caps, with bold, without bold, or whatever you want?

The design criteria for Pandoc's Markdown flavor and for CommonMark are different but there is no conflict, just a difference in purposes, which are equally valid in their respective domains

@ickc are just pointing out that there are valid, practical, real-world reasons for Pandoc's design criteria, and goals, to differ from those of CommonMark.

Partially agree. Overall the purposes are the same but with slightly different use cases which inform the details. The Pandoc extensions support more advanced constructs as well as the publishing of standalone documents. Both adhere to the fundamental principles of markdown.pl. Consider that an Audi A8 and a Honda Civic have noticeable differences in design goals, as well as noticeable similarities. Both are motor vehicles and both meet regulatory requirements in the countries where they are sold. Both have four wheels, have an internal combustion engine, and so on. Someone with limited knowledge of cars would still recognize either as belonging to that class. If you're lucky enough to be able to afford an Audi then you might enjoy the leather seats but you won't use it to fly to Mars or to cook breakfast. Both are cars and try to be nothing else.

Note that neither Pandoc nor CommonMark is about some Platonic ideal "pure" Markdown.

I think mostly everyone supports the goal of a balance between a Platonic ideal and a polluted mess.

But if a diverse group of users are formed into groups that each want their own particular change adopted into the language, then isn't adhering to a limited but cogent set of design principles the best way to strike that balance and to prevent an unmanageable number of mutually-inconsistent features from all being adopted for inclusion?

Platonic ideals might be a red herring. I suggest discussing the concrete issues.

Pandoc is about providing solutions to real, practical needs writers have in the real world which markdown.pl didn't cater for

Agree. I definitely hope that Pandoc gives you a practical solution to your problems, but I haven't been convinced that the particular solution you suggest is the only or even best way to make Pandoc more useful for you or everyone else. If you want to discuss specifically your problems, then it would be a good discussion. Mostly the needs I have seen in your comments might be more directly served by other means than making a small caps extension to the MarkDown language.

If he doesn't see a conflict neither should we.

I don't have any information about who sees a conflict or does not on this issue except for the comments appearing here. More importantly, I completely fail to understand your idea. What difference does it make who sees a conflict? Isn't a conflict a conflict, regardless of who sees it, or who doesn't? Isn't an elephant an elephant, regardless of who sees it, or who doesn't?

If you feel that Pandoc's extensions are a pollution and CommonMark is ideal you are free to use --from commonmark and --to commonmark at all times, just as others are free to use such extensions as suit their use case.

I never said anything remotely similar to that. I have commented on a single idea which is currently included in neither CommonMark nor Pandoc.

If you don't like that you are free not to use Pandoc's Markdown flavor,

I like it and use it, which is why I care about its future.

but let those who like these criteria propose what we think are improvements as we damned well please, to be accepted or rejected by @jgm as he sees fit.

You did propose an idea, and others gave their ideas. What am I preventing you from doing? Do you think I have some means to control @jgm, or even to influence him beyond what I write in this system?

He implemented Pandoc's modular extension model, so clearly he favors freedom

True, but consider if an extension were available for everyone who requested one. The language syntax would become overloaded. Some extensions would be mutually incompatible, with output being underdefined whenever too many were enabled. Source documents would produce radically unpredictable results except when invoked with precisely the intended mixture of extensions.

of choice without imposing or denying any extensions to Pandoc users. Everyone uses or doesn't use the extensions they want at their discretion.

Also true, but unless the set of extensions is somewhat limited, and they consistently adhere to a set of principles, the language loses its value as a means of sharing information among one another.

If I have to choose between a "Platonic" "pure" Markdown for the world of rarified ideals and an "Aristotelian" Markdown for real world needs I'll always choose the latter

Are there only two options for the entire language? Isn't it best to evaluate choices case-by-case?

alerque commented 4 years ago

Hear hear to what @ickc and @bpj have taken the time to type out. Many of us want to see this happen because we have pragmatic needs for it in the field. The discussion is derailing what this issue needs to sort out; @brainchild0 can we please move the philosophical discussion of what you want MarkDown to (not) be to a different venue? You are free to not use any syntax you don't see a need for.

The unresolved bit of this issue is whether to go with ^^Small caps^^ or ****Small caps****.

I lean towards the ^^ option (even though it will break my key bindings requiring retraining muscle memory) because I think it will be the cleanest and most understandable to end users, but am sympathetic to the argument for dizzying layers of asterisks.

brainchild0 commented 4 years ago

we have pragmatic needs for it in the field

Understood. I have in common with you a wish to be able to publish documents, generated by Pandoc, that have small caps. It would really be helpful if you explained simply the need, separate from what you have apparently decided is the best or only way to realize that need.

alerque commented 4 years ago

I never said a native syntax is the "only" way to realize a need. What we're all saying is it is better for our use cases. Again, you don't have to use it.

As the production manager for a publishing company using Markdown as our canonical source I train and coordinate a team of writers, editors, and translators. Teaching them Markdown is not always easy, but it is workable — but the farther what they have to type differs from the way they think about their content the harder it gets. Having a terse native syntax will make life easier for me and them because it will be less typing, less errors, more readability, etc. Using spans and classes is verbose and breaks the readability of the underlying prose.

One of the reasons Markdown works in the first place is because the underlying content remains readable in a natural way even with some markup in place. Consider:

This is *my* test!

vs.

This is [my]{.emphasis} test!

Which one of those is easier to type, read, copy edit? When reading and copy-editing, which one of those breaks your eye's flow over the text? Which one adds visual emphasis without distracting you from reading the sentence? Or another one:

Once upon a time, somebody said:

> Hello world!

vs.

Once upon a time, somebody said:

::: {.blockquote}
Hello world!
:::

Either way would get the job done. One is a lot nicer to use.

What I'm saying (and I hear from others here and elsewhere) is that not having native syntax for something that is commonly used in conjunction with other forms of emphasis that do have native syntax is awkward and an unnecessary barrier. The Pandoc AST already has native support and lots of other markup formats have native support. We think having native support in Pandoc's dialect of Markdown via an extension will be a good thing.

You can disable the extension and go on using spans and classes if you want.

brainchild0 commented 4 years ago

I tend to understand your dilemma and frustration, @alerque, reading your explanation, but I am also confused about what you and your company are trying to achieve and how you are trying to achieve it.

One remark that particularly strikes me is the below:

the farther what they have to type differs from the way they think about their content the harder it gets

I have read a large number of blog articles, forum posts, and similar sources over the past months relating to MarkDown as well as Pandoc. Many sources explain the benefits of the paradigm, as perceived by the authors, and are targeted at others within the same field, encouraging them to join the movement away from traditional, cumbersome, consumer editing tools. The reasons for their enthusiasm, in my estimation, tend to echo the core principles of MarkDown, which are that the semantic features of the document are given in a human-readable, tool-agnostic, durable, portable, reusable representation. The semantic distinction is extremely valuable to writers. If writing “My favorite novel is A Tale of Two Cities”, then it is particularly useful that the fact of the book title being demarcated from the surrounding text can be achieved readily, immediately, almost without thought, and completely without the particular manner of that ultimate representation needing to be considered. Someone might at some time consider that the best typesetting for the document is that the emphasized titles appear in small caps, despite the contemporary norm for italics. This decision can be made at the time of any particular publication, without affecting the original source material that might be published differently elsewhere or later, or encumbering the writer at the time of generating the text during ongoing revisions subsequent to the first publication.

For writers generally, a single variety of emphasis is needed. Practically, italics has been the most common style of text that is set apart from Roman body text, because it is available in faces that coordinate with most Roman types, and because, lacking a difference in weight, it is not immediately noticed simply from looking at a page, only seen as distinct from reading through the paragraph that contains it. Text in books, articles, and other formal sources does not generally employ multiple kinds of emphasis. In such formal typesetting, variation in font weight is normally reserved for entire blocks that are set apart from other blocks by whitespace. Examples include titles, headers, words for definitions, and so on. These distinctions would historically be resolved by the typesetter, not the author. Notice that LaTeX includes only a single semantic macro for emphasized text, namely \emph. No \strong macro is normally defined, though nested emphasis is treated specially.

Practically, authors need a single kind of emphasis, but MarkDown provides two, partly owing to the history of desktop computer font families including italics, boldface, and bold-italics varieties, and partly to facilitate the writing of informal snippets of text when full typesetting features are unavailable.

Naturally, special requirements appear in technical writing, which cannot be practically resolved by a dedicated symbol for each possibility. The need arises for native spans, marked by class names. While cumbersome to provide a class name each time a semantic feature occurs, neither word processors nor other tools can feasibly offer a more convenient option. The benefit remains, however, in this case, as in all others, that the author needs only to think of what is being written, not how it might eventually be represented.

Considering the apparent enthusiasm for this semantic paradigm, I am troubled that its defense earns titles here such as Purity Police and becomes dismissed as a philosophical discussion. These comments look to me, whatever their intention, as an attack by a handful of zealots on the principles that have built not only a set of tools, but also a movement among writers who have finally found a tool that gives them the means to express themselves free from distraction.

Returning to the beginning, what puzzles me particularly about your comment, @alerque, is that you suggest that when writers think about their content, they are in fact thinking about their formatting. These two features are distinct, and the value of a semantic language is to enforce that distinction. If document authors for some reason, which I would not understand, prefer to think about their articles in terms of formatting, then it would seem natural that they use a typesetting language, not a semantic one.

I cannot comment on why MarkDown is being used in the case you describe, and I realize that no company can switch to another convention without tremendous costs, but my first impression is that MarkDown is a poor match for your needs. The problem you describe, as I understand it, brings me to my very original comments, that the effect being requested is indeed to make MarkDown, at least some variation of it that might be enabled by some Pandoc extension, to stop being a semantic language and to start being a typesetting language. If you want a language to represent italics, boldface, small caps, and so on, then shouldn’t you select a language that already does represent these typographic features, or use a word processor? What are you trying to do with MarkDown, separate from what MarkDown is intended to do? Or perhaps you can develop some way to help your writers to think semantically, the same as the writers who defend MarkDown, or rather to realize that they are already thinking semantically, since ultimately text formatting is merely an artificial device for communication.

That I alone am taking the time to write these considerations in the issue tracker does not mean that many others, who lack the ability or inclination to write, do not share my sentiments. To you it might appear that a single vigilante is blocking a much needed feature, but to me it looks as though a handful of zealots are trying to alter the very features that have given MarkDown life within a diverse community of supporters.

alerque commented 4 years ago

Stop telling me and the rest of us how to do our jobs or what we do and don't need. Not only the following line but large chunks of what you just wrote are balderdash.

Practically, authors need a single kind of emphasis, but MarkDown provides two, partly owing to the history of desktop computer font families...

No. Desktop computer fonts inherited these forms of emphasis from print, where they were in use long before "fonts" existed. Small caps was also much more heavily used during the era of the printing press, as a form of emphasis.

You may only need one kind of emphasis, but practically speaking many use cases need more. Others have detailed use cases above. You are free to only use one if you want. Stop using bold for all I care, but don't try to take it out of our toolbox.

My use case is never for small caps per se; many of my books have other ways of typesetting emphasis. The issue here is that I need a third kind of emphasis. Lots of us do. Small caps is a common way to format this on the typesetting end of things, but what I need is the input end that semantically marks up a different class of emphasis than * or ** are used for. Sure it could be abused, but there are detailed reasons above explaining why some authors need this.

Many other formats support this. Pandoc supports it internally. All we're saying is that it would make authors' lives easier if their input format supported it natively as well.

I don't want to do this because you've contributed good things to other discussions, but if you don't start leaving this issue alone I'll probably end up blocking you on GitHub. I'm tired of the pedantic attempts that basically suggest the rest of us don't know anything about our respective fields and needs. Take it elsewhere. Mailing lists work better for this kind of discussion. If and when we get an extension in Pandoc for its dialect of MarkDown you are free to not use it --- just like all the other features Pandoc supports that other markdown tools do not. Footnotes anybody? Just because not everybody will use it doesn't mean we can't have nice things. We're not breaking MarkDown for everybody else by adding something that will conflict with their existing usage of it.

jgm commented 4 years ago

I think this discussion is getting unproductive: everyone has said what they have to say, and it's probably best just to stop here. I agree with @alerque that long philosophical discussions are better carried out on the mailing list.

I don't like any of the syntax suggestions for small caps that have been proposed.

@alerque if you need a third form of emphasis that is convenient to write, one option would be to use a filter to overload something that already exists: for example, the Strong [Emph _] combination, or even Emph [Emph _] (written _*hi there*_). Or you could write a filter that made *!This is small caps!* work for small caps.
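The last idea above could be sketched roughly as follows. This is only an illustration, written in Python against pandoc's JSON AST rather than as a Lua filter; the function name `small_caps_from_bang_emph` is hypothetical, and it assumes that `*!This is small caps!*` parses as an `Emph` whose first `Str` starts with `!` and whose last `Str` ends with `!`.

```python
import json


def small_caps_from_bang_emph(node):
    """Rewrite Emph nodes written as *!...!* into SmallCaps (hypothetical helper).

    `node` is any fragment of pandoc's JSON AST: a list, a dict, or a scalar.
    """
    if isinstance(node, list):
        return [small_caps_from_bang_emph(child) for child in node]
    if not isinstance(node, dict):
        return node
    # Rebuild the node with rewritten children so nested inlines are handled.
    node = {key: small_caps_from_bang_emph(value) for key, value in node.items()}
    if node.get("t") != "Emph" or not node.get("c"):
        return node
    inlines = [dict(inline) for inline in node["c"]]  # copies we may modify
    first, last = inlines[0], inlines[-1]
    marked = (
        first.get("t") == "Str" and first["c"].startswith("!")
        and last.get("t") == "Str" and last["c"].endswith("!")
        # A lone "!" cannot serve as both the opening and the closing marker.
        and (first is not last or len(first["c"]) >= 2)
    )
    if not marked:
        return node
    first["c"] = first["c"][1:]   # drop the opening "!"
    last["c"] = last["c"][:-1]    # drop the closing "!"
    # Remove Str elements that became empty after stripping the markers.
    inlines = [i for i in inlines if not (i.get("t") == "Str" and i["c"] == "")]
    return {"t": "SmallCaps", "c": inlines}


# Assumed AST fragment for "*!Small caps!*"
sample = [{"t": "Emph", "c": [{"t": "Str", "c": "!Small"},
                              {"t": "Space"},
                              {"t": "Str", "c": "caps!"}]}]
print(json.dumps(small_caps_from_bang_emph(sample)))
```

To use something like this as a real filter, the function would be applied to the whole document read from `pandoc -t json` and the result passed back through `pandoc -f json`; ordinary `*emphasis*` without the `!` markers is left untouched.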

bpj commented 4 years ago

@alerque

function Strikeout (elem)
  return pandoc.SmallCaps(elem.content)
end

works well if you don't have a need for strikeout, except that if you omit the filter the results (in LaTeX) look idiotic, and that there is a segment of people who already are used to thinking of ~~foo~~ as strikeout, and I also frequently need underlining, i.e. four kinds of emphasis.

Also _*Foo*_ and __**Foo**__ and their inverses *_Foo_* and **__Foo__** although incredibly ugly and confusing give you [Emph [Emph [Str "Foo"]]] and [Strong [Strong [Str "Foo"]]] in the AST which are easy to catch with a filter:

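function Emph (elem)
  -- Match an Emph whose only child is another Emph, e.g. from _*Foo*_
  if #elem.content == 1 and 'Emph' == elem.content[1].tag then
    local content = elem.content[1].content
    -- wrap the content in something; SmallCaps is just one option
    return pandoc.SmallCaps(content)
  end
end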

Unfortunately @brainchild0 has achieved their goal: because of the heat generated by their rants and their attempts to, in the words of @alerque, tell us how to do our jobs and what we need, the chances that we will get a native syntax for a third kind of emphasis have become even worse than before.

@jgm for the record I wish _…_, *…*, __…__ and **…** were four different AST elements, say EmphA, EmphB, StrongA and StrongB. The standard writers should of course continue to treat the A and B alternatives as equal, and an Emph or Strong function in a Lua filter would still handle both, but people could use filter functions with the suffixes in their names to give them different semantics if they need and want to. That would defuse much of the trouble with finding new syntaxes for different kinds of emphasis, insertions and whatnot, without troubling those who are dead against overloading any more (combinations of) characters. They would simply not use filters which treat underscores and asterisks differently and could ignore the possibility.

bpj commented 4 years ago

Also I can and do use […]{.sc}, […]{.sf} etc, but as @alerque said such things break your train of thought and impact readability negatively. I started using Markdown and Pandoc to get away from that in LaTeX and HTML.

brainchild0 commented 4 years ago

For the record, @jgm, if what is being sought, or what could be accepted, is simply a more general way to use existing AST types and MarkDown syntax and principles, then it might not be unreasonable to introduce an extension that will resolve a four-asterisk or four-underscore demarcation as doubly-nested strong-emphasis, and more generally, a demarcation of N consecutive characters, all asterisks or all underscores, as floor(N/2) depth of nested strong emphasis surrounding N mod 2 depth of regular emphasis.

Then ****foo**** is equivalent to **__foo__**, and *****bar***** to **__*bar*__**.

Notice of course that the base cases of N < 4 evaluate to their current static definitions, such that no case is special.
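As a rough illustration of the rule, here is a minimal Python sketch that expands a run of N identical delimiters into the equivalent nested form, alternating asterisks and underscores by nesting level. The function name `expand_delimiter_run` is hypothetical, and the sketch covers only homogeneous runs.

```python
def expand_delimiter_run(n, text):
    """Expand n identical delimiters around `text` into explicit nesting.

    Proposed rule: floor(n / 2) levels of strong emphasis surrounding
    (n mod 2) levels of regular emphasis.
    """
    if n < 1:
        raise ValueError("a delimiter run needs at least one character")
    strong_depth, emph_depth = divmod(n, 2)
    # Delimiter width per level, outermost first: 2 = strong, 1 = regular.
    widths = [2] * strong_depth + [1] * emph_depth
    result = text
    # Build from the innermost level outward, alternating * and _ by level.
    for level in reversed(range(len(widths))):
        char = "*" if level % 2 == 0 else "_"
        delimiter = char * widths[level]
        result = delimiter + result + delimiter
    return result


print(expand_delimiter_run(4, "foo"))  # **__foo__**
print(expand_delimiter_run(5, "bar"))  # **__*bar*__**
```

For N of 1 and 2 this returns `*x*` and `**x**` unchanged, matching the remark that the small base cases need no special treatment.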

Mixed sequences of asterisks and underscores would further be evaluated by the same rules, separately taking each nested demarcation of the same character. Thus *****__baz__***** is equivalent to **__*__baz__*__**, because the outside five-asterisk sequence resolves to a sequence of two-two-one, the same as the similar sequence, above, that surrounds bar.

Naturally not everyone will like this solution, including myself, and probably no one will greatly like it. Yet, presently sequences of four or more asterisks or underscores are taken verbatim, and it would seem unlikely that unescaped asterisks or underscores, already defined as emphasis demarcation in consecutive sequences of fewer than four, would reasonably be given a different meaning when appearing in longer sequences.

The use of alternating sequences of asterisks and underscores, already supported, is disliked by @bpj; use of extended sequences, currently proposed, is disliked by @alerque; and use of a different character, though more compact and legible, is disliked by @jgm, and unlikely to be popular with a broader base.

There is no answer that satisfies everyone’s preferences, but ultimately placing value in common principles instead of individual preferences is needed to facilitate progress.

The proposed feature merely generalizes the use of existing types and processes without challenging established principles. To the inevitable dismay of some, it lacks support for arbitrary cascades of regular and strong emphasis, and neither adds new nodes to the AST, nor adopts new uses of existing nodes. It also excludes a myriad of other possibilities that the human mind can imagine. It is a modest idea with modest benefits, while keeping concepts true to their origination.

jgm commented 4 years ago

commonmark already defines the behavior of long sequences of asterisks, and I would like to move closer to, not farther from, commonmark behavior. Indeed, at some point I plan to cut over to a commonmark + pandoc extensions parser I've been developing (leaving the current markdown parser available for legacy purposes).

brainchild0 commented 4 years ago

I see, good, so CommonMark is currently specified the way I described, except that any extra regular emphasis span sits on the outside, which differs from the Pandoc default parser in the base case of three characters.

I missed this capability when I looked at the CommonMark specification. I might put a suggestion in that repository. Currently the document uses the term "multiple parings" in exactly one place, but never gives a definition. I'm not skilled at reading specifications, but I sense that one of the targets for this document is meant to be casual users who want a sense for the syntax and features but aren't necessarily looking to assimilate the exhaustive set of rules.

cysouw commented 3 years ago

Ignoring this interesting discussion, and coming back to the OP: I also regularly use small caps and find the markup in Pandoc Markdown rather cumbersome. However, my solution is to use the strikethrough markup (I never use strikethrough) and then use the following Lua script to change strikethrough to small caps. (Note that it also removes capitals, because I don't like capitals in small caps.) This of course doesn't settle this discussion, but it might solve the OP's problem :-).

-- use strikeout markup ~~ for smallcaps
-- Copyright © 2021 Michael Cysouw <cysouw@mac.com>

function make_lower (s)
  return pandoc.Str(pandoc.text.lower(s.text))
end

function turnToSmallCaps (elem)
  elem = pandoc.walk_inline(elem, { Str = make_lower } )
  return pandoc.SmallCaps(elem.content)
end

return {{
  Strikeout = turnToSmallCaps
}}

ickc commented 3 years ago

Thanks, but it does not solve my problem. There are always solutions that already work; the point here (and in the pandoc-discuss thread quoted) is more about aesthetics (pursuing markdown-ish syntax for all pandoc AST elements, so to speak), and perhaps semantics (if you hack through another AST element then you permanently change the meaning of the original document as is, but markdown's philosophy is to be readable as is).

And the real issue at hand is to find a truly markdown-ish syntax that works. I'm surprised to get this message and see that the issue has only been open for 2 years. Given enough time it will probably happen. (See how long it took for native divs and spans to get markdown-ish markup.) Unfortunately the blocker here is that we're running out of ideas, and none of the proposed ones satisfies @jgm's high standard (which is good).

P.S. for filter approach I'd prefer mapping smallcap to small-scrap because it is semantically correct (in the sense that it conveys a higher level of emphasis, which is by the way most of the argument I saw because people think that is a font choice, not an emphasis. But really, what is the meaning of choosing that font?)

aarek-eng commented 3 years ago

I just want to be able to quote Death of Discworld. You may think this is silly, but there's a lot of good stuff to quote from that character, and I often find myself frustrated that I can't do so in the "proper" way.

On TVTropes, they use [[AC:Text]] to generate small caps. It's not very native, but it does get the job done. They also have other features - is that just the PMWiki markup language, or is there more to it?