Closed bugabinga closed 5 months ago
Hi and thanks for your interest in mdslw
!
The markdown parser used in the back is pulldown-cmark
, which parses CommonMark with some extras. Yes, I would say that mdslw
aims to support CommonMark at the least. However, I am very happy to support other extensions out there as long as they a) don't mess with the default CommonMark support, and b) are not too horrible to implement. Unfortunately, it does not look like pulldown-cmark
supports definition lists.
Looking at definition lists, I'm wondering how best to implement detecting them. Basically, for every new block of text, mdslw
would have to check whether all lines apart from the first start with a colon while the first one has to be just text. As soon as one other line does not start with a colon, the block cannot be a definition list, right? Do you know of a simple reference implementation that could be helpful?
Furthermore, what should mdslw
do in case some lines in the definition lists are longer than the desired maximum line width? It should probably keep them as they are because, once wrapped, they would no longer be definition lists, right?
In any case, I would consider this an optional feature that could be enabled. What do you think? Any feedback is appreciated :smiley:.
Looking at definition lists, I'm wondering how best to implement detecting them. Basically, for every new block of text, mdslw would have to check whether all lines apart from the first start with a colon while the first one has to be just text. As soon as one other line does not start with a colon, the block cannot be a definition list, right? Do you know of a simple reference implementation that could be helpful?
This sounds like it would get messy really fast, especially when other extensions come along. The parser I am using in this case seems to be goldmark via the glow command line tool. https://github.com/yuin/goldmark/blob/master/extension/definition_list.go Definition Lists seem to be one of the extensions enabled by default there.
Furthermore, what should mdslw do in case some lines in the definition lists are longer than the desired maximum line width? It should probably keep them as they are because, once wrapped, they would no longer be definition lists, right?
This is an excellent question! As far as I can tell, the idea is that, similar as with bullet lists, indentation via spaces marks a paragraph. https://michelf.ca/projects/php-markdown/extra/#def-list
While thinking about this some more, I noticed that my issue arose from the fact, that mdslw
not only splits sentences, but also collapses whitespace.
In my mind, it is predominantly the line splitting which makes the document "easier to diff".
What if there was a command line option to prevent joining of newlines?
E.g. mdslw --join never
or mdslw --no-join
.
This sounds like it would get messy really fast, especially when other extensions come along.
Agreed. The number of additional extensions that mdslw
can support in the long term is small, mostly due to time constraints.
The parser I am using in this case seems to be goldmark via the glow command line tool. https://github.com/yuin/goldmark/blob/master/extension/definition_list.go Definition Lists seem to be one of the extensions enabled by default there. [...]
This is an excellent question! As far as I can tell, the idea is that, similar as with bullet lists, indentation via spaces marks a paragraph. https://michelf.ca/projects/php-markdown/extra/#def-list
Thanks for those links, they were quite insightful. It seems like definition lists are a bit more free-form than I would have thought initially, meaning that my initial suggestion of parsing them will not work. I do have another idea that would make it very easy to support other extensions that I know, too, but I'm not sure when I'll find the time to test that idea.
While thinking about this some more, I noticed that my issue arose from the fact, that
mdslw
not only splits sentences, but also collapses whitespace. In my mind, it is predominantly the line splitting which makes the document "easier to diff".What if there was a command line option to prevent joining of newlines? E.g.
mdslw --join never
ormdslw --no-join
.
That feature is built-in at quite a deep level. I can have a look whether deactivating it is possible, but I'm not sure how hard it would be to implement. I'm also reasonably convinced that the resulting documents would look sub-optimal, in my personal view, which would make it an optional feature. For example this:
Lorem ipsum dolor sit amet. Consectetur adipiscing elit. Sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam.
would become this:
Lorem ipsum dolor sit amet.
Consectetur adipiscing elit.
Sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam.
Is that what you mean? I'm not sure how it would help with definition lists. For example this:
Lorem ipsum
: Dolor sit amet. Consectetur.
: Adipiscing elit. Sed do eiusmod tempor
incididunt.
Ut labore et dolore
: magna aliqua. Ut enim ad minim veniam.
would become this:
Lorem ipsum
: Dolor sit amet.
Consectetur.
: Adipiscing elit.
Sed do eiusmod tempor
incididunt.
Ut labore et dolore
: magna aliqua.
Ut enim ad minim veniam.
I'm not sure whether that's still a valid definition list. What do you think? And thanks for the discussion! I'm always curious to learn about useful markdown extensions.
Hi @bugabinga. I've just released version 0.8.0
that contains the new retain-whitespace
feature that implements what I understood from our discussion above. With it enabled, mdslw
will no longer collapse any whitespace, including line breaks. What do you think?
Personally, I find this new feature a big improvement, because I can use it as an "escape hatch" of sorts. Thanks a lot!
In principle, I would much prefer a formatter in the style of gofmt
or rustfmt
(0 or minimal configuration options), but I do not think that is feasible with regard to all those extensions to markdown out there.
I'm also reasonably convinced that the resulting documents would look sub-optimal, in my personal view, which would make it an optional feature
I agree.
I view this feature as a compromise.
Even with this feature, your previous critique holds true.
Definitions, that require line breaks (e.g. because of line length or end markers), technically still get broken, even with the new feature enabled.
The "correct" formatting, I believe, would be:
With retain-whitespace
:
Lorem ipsum
: Dolor sit amet.
Consectetur.
: Adipiscing elit.
Sed do eiusmod tempor
incididunt.
Ut labore et dolore
: magna aliqua.
Ut enim ad minim veniam.
Without retain-whitespace
:
Lorem ipsum
: Dolor sit amet.
Consectetur.
: Adipiscing elit.
Sed do eiusmod tempor incididunt.
Ut labore et dolore
: magna aliqua.
Ut enim ad minim veniam.
Multi-line definitions would get space-indented, just like multi-line bullet list items do.
However, in my case, retain-whitespace
works well enough, that I can work around my use cases.
On a different note, the name retain-whitespace
seems too general.
From what I observe, only newline
ever gets retained?
Personally, I find this new feature a big improvement, because I can use it as an "escape hatch" of sorts.
Thanks a lot!
You are welcome! It's nice to know that this idea of mine (mdslw
) is useful for others, too.
In principle, I would much prefer a formatter in the style of
gofmt
orrustfmt
(0 or minimal configuration options), but I do not think that is feasible with regard to all those extensions to markdown out there.
I fully agree! The default values are currently chosen in such a way that the amount of configuration needed is minimal, or at least that's the intention. My hope is that mdslw
can be used by as many people as possible without having to configure anything.
I agree.
I view this feature as a compromise.
Even with this feature, your previous critique holds true.
Definitions, that require line breaks (e.g. because of line length or end markers), technically still get broken, even with the new feature enabled. […] Multi-line definitions would get space-indented, just like multi-line bullet list items do.
The way I understood the docs for definition lists was that indentation was optional (but definitely desirable).
On a different note, the name
retain-whitespace
seems too general.From what I observe, only
newline
ever gets retained?
Enabling the feature should cause all existing white space to be preserved, at least that was the intention. I haven’t tested it extensively, TBH, but enabling that feature circumvents the function that merges all consecutive white space within a paragraph.
Enabling the feature should cause all existing white space to be preserved, at least that was the intention.
Ah, that is curious. Upon further testing I noticed, that in general you are right. I just happened to stumble upon an exceptional case, where it still does collapse whitespace.
With enabled feature, the following paragraph
Lorem ipsum
: Dolor sit amet.
Consectetur.
: Adipiscing elit.
Sed do eiusmod tempor
incididunt.
Ut labore et dolore
: magna aliqua.
Ut enim ad minim veniam.
becomes
Lorem ipsum
: Dolor sit amet.
Consectetur.
: Adipiscing elit.
Sed do eiusmod tempor
incididunt.
Ut labore et dolore
: magna aliqua.
Ut enim ad minim veniam.
Still, just for the record; my personal preference would be to retain newlines, but still collapse other whitespace.
Currently, definition lists get mangled by
mdslw
. I realize, that there are a lot of markdown extensions out there, but I could not find an explicit statement about scope in the README.md (CommonMark?).Is this a bug or a feature :) ?