slab / quill

Quill is a modern WYSIWYG editor built for compatibility and extensibility
https://quilljs.com
BSD 3-Clause "New" or "Revised" License

Multiple output engines (including Markdown) #74

Closed ollym closed 8 years ago

ollym commented 10 years ago

It would be amazing if the output generated could be in formats other than just HTML. Like Markdown and others.

leeoniya commented 10 years ago

should be very easy to add bidirectional HTML <> markdown support similar to the demo i did for redactor-js here: http://leeoniya.github.io/redactor-js/

using https://github.com/leeoniya/reMarked.js and https://github.com/chjj/marked

jbenet commented 10 years ago

Recommend to use pandoc + pandoc IR.

jhchen commented 10 years ago

I think this would be a good use case for an optional module. If anyone is interested in building this I'm happy to help from the Quill side. Sounds like @leeoniya and @jbenet might already have some implementation ideas but quill.getContents could also be helpful to avoid parsing HTML. It will return an object like:

{
  startLength: 0,
  endLength: 13,
  ops: [
    { text: 'Hello' },
    { text: 'Quill', attributes: { bold: true } }
  ]
}

rhythmus commented 10 years ago

+1

chase commented 10 years ago

:+1:

ollym commented 10 years ago

@jhchen I've done a bit of research into this, and there are some quite serious incompatibilities that need to be addressed if this will ever be possible.

Firstly, there are some things markdown just doesn't support, like font colours, background colours, different fonts (and styles) etc.. It should be up to the implementor to remove these from the toolbar.

Secondly, (and most importantly) quill doesn't support headers (<h#>), instead it "imitates" headers by changing the font-size and allowing the user to add 2 line breaks. This (i guess) is to allow you to have larger font-sizes inline (which isn't supported by markdown either).

In fact I don't really understand why you did it this way round, and chose not to support headers natively?

jbenet commented 10 years ago

@ollym

Firstly, there are some things markdown just doesn't support, like font colours, background colours, different fonts (and styles) etc.. It should be up to the implementor to remove these from the toolbar.

The proper markdown spec allows html directly in the source document. Editing markdown should work with src and WYSIWYG tabs, so the html would only show up in the src. As most markdown parsers do, html in markdown should be an option that devs can toggle off.

Secondly, (and most importantly) quill doesn't support headers (<h#>), instead it "imitates" headers by changing the font-size and allowing the user to add 2 line breaks. This (i guess) is to allow you to have larger font-sizes inline (which isn't supported by markdown either).

Agree, headers should be proper <h#>. You can apply font sizes to headers just like any other html element (style attr, css class, etc).

ollym commented 10 years ago

@jbenet embedding HTML should not be considered as an option.

leeoniya commented 10 years ago

in the interest of avoiding another rehashing of prior markdown/wysiwyg arguments and implementation chatter, you guys may find the following discussion useful: https://github.com/dybskiy/redactor-js/issues/2

jhchen commented 10 years ago

Yes headers are not currently supported while font sizes are. There's not a great reason why one is implemented and not the other but I think they can mutually coexist.

There are a few other formats in Markdown currently not in Quill. I think the list is headers, blockquotes, lists/bullets, code blocks / inline code, and horizontal rules.

As brought up in the discussion @leeoniya linked, the use case is important and will dictate what needs to be built.

rhythmus commented 10 years ago

@leeoniya thanks for that fine and enthusing read! I'll pick up from there, but prefer to do it here since WYSIWYG Markdown support in Redactor.js proved a vain desire, while Quill.js still has the benefit of being the new kid on the block.

The goal (at least to me) should not be to add .md export on top of a WYSIWYG “rich text” html editor (with bidirectional, lossless support for ad hoc inline styles, attributes, and all sorts of html cruft), but, as @leeoniya, @tehnorm, @lolmaus, @tnypxl and @paulcarroll suggested (in the thread referenced above), to forget about html altogether, and create a true dedicated WYSIWYG Markdown editor. Markdown should not be an afterthought: it must be the primary input syntax and persistent file format, whereas html should only be one out of many possible output formats (one could have Pandoc running in the backend), albeit a very important one. If we would go for that, we could easily drop support for one-on-one bidirectional html↔md syncing, and perchance have peace of mind.

By WYSIWYG I do not mean “what you see is what you get” in the sense that the preview exactly matches the printed output; in the days of ubiquitous responsive design for countless display sizes, that’s become an obsolete obsession. WYSIWYG, instead, is a proxy for intended style, and really means that structure (or “semantics”, if you will) is separated from presentation, not by renouncing presentation altogether, but by making it flexible and replaceable.

A wysiwyg Markdown editor would allow the user to select and swap stylesheets (pure external css, that is) at will, and still be able to directly edit the source text in the theme of his choosing. One such theme could be the usual single-font, single-font-size, syntax-colored thing we got used to in “visual” Markdown editors (and every other code editor for that matter), showing the actual Markdown syntax markers nicely color-coded. (But that experience needn’t necessarily rely on a full-blown parsing expression grammar implementation — cfr infra.) More finished-output alike, typographically pleasing stylesheets for editable text is what we should be really after, though.

Regardless of the selected theme/stylesheet, users should have the option to toggle the visibility of the actual syntax on and off, not unlike “non printing characters” can be shown or hidden in most word processors and text editors — because that’s what the syntax markers in fact are.

Exemplifying such interfaces, to me, are the iA Writer app (native OSX, iOS), and the late Editorially web based markdown editor: the markup’s syntax markers are dimmed away, paragraph (block-level) marks (#, -, 1., etc.) hung into the margins. However, unlike these examples, the real challenge is to not only do some nice styling with the markup itself (as in common syntax highlighting, and as the manifold implementations do that plug on CodeMirror), but to have styles applied, live, as the user types, onto the marked-up text.

Applying style would be done both by typing valid markup, or by selecting a string and clicking a button. If “show markup” is toggled off, markup typed by the user would instantly disappear as soon as there’s a match (and the styling is applied), and a button bar would be visible on top of the interface by default. If “show markup” is toggled on, markup would be always visible (albeit dimmed into the background, and/or hung into the margin), and the button bar would only show when a string gets selected, above (or, on mobile, below) the text selection.

As mentioned by others, dealing with character escapes is one of the implementation hassles, but not insurmountable. With markup toggled on, reserved characters would be always interpreted as markup, and the user would have to do the escaping explicitly (\*escaped\*). In pure WYSIWYG mode (markup toggled off), on the other hand, reserved characters would be escaped by default. That, or we could still go ahead with live tokenizing, and use the backspace as a trigger to undo the interpretation and yet apply the backslash escapes, very much like our trusted word processors implement similar features like hotstrings, text expansion, and auto-replacement.
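
The "escaped by default" behaviour for pure WYSIWYG mode could look roughly like the sketch below. The set of reserved characters and the function names `escapeMarkdown` and `unescapeMarkdown` are assumptions made for illustration; real Markdown flavors disagree on which characters need escaping, and this version deliberately over-escapes.

```typescript
// Characters treated as Markdown syntax markers in this sketch.
// This is an assumed, non-exhaustive set; adjust it per Markdown flavor.
const RESERVED = /[\\`*_{}\[\]()#+\-.!>]/g;

// WYSIWYG mode: prefix every reserved character with a backslash so that
// whatever the user typed is stored as literal text, not as markup.
function escapeMarkdown(text: string): string {
  return text.replace(RESERVED, "\\$&");
}

// Inverse operation: drop a backslash that precedes a reserved character,
// e.g. when loading stored Markdown back into the WYSIWYG view.
function unescapeMarkdown(text: string): string {
  return text.replace(/\\([\\`*_{}\[\]()#+\-.!>])/g, "$1");
}
```

With these definitions, `escapeMarkdown("*escaped*")` yields `\*escaped\*`, and `unescapeMarkdown` reverses it.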

Another issue is syntax validation. Sure enough, Markdown is always valid. But we do not want repeated clicks on the UI buttons to yield “invalid” or at least meaningless nested syntax (like: ![![](http://)](http://)). Instead, we want the inserted syntax to be well-formed. I’d say we need sanitization, rather than validation. That, too, could be done, preferably live as well.

From a UX/UI viewpoint, I guess combined display of markup and styled elements (like embedded images, tables and math) will be the hardest part. But I will be happily contributing to help tackling those.

There are very few projects striving for a true WYSIWYG Markdown editor, in the sense described above. Yet, as this, and the other thread (June 2012), mentioned above, and a related discussion at StackOverflow (Oct. 2012), prove, there clearly is demand. (Personally, I want to see this happen, badly!)

The nearest things to a working prototype for what I have in mind, are

The first plugs on CodeMirror, while the latter two keep a lean DOM under the hood.

@paulcarroll Regarding the pains you went through in keeping a rigid DOM structure to control the conversion to and fro Markdown and “how bad the various browser implementations are surrounding text manipulation”: Did you consider Shadow DOM? It’s an obvious use case… Admittedly, browser support is poor, but there’s polyfills. With an obfuscated internal subtree handling the lexed output of the Markdown parser, we may keep the style hooks out of the main document DOM tree.

As a post scriptum: Maybe we shouldn’t even stop at Markdown. After all, the present thread is titled “Multiple output engines (including Markdown)”. Sure enough, we want to do things right, from the start, and envision clean and versatile code abstraction.

Then we could foresee that — while the Markdown ecosystem is increasingly suffering from fragmentation with many divergent implementations — we might want to support arbitrary lightweight markup syntaxes in the future, too. One could name a few veterans (AsciiDoc, reST, MediaWiki), but, to make my point, I refer to some brand new takes on light-syntax: Fountain, Skriv. We may reckon with use cases that indeed require multiple output formats, like having the ability to go from Wiki to Markdown, from Textile to reST, and back to Markdown, and eventually to html.

That would imply any implementation must do some thorough code abstraction, would not rely on hard-coded regexes, would not bind the buttons of the UI to hard-coded find-and-wrap-within-delimiters operations. We’d need the parser to be fed with a symbol table, a dictionary mapping the tokens of the internal parse tree with that of the intended output format, amongst which, in particular, the necessary html tags, classes and other css hooks needed for the WYSIWYG interface, which may be very different from the html output format. In short, I do not think an intermediate representation of the parse tree (jsonml, jison, whatever) would be “ridiculous overkill” (@leeoniya): it would be key to realize a versatile, arbitrary format, output mechanism. (In passing, have a look at Substance.io’s Document Model in json, and their web-based text editor core.)

will-hart commented 10 years ago

@rhythmus thanks for mentioning demarcate. I can certainly see a case for a decent light weight WYSIWYI (What You See Is What You Intend?) editor that can output multiple formats. This was the original vision for demarcate as at the time I was running a few websites using different static site generators and the side by side editing style most Markdown editors have didn't do it for me. Definitely +1 to this issue!

I had been working on a "2.0 branch" for demarcate which removed jQuery dependencies, allowed modular parsers (i.e. pluggable reST or Markdown output builders), used contenteditable like pen and Hallo above, etc. I haven't really pursued development in a while as all of a sudden a bunch of more polished editors cropped up (like Quill!) with teams that seemed to have the know-how and enthusiasm to do a better job :)

My solution to the output formats wasn't particularly sophisticated. I just ran over the DOM recursively and used a dictionary to map HTML tags to the right output syntax. Doubtless somebody with some more ninja coding skills could do a much better job using a few simple regexes!
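
A minimal sketch of that recursive-walk-plus-dictionary idea follows. It is not demarcate's actual code: the `HtmlNode` type is a hypothetical stand-in for real DOM nodes, and the two tag dictionaries cover only a handful of tags.

```typescript
// Minimal stand-in for a DOM node (hypothetical; a real editor would walk
// actual DOM elements instead).
type HtmlNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; children: HtmlNode[] };

// One dictionary per output format: tag name -> wrapper around the
// already-rendered children of that element.
type TagMap = Record<string, (inner: string) => string>;

const markdownTags: TagMap = {
  h1: (s) => `# ${s}\n\n`,
  h2: (s) => `## ${s}\n\n`,
  p: (s) => `${s}\n\n`,
  strong: (s) => `**${s}**`,
  em: (s) => `*${s}*`,
};

const restTags: TagMap = {
  // reST marks headings by underlining the title text.
  h1: (s) => `${s}\n${"=".repeat(s.length)}\n\n`,
  h2: (s) => `${s}\n${"-".repeat(s.length)}\n\n`,
  p: (s) => `${s}\n\n`,
  strong: (s) => `**${s}**`,
  em: (s) => `*${s}*`,
};

// Depth-first walk: render the children first, then wrap the result
// according to the dictionary of the chosen output format.
function render(node: HtmlNode, tags: TagMap): string {
  if (node.kind === "text") return node.value;
  const inner = node.children.map((child) => render(child, tags)).join("");
  const wrap = tags[node.tag];
  // Tags missing from the dictionary (div, span, ...) are transparent:
  // only their content is kept.
  return wrap ? wrap(inner) : inner;
}
```

Swapping `markdownTags` for `restTags` changes the output syntax without touching the walker, which is the pluggable-builder property described above.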

ollym commented 10 years ago

This topic has got way too long already. Back on topic...

I think Quill has enough of a base framework to make this achievable within the project without the threat of starting again.

All that needs to happen is:

  1. Add headers, blockquotes, lists/bullets, code blocks / inline code, and horizontal rules into the core.
  2. Add an easy way to configure the toolbar module to only show those controls.
  3. Create a Markdown module that provides a toMarkdown method to export the contents to a markdown format.

I haven't done much looking, but I suspect the next problem will come from how deltas work. Ideally you'd use getContents rather than parsing the HTML directly. But I believe the deltas only reference text and have no context (as in, whether that text is in a blockquote, or header etc.). @jhchen am I right?

lazywei commented 10 years ago

+1 for markdown. +10 for MathJax!

jhchen commented 10 years ago

It's already possible to control what controls are included in the toolbar. The implementer actually passes in a container and the toolbar module just looks for certain classes to attach events to. Toolbar Module Docs

A subtlety in Quill's Delta representation is that there's always a newline character (except the empty document) and context (or what we refer to as line level formats internally) is represented on this character. For example (if we had headers implemented):

[
  { value: "Text" },
  { value: "\n", attributes: { header: 2 } }
]

would be:

<h2>Text</h2>
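
To illustrate how the line-level context can be recovered from that representation, here is a small sketch that groups ops into lines and attaches the attributes found on each terminating newline. The op shape (`value`, `attributes`) is copied from the example above; `Line` and `splitIntoLines` are hypothetical names, and the sketch assumes attributes on an op containing "\n" are line-level formats.

```typescript
// Op shape taken from the example above.
interface Op {
  value: string;
  attributes?: Record<string, unknown>;
}

interface Line {
  text: string;
  // Line-level formats read from the "\n" op that ends the line.
  formats: Record<string, unknown>;
}

function splitIntoLines(ops: Op[]): Line[] {
  const lines: Line[] = [];
  let text = "";
  for (const op of ops) {
    const parts = op.value.split("\n");
    parts.forEach((part, i) => {
      if (i > 0) {
        // A newline was crossed: it closes the line collected so far and
        // carries that line's context (header, list, ...).
        lines.push({ text, formats: op.attributes ?? {} });
        text = "";
      }
      text += part;
    });
  }
  // Text after the last newline, if any, forms an unterminated line.
  if (text !== "") lines.push({ text, formats: {} });
  return lines;
}
```

Applied to the two ops above, this returns a single line with text `Text` and formats `{ header: 2 }`, which is the information a header-aware exporter needs.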

cben commented 10 years ago

FWIW, http://mathdown.net [https://github.com/cben/mathdown] is another example of configuring CodeMirror to "format" markdown as-you-edit via syntax highlight. Plus in-place MathJax rendering via CodeMirror's support for embedded widgets [https://github.com/cben/CodeMirror-MathJax]. Works but not very robust yet. A similar technique could be used to render images. CodeMirror works well with variable fonts; the only big limitation I know is mediocre support for iOS/Android.

Thanks, people, for all the links, some very interesting.

markbao commented 10 years ago

+1. One thing to note is that almost all of the HTML → Markdown formatting scripts work badly with WYSIWYG engines. The Redactor example above, for example, starts inserting <div>s and <span class="line-height: 1.4em"> when you add headers and lists, and generally do not produce true-to-content (and also ugly) markup.

As a result, the best way to go about this, I think, is to find a way that doesn't go from HTML to Markdown as an afterthought, but builds Markdown alongside HTML.

lolmaus commented 10 years ago

@markbao, but ain't it technically possible to do pure and semantic HTML with a WYSIWYG editor?

leeoniya commented 10 years ago

@markbao

One thing to note is that almost all of the HTML → Markdown formatting scripts work badly with WYSIWYG engines

the main issue is that WYSIWYG editors don't worry too much about producing clean html that's restricted to the markdown subset. they only worry about getting the right look once rendered in the browser. an extra <br> here, an empty <p> there and span /w css vs <strong> doesnt matter. in many cases the end justifies the means.

The Redactor example above, for example, starts inserting <div>s and <span class="line-height: 1.4em"> when you add headers and lists, and generally do not produce true-to-content (and also ugly) markup.

if you're talking about http://leeoniya.github.io/redactor-js/, i'm not seeing this behavior. can you provide steps to reproduce, please?

As a result, the best way to go about this, I think, is to find a way that doesn't go from HTML to Markdown as an afterthought, but builds Markdown alongside HTML.

indeed, it is. but you still have to edit in either a markdown or WYSIWYG html editor. so that editor (whichever it is) needs to maintain an intermediate representation and STILL restrict the WYSIWYG portion to the markdown subset.

markbao commented 10 years ago

@lolmaus Theoretically, yes. In practice, the commercial version of Redactor (not the open-source one above) is the only one I've seen that has been able to do this, so it's possible but difficult. Neither Summernote nor Quill have been able to get this right on the money—but if Quill did, I'd be ecstatic.

So to revise what I said: given that Quill will not be totally perfectly semantic, simultaneous Markdown is probably best. If Quill can be totally perfectly semantic, then HTML → Markdown might be possible! (But obstacles abound.)


@leeoniya Yep for sure. And to reproduce:

  1. Go to last paragraph and hit Enter
  2. Enter "Test" in the new paragraph
  3. Select "Test" and format it as Header 2
  4. Hit Enter (notice <div></div> is created in the Markdown output)
  5. Type some text
  6. Hit the Bullet List option; notice the Markdown output looks like this:
<div>
    * <span style="line-height: 1.45em;">lost the game</span>  
</div>

You can then type anything into the Markdown output to engage Markdown → HTML conversion, and see that the bullet is lost due to misformatted syntax, which is the broken result you would see if you convert the Markdown back to HTML.

Chrome 34 OS X.

leeoniya commented 10 years ago

ah ok, i'm seeing this in Chrome, but not FF. i could tell you to just tweak reMarked's options to swallow the div and span tags, but that doesnt do anything to fix the fact that redactor puts that stuff into the html in the first place.

if quill already maintains an intermediate representation internally, then it would be easy to provide a renderer of that AST in markdown alongside the html. but the amount of code would be pretty close to what reMarked already does since there isnt an html parser built into it other than what the DOM already provides.

for unidirectional markdown output or bidirectional editing, the first step in every case is to have a WYSIWYG html editor that produces clean, markdown-restricted output either as html or as some intermediate AST. poor html output comes from a poor internal AST, so the two are one and the same.

cben commented 10 years ago

Another entrant: https://stackedit-beta.herokuapp.com/ (source "nope, not released yet" but I presume will be) is now based on contenteditable and does a good job styling markdown in-place.

venil7 commented 10 years ago

+1 for Markdown input/output module

jhchen commented 10 years ago

I'm going to split this into a few issues:

  1. Supporting headers: https://github.com/quilljs/quill/issues/111
  2. Supporting lists/bullets: https://github.com/quilljs/quill/issues/82
  3. Adding text context listener (necessary for the richer markdown editor @rhythmus and others have suggested) https://github.com/quilljs/quill/issues/112

This current issue, for tracking purposes, will be resolved once a toMarkdown function that outputs the current contents of Quill in markdown is built, as this is what I interpret the OP to be requesting.

ollym commented 10 years ago

@jhchen blockquote, code, tables, indents ?

venil7 commented 10 years ago

@jhchen any approximate ETA on toMarkdown()?

jhchen commented 10 years ago

@ollym here you go: https://github.com/quilljs/quill/issues/115, https://github.com/quilljs/quill/issues/116, https://github.com/quilljs/quill/issues/117, https://github.com/quilljs/quill/issues/118

@venil7 I don't have an ETA at the moment. I'll try to use Github Milestones soon for better transparency but it's a self-contained task so it'd be suitable for someone from the community to work on it too.

rhythmus commented 10 years ago

@markbao @lolmaus @leeoniya @jhchen Last week’s post on the Medium Engineering blog is recommended reading. Nick Santos (@nicks) discusses the difficulties with contentEditable, and how his team overcame them while building the Medium WYSIWYG html editor, which (although without Markdown support) is, as regards UX, the finest thing out there indeed.

Key takeaways:

Santos mentions the Chromium team is thinking along the same lines, but with plans for an even better (because cleaner, and built-in) implementation using Polymer Elements and the shadow DOM.

@will-hart I like the WYSIWYI (What You See Is What You Intend) acronym. What about “wwsiwww” (what we see is what we want)? — Reminds me of the wwWwwWwwwwWw from the 1830 Belgian Revolution. Or the www, of course. — I take it you don’t plan any further development on demarcate.js?

@cben Great work you did on Mathdown! I added the project to my repository of Markdown editors. Somewhat off-topic here (since you build on CodeMirror, which generates very ugly DOM, which is far from what we want in a true md↔html wysiwyg implementation), but the visible results (i.e. interface/UX, styling markdown in-place) are heading — imho — towards a prototype proof-of-concept Markdown wysiwyg editor. I forked your code base, and did a make-over of the interface, just to get my hands dirty on CodeMirror, and to see how far we can get towards our goal, yet. I, for one, like editing Einstein in a web browser… If you like it too, and would want me to merge my contributions back into your repo, do let me know how you would want me to file a pull request.

ourmaninamsterdam commented 9 years ago

We are also in search of the holy bidirectional html <> Markdown converter and have not yet been able to find a complete solution. In our project we are storing all of our content as Markdown and the user will be able to edit the Markdown directly or use a WYSIWYG editor to edit the HTML. This is then converted back to Markdown and saved to the DB.

What we did was:

  1. Find an HTML Editor that produces squeaky clean, terse HTML. The one that came out on top was the very impressive Scribe from the Guardian. Finding this was the hardest part.
  2. Find a good Markdown > HTML converter: marked
  3. Find a good HTML > Markdown converter: toMarkdown

We then plugged it all together. We now have some issues where we want to extend Markdown with our own syntax, but that's a different story.

lolmaus commented 9 years ago

@ourmaninamsterdam, thank you for your report! Very helpful.

Did you check out http://leeoniya.github.io/reMarked.js/ ?

akshatpradhan commented 9 years ago

I skimmed the comments but I wasn't sure about the consensus. Will Quill.js's toolbar support Markdown syntax?

jhchen commented 9 years ago

This functionality should be a module but I'm not sure there is consensus on how it should be implemented. Also there is not a 1:1 mapping between the formats Quill and Markdown supports so that will have to be addressed somehow. Some of it will be resolved as Quill supports more formats and at some point it might become a superset.
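
One way to handle the missing 1:1 mapping is to strip formats that Markdown cannot express before exporting, and to report what was lost. The sketch below assumes ops shaped like Quill 1.x Deltas (`insert` plus optional `attributes`); the `MARKDOWN_FORMATS` list and the `restrictToMarkdown` name are illustrative choices, not part of Quill.

```typescript
interface FormattedOp {
  insert: string;
  attributes?: Record<string, unknown>;
}

// Formats this sketch assumes Markdown can represent (illustrative list).
const MARKDOWN_FORMATS = [
  "bold", "italic", "link", "code",
  "header", "list", "blockquote", "code-block",
];

// Returns a copy of the ops with unsupported attributes removed, plus the
// names of the formats that were dropped along the way.
function restrictToMarkdown(ops: FormattedOp[]): { ops: FormattedOp[]; dropped: string[] } {
  const dropped: string[] = [];
  const kept = ops.map((op): FormattedOp => {
    const source = op.attributes ?? {};
    const attributes: Record<string, unknown> = {};
    for (const name of Object.keys(source)) {
      if (MARKDOWN_FORMATS.indexOf(name) >= 0) {
        attributes[name] = source[name];
      } else if (dropped.indexOf(name) < 0) {
        dropped.push(name);
      }
    }
    return Object.keys(attributes).length > 0
      ? { insert: op.insert, attributes }
      : { insert: op.insert };
  });
  return { ops: kept, dropped };
}
```

The `dropped` list gives the implementer a place to decide per format whether to warn, fall back to inline HTML, or silently discard.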

akshatpradhan commented 9 years ago

@jhchen Would it be possible to just map to GH markdown for now, and later we can figure out how to support other markdown flavors based on developer feedback? We can take an iterative approach.

I would really like support for GH Markdown.

vinayraghu commented 9 years ago

:+1:

ecmyers commented 9 years ago

:+1:

rhythmus commented 9 years ago

Very inspiring wysiwyg Markdown editing project @ http://abnerlee.github.io/typora/2015/03/11/why-typora/

dlo commented 9 years ago

I wanted to mention that this is something that I'd really love to contribute towards if there's any suggested approaches. We're using Quill on a client project and Markdown output would be preferable. Thanks again @jhchen for the great editor!

jhchen commented 9 years ago

Quill outputs a Delta via getContents() that provides a structured and consistent data structure that would be a good starting point in converting to markdown.

ollym commented 8 years ago

@jhchen we're looking at starting building a markdown editor as described. Did I read somewhere that you're working on v2? We're currently looking at either trix or quill as base frameworks for this job. Do you have any thoughts about how best to approach this and which direction to take?

jhchen commented 8 years ago

Quill is on its way to 1.0 which will support headers and nested lists. I'm not intimately aware of what you are trying to build other than the formatting features that exist in markdown so the suggestion would be to look at the formats that each editor supports. Many other subjective elements are in play but I'm obviously biased.

jhchen commented 8 years ago

The 1.0 beta has been out for a while with headers and nested lists added as well. Quill supports formats that Markdown does not and there are multiple syntaxes for Markdown formats. How to deal with these ambiguities is likely preferential depending on the user and I do not believe it should be built into Quill core. With getContents() one can trivially output to Markdown with their own preferences.
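
As a rough illustration of what such an export might look like, here is a minimal sketch, not an official Quill API. It assumes the Quill 1.x Delta shape (ops with `insert` and optional `attributes`, line formats stored on the "\n" op) and the format names `bold`, `italic`, `code`, `link`, `header`, `blockquote`, and `list`. The function names are hypothetical, the Markdown syntax choices are one set of preferences, and nested lists, code blocks, and embeds are left out.

```typescript
// Shape of a Quill 1.x Delta op, e.g. an element of quill.getContents().ops.
// Embeds (images etc.) use a non-string insert and are skipped here.
interface DeltaOp {
  insert: string | object;
  attributes?: { [name: string]: unknown };
}

// Inline formats -> Markdown wrappers (one set of preferences; swap freely).
function inline(text: string, a: { [name: string]: unknown } = {}): string {
  if (text === "") return "";
  if (a.code) return "`" + text + "`";
  if (a.bold) text = `**${text}**`;
  if (a.italic) text = `*${text}*`;
  if (typeof a.link === "string") text = `[${text}](${a.link})`;
  return text;
}

// Line formats live on the "\n" op that terminates the line.
function linePrefix(a: { [name: string]: unknown } = {}): string {
  if (typeof a.header === "number") return "#".repeat(a.header) + " ";
  if (a.blockquote) return "> ";
  if (a.list === "bullet") return "- ";
  if (a.list === "ordered") return "1. ";
  return "";
}

function deltaToMarkdown(ops: DeltaOp[]): string {
  const out: string[] = [];
  let line = "";
  for (const op of ops) {
    if (typeof op.insert !== "string") continue; // skip embeds
    const parts = op.insert.split("\n");
    parts.forEach((part, i) => {
      if (i > 0) {
        // Crossing a newline finishes the current line; its prefix comes
        // from the attributes of the op that holds the newline.
        out.push(linePrefix(op.attributes) + line);
        line = "";
      }
      line += inline(part, op.attributes);
    });
  }
  if (line !== "") out.push(line);
  return out.join("\n");
}
```

In a page with an editor instance, the call would be along the lines of `deltaToMarkdown(quill.getContents().ops)`. Text is not escaped here, so literal `*` or `_` characters in the content would need handling before real use.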

chrisshroba commented 7 years ago

@jhchen Any suggestions for "trivially outputting to markdown"? It seems there are an abundance of Markdown -> HTML converters and Markdown -> AST converters out there, and a couple HTML -> Markdown converters, but I can't seem to find any AST -> markdown converters, and generating markdown manually from Deltas seems like it would be a lot more error prone than generating an AST which could be fed into a tried-and-tested converter to get markdown.

I know I could just grab the innerHTML of the editor and feed that to an HTML -> Markdown converter, but it seems like the Deltas (not the HTML) should be used as the source of truth.

Coffee2CodeNL commented 7 years ago

There's a Markdown spec now: https://commonmark.org/

mkurz commented 7 years ago

Also see "A formal spec for GitHub Flavored Markdown" from the GitHub blog

Coffee2CodeNL commented 7 years ago

Damn perfect, I really hope for more GFM support in Quill, existing editors make use of CodeMirror, which has major trouble with IME on mobile 😐

Basically, you need to press space twice and whatnot on default keyboards 😕

rhythmus commented 7 years ago

@chrisshroba @jhchen remark and mdast (both by the same author) is what you might be looking for:

https://github.com/wooorm/remark https://github.com/syntax-tree/mdast

It’s GFM and CommonMark compliant already, and features a plugin architecture to add any desired syntax extensions.

treefitty commented 7 years ago

@rhythmus I certainly would've considered something like Shadow DOM indeed if it were available waaaay back when I did this, but as I no longer work for the company I wrote the editor for it's not high on my priority list. Having said that I'm not sure if Shadow DOM would alleviate the difficulties I had with the selection API (that was the particular pain point for me, as each browser, and version, had different ideas about how to handle a keypress that inserted content, deleted content, the backspace key etc.; it was awful).

I did use rangy at the time which served a purpose.... but was a difficult to debug collection of hacks and workarounds (still, a most commendable effort from the author and all due respect given!).

Thanks for the detailed and great comment though, it really ties together all the issues showing how MD output really isn't possible without tackling the problem holistically

duhaime commented 2 years ago

@jhchen you mentioned above "With getContents() one can trivially output to Markdown with their own preferences."

Is there any chance you could show us what you mean?