Closed niebert closed 6 years ago
hey Engelbert! whoa, I've never seen pandoc. Thank you for sharing this.
yeah, this is a really neat idea. I'd thought about outputting the parsed data back to html or markdown, but I'll admit it is a little funny, when html output is the thing the wikimedia parser does well. You may wanna go that route if you require cosmetic things like sortable tables and stuff, which this library ignores.
But yeah, I can definitely see a use for outputting cleaned-up markdown/html from the arrays of sentences and stuff. That would be a fun thing, and I'd be happy to do it.
how would something like this be?
wtf(myWikiText).toHtml({links:true, tables:true, formatting:true, infoboxes:true})
cheers
thank you for your reply. I would recommend parsing into a syntax tree, similar to the DOM tree in the browser (a root node, and nodes that have children, e.g. subsections are children of sections, ...). This abstract syntax tree (AST) could be generated by your wtf_wikipedia.js. A tree visitor runs over the nodes of the AST and exports to a specific output format.
var ast = wtf(myWikiText).toAST({links:true, tables:true, formatting:true, infoboxes:true});
var ast2tex = new AST2Latex(); // AST visitor to create LaTeX
var latex_out = ast2tex.convert(ast);
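To make the visitor idea concrete, here is a minimal sketch. The node shape (`type`, `title`, `depth`, `text`, `children`) and the names `latexVisitor` and `visit` are assumptions made up for illustration; they are not what wtf_wikipedia.js produces or exposes. In this sketch `depth` 1 stands for a top-level section and `depth` 2 for a subsection.

```javascript
// Hypothetical AST for illustration only; not the real wtf_wikipedia output.
const ast = {
  type: 'document',
  children: [
    {
      type: 'section',
      title: 'My Header',
      depth: 2,
      children: [
        {
          type: 'paragraph',
          children: [
            { type: 'text', text: 'Plain and ' },
            { type: 'bold', children: [{ type: 'text', text: 'bold' }] },
            { type: 'text', text: ' text.' }
          ]
        }
      ]
    }
  ]
};

// One handler per node type. Each handler receives the node and a
// callback that renders the node's children with the same visitor.
const latexVisitor = {
  document: (node, kids) => kids(),
  section: (node, kids) => {
    const cmd = ['section', 'subsection', 'subsubsection'][node.depth - 1] || 'paragraph';
    return '\\' + cmd + '{' + node.title + '}\n' + kids();
  },
  paragraph: (node, kids) => kids() + '\n',
  bold: (node, kids) => '\\textbf{' + kids() + '}',
  text: (node) => node.text
};

// Generic depth-first walk: look up the handler for the node type
// and hand it a function that visits the children.
function visit(node, visitor) {
  const handler = visitor[node.type];
  if (!handler) {
    throw new Error('No handler for node type: ' + node.type);
  }
  const kids = () => (node.children || []).map((child) => visit(child, visitor)).join('');
  return handler(node, kids);
}

const latexOut = visit(ast, latexVisitor);
console.log(latexOut);
```

Another output format would only need a second handler table with the same keys; the walk itself stays unchanged. Escaping of LaTeX special characters is left out of this sketch.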
It could also be possible to init wtf with a visitor:
var ast2tex = new AST2Latex(); // AST visitor to create LaTeX
wtf.initVisitor(ast2tex);
var latex_out = wtf(myWikiText).compile({links:true, tables:true, formatting:true, infoboxes:true});
var ast2reveal = new AST2Reveal(); // AST visitor to create RevealJS presentation
wtf.initVisitor(ast2reveal);
var reveal_out = wtf(myWikiText).compile({links:true, tables:true, formatting:true, infoboxes:true});
See PanDocElectron in Wikiversity. A simple Wikiversity article, e.g. a math lecture about topology, can be converted into a RevealJS presentation directly from the Wikiversity source. I created PanDocElectron as a multiplatform Electron application for Linux, Windows and MacOSX, but the installation is too complicated. wtf_wikipedia.js will allow performing a wiki source conversion directly in a browser without any installation. Browserify the whole project in NodeJS together with
cheers
Dear Spencer, thank you very much for the solution
wtf(myWikiText).toHtml({links:true, tables:true, formatting:true, infoboxes:true,math:true})
will be a great feature. Parsing the output HTML source into a DOM tree will be easy, because the browser does it; jQuery can be used, or even innerHTML. That is a good starting point for running the cross-compilation to other formats.
The AST example in my comment above was not meant to be implemented by you. I just wanted to explain how I want to use the HTML output for cross-compiling and for a plugin concept for output formats. I plan to use Handlebars compile functions to extend the methods of DOM nodes. Applying the compile method to the DOM root node of the HTML document creates the output format by calling the compile functions for all children.
Thank you very much for sharing and developing __wtf_wikipedia.js__.
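A rough sketch of that recursive compile over a DOM tree could look like the following. It swaps the Handlebars compile functions for plain JavaScript functions to stay self-contained, and the names `latexHandlers`, `compileNode`, `el` and `text` are hypothetical. The walk only relies on the standard DOM properties `nodeType`, `nodeName`, `nodeValue` and `childNodes`, so the plain objects below stand in for real DOM nodes when no browser is available.

```javascript
// Per-tag handlers for one output format; each gets the already
// compiled content of the element's children.
const latexHandlers = {
  B: (inner) => '\\textbf{' + inner + '}',
  I: (inner) => '\\textit{' + inner + '}',
  H2: (inner) => '\\section{' + inner + '}\n',
  H3: (inner) => '\\subsection{' + inner + '}\n',
  P: (inner) => inner + '\n\n'
};

// Compile a node by compiling all of its children first and then
// applying the handler registered for the node's tag name, if any.
function compileNode(node, handlers) {
  if (node.nodeType === 3) {
    return node.nodeValue; // text node
  }
  if (node.nodeType !== 1) {
    return ''; // comments and other node types are skipped
  }
  const inner = Array.from(node.childNodes)
    .map((child) => compileNode(child, handlers))
    .join('');
  const handler = handlers[node.nodeName.toUpperCase()];
  return handler ? handler(inner) : inner; // unknown tags pass their content through
}

// Stand-in tree for <body><h3>My Header</h3><p><b>hello</b> World!</p></body>,
// built from plain objects (assumption: mirrors only the DOM properties used above).
const text = (value) => ({ nodeType: 3, nodeName: '#text', nodeValue: value, childNodes: [] });
const el = (name, ...children) => ({ nodeType: 1, nodeName: name, nodeValue: null, childNodes: children });

const body = el('BODY',
  el('H3', text('My Header')),
  el('P', el('B', text('hello')), text(' World!'))
);

const latexFromDom = compileNode(body, latexHandlers);
console.log(latexFromDom);
```

In a browser the same function should accept a real element, for example one filled through innerHTML, in place of `body`. A different output format is then just another handler table.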
hey Engelbert, I've gotten markdown and html outputs working in the 2.6.1 version. This is how it works:
var wtf = require('wtf_wikipedia');
wtf.from_api('Aldous Huxley', 'en', function(wiki) {
  var md = wtf.markdown(wiki);
  console.log(md); //view the rendered markdown at https://stackedit.io/app
  var html = wtf.html(wiki);
  console.log(html); //regular old html output
});
i'm happy to do a proper AST. I've never done that before. lemme know if this works ok for you https://runkit.com/spencermountain/5a90bff3fb73ad0012f5f476 cheers
You are great, thank you so much.
Did a minor feature analysis for WebODF http://www.webodf.org/demos/
I am just starting to realize how powerful your library can be: full web-based Office document generation.
cheers, Bert
The following code can create a DOM tree from generated HTML code (see also https://gojs.net/latest/samples/DOMTree.html ).
var dom_tree = document.createElement("body");
var html_code = "<b>hello</b> World!";
dom_tree.innerHTML = html_code;
thanks! yeah, just a heads-up - I'm gonna change the api a bit in the next version, so that if you want to get the html, and a list of categories, you won't have to parse the document twice. It'll be something like
var doc = wtf(wiki)
var html= doc.toHtml()
var cats= doc.categories()
//...and so on
cheers!
Great, perfect. One thing I haven't understood properly: how do you want developers to extend your library wtf so that it generates the doc instance with the method toLatex()? Do you want pull requests for additional output formats, or is forking your recommended way forward?
Cheers Bert
hey, yeah this is a great question, and you have good timing. This should definitely be part of the redesign - a way to influence the parser, and also a way to easily extend the functionality. hmm. Gimme a couple days to put the pieces into place, then I'll get your help with this. Things are currently pretty messy, but doing things like toLatex() should become substantially easier in a few days.
I'd be happy to include latex as an output format. Great idea
If you want me as a collaborator, I am willing to help. First I would support you with documentation in README.md; later I would support you with additional output formats.
see https://niebert.github.io/Wiki2Reveal//wtf_wiki2html.html for how I converted the markdown to HTML quick and dirty with https://github.com/niebert/Wiki2Reveal/blob/master/docs/js/wiki2html.js
hey, very cool. Yeah, of course, that would be good.
I'm re-organizing the library a great deal in the dev branch, which you can check out. The basic idea is that doc = wtf(string) will do basic parsing, but things only get fully parsed when they're needed, and we can continue operating on the doc as a class, with all sorts of helper functions.
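The parse-only-when-needed idea can be sketched roughly like this. `LazyDoc`, its `lazy` helper and the two simplified regex parsers are assumptions made up for illustration; they are not the code in the dev branch, and real wikitext needs far more careful parsing.

```javascript
// Hypothetical document class: each accessor parses its part of the
// wikitext on first use and returns the cached result afterwards.
class LazyDoc {
  constructor(wikitext) {
    this.wikitext = wikitext;
    this.cache = {};
    this.parseCount = 0; // only here to make the laziness visible
  }

  // Run `parse` the first time `key` is requested, then reuse the result.
  lazy(key, parse) {
    if (!(key in this.cache)) {
      this.parseCount += 1;
      this.cache[key] = parse(this.wikitext);
    }
    return this.cache[key];
  }

  // Simplified: collects names from [[Category:Name]] and [[Category:Name|sort key]].
  categories() {
    return this.lazy('categories', (text) => {
      const found = [];
      const pattern = /\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]/g;
      let match;
      while ((match = pattern.exec(text)) !== null) {
        found.push(match[1].trim());
      }
      return found;
    });
  }

  // Simplified: collects ==Title== style headings with their nesting depth.
  sections() {
    return this.lazy('sections', (text) => {
      const found = [];
      const pattern = /^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/gm;
      let match;
      while ((match = pattern.exec(text)) !== null) {
        found.push({ title: match[2], depth: match[1].length - 1 });
      }
      return found;
    });
  }
}

const lazyDoc = new LazyDoc([
  '==Life==',
  'Some text.',
  '===Early years===',
  'More text.',
  '[[Category:English writers]]',
  '[[Category:1894 births|Huxley]]'
].join('\n'));

console.log(lazyDoc.categories()); // parsed on this first call
console.log(lazyDoc.categories()); // served from the cache
console.log(lazyDoc.sections());
```

Nothing is parsed in the constructor, so asking only for categories never pays for section parsing.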
The branch is moving around a lot still, but you're welcome to join in. Let's do some documentation once it's stable. You can start on a latex output, if you wanted, with our other outputs, in src/output.
You'll notice that instead of creating a gigantic json file, it's starting to create a document with methods like .sentences(), .sections() and things - it's a lot cleaner, as the page gets bigger and weirder
feel free to make a pr with some latex output. The next few days are busy for me, so I won't conflict anything. You can see some of the tests are already passing on the dev branch. It's in reasonable shape ;/
hey Engelbert, i'm just heading out tomorrow on a 2-week vacation (to japan!). I didn't get around to releasing the dev branch, but will do so when i return at the start of April. sorry about that cheers
Have a great time in Japan.
thanks @niebert !
Parsing of wiki markup generates a syntax tree. Is there a recommended way to create an output format other than plain text? I want to use document conversion via wtf_wikipedia.js with the generated syntax tree, e.g. to create LaTeX and other output formats, similar to PanDoc https://pandoc.org/try/ , e.g. to convert "===My Header===" into the LaTeX syntax "\subsection{My Header}". Thank you very much for developing wtf_wikipedia.js and sharing the code.
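For the heading example alone, a minimal sketch without any syntax tree could be a single regex replacement. The function name `headingsToLatex` and the level mapping are assumptions for illustration, chosen so that "===My Header===" maps to "\subsection{My Header}" as in the question; nothing here is part of wtf_wikipedia.js.

```javascript
// Assumed mapping from the number of "=" signs to a LaTeX sectioning command.
const LATEX_HEADINGS = {
  2: 'section',
  3: 'subsection',
  4: 'subsubsection',
  5: 'paragraph',
  6: 'subparagraph'
};

// Replace every line of the form ==Title== (2 to 6 "=" on both sides)
// with the matching LaTeX command; all other lines are left untouched.
function headingsToLatex(wikitext) {
  return wikitext.replace(
    /^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$/gm,
    (line, marks, title) => '\\' + LATEX_HEADINGS[marks.length] + '{' + title + '}'
  );
}

console.log(headingsToLatex('===My Header==='));
console.log(headingsToLatex('==Intro==\nSome text.\n=== Details ===\nMore text.'));
```

This handles headings only; links, templates, tables and escaping of LaTeX special characters would each need their own treatment, which is where a real syntax tree becomes useful.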