haghish / markdoc

A literate programming package for Stata which develops dynamic documents, slides, and help files in various formats
http://haghish.com/markdoc

Parsing/Handling Quotation marks #3

Closed wbuchanan closed 8 years ago

wbuchanan commented 8 years ago

The original file being converted is the help file for hextorgb. I used:

qui: log using hextorgb.smcl, replace
type hextorgb.sthlp, smcl
qui: log c _all

That renders the help file as an SMCL document (I realize I could just as easily have changed the file extension to achieve the same effect). I've attached the SMCL file here as hextorgb.txt (I couldn't attach the SMCL file directly).
hextorgb.txt

. markdoc hextorgb, e(md) nonumber
too few quotes
r(132);

The issue seems to come up with a while loop that is used to parse things:

01 - while substr(`"`macval(line)'"',1,8) == "      > " {
   = while substr(`"          //github.com/mbostock/d3/wiki/Ordinal-Scales#ordinal"':D3js 
> ordinal scale color"',1,8) == "      > " {
too few quotes

It's a bit verbose, but I've also attached a log that I created while trying to debug things. I've not dug into the source much, but one thing that would likely be useful would be some refactoring to make components of the overall program more modular. For example, maybe there is a subroutine - or an entirely new .ado - that defines a token map (e.g., {p2col # # # #} to ... for HTML or \begin{table}[h]\begin{tabular}{pp}\hline & \hline ... for LaTeX). At one point I was trying to update StatWeave and get to know its source a bit better, and I would think some type of object-oriented approach might be more useful and would make it easier to squash bugs like this efficiently. I already have Mata classes defined for all of the HTML5 elements that you could use, if interested.
markdocissue.txt

haghish commented 8 years ago

Please reinstall MarkDoc from GitHub and rerun your code. This file returns no error for me. You are probably using an older version of MarkDoc. I remember that error popping up before...

Mac OS X 10.10.3, Stata 14.1

haghish commented 8 years ago

I am not sure what you mean by "some refactoring to make components of the overall program more modular. For example, maybe there is a subroutine - or an entirely new .ado - that defines a token map (e.g., {p2col # # # #} to"

wbuchanan commented 8 years ago

Sorry, my browser at work is all sorts of jacked up at the moment. Essentially the program is doing a bunch of string substitution with a few predefined tokens to trigger specific renderings of the material. So in terms of refactoring, it would be easier to maintain and extend if the main program (e.g., markdoc.ado) called independent subroutines that each perform a more specialized task. For example, you might have a workflow that ends up looking something like:

markdoc     ->     subroutine 1
     |                  |
     |------->     subroutine 2
     |                  |
     |------->     subroutine 3
     |                  |
     |------->     program output

Subroutine 1 might be where the file is read initially, subroutine 2 might identify any/all formatting tokens, subroutine 3 would translate the tokens into the appropriate output format, and the last step would handle creating the output. This has a few different advantages for development/testing/community contributions. The first is that if I think I can make an improvement in how the input is being read/handled/stored, I could work on that independently of the rest of the project and, assuming no bugs are introduced, the next component would still be able to consume its output and do its task without issue. Then markdoc itself becomes more like a controller/dispatcher that calls different subroutines that can be maintained and developed separately.
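
As a rough illustration of that layout, here is a minimal sketch. It is written in Python rather than Stata purely to keep it short, every function name is hypothetical, and the token map is an illustrative guess (two simple SMCL directives mapped to HTML), not MarkDoc's actual mapping or code.

```python
import re

# Hypothetical token map: SMCL directive name -> HTML template.
# Illustrative only; this is not MarkDoc's actual mapping.
TOKEN_MAP = {
    "bf": "<b>{}</b>",
    "it": "<i>{}</i>",
}

# Matches simple directives of the form {name:text} (no nesting).
DIRECTIVE = re.compile(r"\{(\w+):([^{}]*)\}")


def read_source(text):
    """Subroutine 1: split the raw input into lines."""
    return text.splitlines()


def find_tokens(lines):
    """Subroutine 2: split each line into ('text', s) and ('tag', name, body) parts."""
    parsed = []
    for line in lines:
        parts, pos = [], 0
        for m in DIRECTIVE.finditer(line):
            if m.start() > pos:
                parts.append(("text", line[pos:m.start()]))
            parts.append(("tag", m.group(1), m.group(2)))
            pos = m.end()
        if pos < len(line):
            parts.append(("text", line[pos:]))
        parsed.append(parts)
    return parsed


def translate(parsed, token_map):
    """Subroutine 3: render each part; unknown directives stay in SMCL form."""
    out = []
    for parts in parsed:
        rendered = []
        for part in parts:
            if part[0] == "text":
                rendered.append(part[1])
            else:
                _, name, body = part
                template = token_map.get(name)
                if template is None:
                    rendered.append("{%s:%s}" % (name, body))
                else:
                    rendered.append(template.format(body))
        out.append("".join(rendered))
    return out


def write_output(lines):
    """Final step: join the translated lines into one document."""
    return "\n".join(lines)


def convert(text, token_map=TOKEN_MAP):
    """Controller: only dispatches to the subroutines above."""
    return write_output(translate(find_tokens(read_source(text)), token_map))
```

Each stage only consumes the previous stage's output, so the reader or the token map could be replaced (say, with a LaTeX map) without touching the other stages, and each one can be tested on its own.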

haghish commented 8 years ago

I believe MarkDoc was already written with that idea in mind, but it can certainly improve further. I reworked the Weaver package to really isolate the tasks in separate programs. MarkDoc is also going in that direction but needs some rework to make it beautiful again.