Closed by ingydotnet 10 years ago
There are some tradeoffs of this approach:
**don't write -dont-**
? How do we write <b>- and this is the key point -</b>
?so<del> frankly</del>,
=> so <del>frankly</del>,
=> so ,
which has an unwanted space before the comma)
[ [foo]bar ]
= \[ [foo]bar \]
not [[foo]bar]
I suppose the motivation for the proposed behaviour is the assumption that an unescaped */-< etc. before/after a space when opening/closing (respectively) is likely intended to be literal in the vast majority of cases?
Is the approach a property of the parser or the grammar or the token? Could you declare some tokens like * that are context-sensitive and some like [ that are not?
NB: I take it the following are also to be considered literal asterisks?
x* y* z
x *y *z
x* y *z
Hi @pdl. If you want to chat about this join #pegex or #kwim on irc.freenode.net. I'll try to give you my thoughts here.
I've done markups before where the phrase markup had to be /huggy/. That is, the opening marker had to have a space before and non-space after, and the close marker had to have the opposite (with exceptions for punctuation). It worked ok for a general case in English, but I think it was too strict. Wouldn't work for Chinese etc.
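As a rough illustration of the /huggy/ rule described above, here is a minimal Python sketch. It is not the code of Kwim or of any earlier markup; the names `huggy_pattern` and `find_huggy` and the exact punctuation set allowed after a closing marker are assumptions made for the example.

```python
import re

def huggy_pattern(marker: str) -> re.Pattern:
    """Build a regex for one marker character under a 'huggy' rule (assumed form)."""
    m = re.escape(marker)
    return re.compile(
        # Opener: preceded by whitespace or start of text, followed by non-space.
        # Closer: preceded by non-space, followed by whitespace, end of text,
        # or one of a few punctuation characters (the punctuation exception).
        rf"(?<!\S){m}(?=\S)(.+?)(?<=\S){m}(?=[\s.,;:!?)]|$)"
    )

def find_huggy(text, marker="*"):
    """Return the contents of every phrase the huggy rule recognises."""
    return [match.group(1) for match in huggy_pattern(marker).finditer(text)]
```

With this sketch, `find_huggy("a *bold* word")` finds `bold`, while `a = b * c * d` and `chocolate/strawberry/vanilla` (with `/` as the marker) yield nothing. Text without spaces around the markers, such as `中文*粗体*中文`, also yields nothing, which is the strictness problem mentioned for Chinese.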
I decided that there shouldn't be any such rules. A pair of asterisks (inside a block context) should bold. But then I read somewhere that a markdown opening marker could not be followed by a space. It makes sense because what is a bold space? By extension, ** should not be a bold nothing. That's what this ticket is about.
It is the intention that Kwim never fails. It always makes something out of anything. It might issue warnings on the command line or mark things in red in html etc. I'm not worried about people having to try a couple times to get what they want. That's how people learn. I do want to make sure there is a way to express anything (well maybe not to an insane level).
Also Kwim is brand new, so there are a few basics not yet implemented, like escaping. A backslash will make any char be non-markup. There is also a /functional/ way to write all markup phrases:
<bold this text is bold>
*this text is bold*
So I hope we'll have enough rope.
I guess it all comes down to test cases and figuring out what makes the most sense. You bring up a lot of interesting points, but I think we need to turn them into tests and see what plays out best. I'm not afraid to implement it a few times. The code base is simple.
Have you seen the language grammar? https://github.com/ingydotnet/kwim-pgx/blob/master/kwim.pgx
Well I look forward to conversing with you more on this.
I gave this a go in aeb465a19c9fb2782d59c09c157acfd0f1af3b51
@pdl, play around. Here is a good way to test:
echo 'a = b * c * d' | kwim --to=byte
@pdl, thanks to your tests I decided to scrap -x- for --x--.
I'd rather not have to do that for:
chocolate/strawberry/vanilla
The workarounds are:
chocolate / strawberry / vanilla
chocolate/strawberry/vanilla
Thanks for the help. Closing this for now. Let's add more specific issues for remaining problems.
For bold, these are ok:
These are not:
i.e. a phrase starts with a markup character followed by a non-space character that is also not the same markup character.
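The opening condition stated above can be sketched in Python as follows. `can_open` is a hypothetical helper written for this illustration, not part of Kwim or its Pegex grammar, and it only covers the opening side of a phrase.

```python
def can_open(text: str, i: int) -> bool:
    """Return True if the marker at text[i] may start a phrase.

    Rule (from the description above): the marker must be followed by a
    character that is neither whitespace nor the same marker character.
    """
    marker = text[i]
    if i + 1 >= len(text):
        return False  # nothing follows the marker, so it cannot open a phrase
    following = text[i + 1]
    return not following.isspace() and following != marker
```

Under this rule the asterisks in `a = b * c * d` cannot open a phrase because each is followed by a space, and the first asterisk in `**` cannot open one because it is followed by the same marker.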
Others include: