Phrase markup should not allow space on the working side

ingydotnet commented 10 years ago

For bold, these are ok:

x*y*z
x *y* z

These are not:

x * y* z
x *y * z
x * y * z
x ** z

I.e., a phrase starts with a markup character followed by a non-space character that is not the same markup character.

Others include:

[foo] `foo` <foo> /foo/ *foo* -foo-
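
The rule above can be sketched as a regular expression. This is only an illustration for the bold marker, not Kwim's actual implementation (Kwim uses a Pegex grammar); the names `BOLD` and `bold_phrases` are made up for this example, and escaping and nesting are ignored.

```python
import re

# Hypothetical sketch of the proposed rule for '*' only (not Kwim code).
# Opening '*': next character must be neither whitespace nor another '*'.
# Closing '*': previous character must not be whitespace.
BOLD = re.compile(r"\*(?![\s*])([^*]*?)(?<!\s)\*")

def bold_phrases(text):
    """Return the contents of every bold phrase found in text."""
    return BOLD.findall(text)

# The two accepted forms from this ticket
print(bold_phrases("x*y*z"))      # ['y']
print(bold_phrases("x *y* z"))    # ['y']

# The rejected forms: the asterisks stay literal
for s in ["x * y* z", "x *y * z", "x * y * z", "x ** z"]:
    print(repr(s), bold_phrases(s))   # each prints an empty list
```

Under this sketch a marker that fails the check is simply left in the text as a literal character, which matches the idea that parsing should never fail.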

pdl commented 10 years ago

There are some tradeoffs of this approach:

probably harder to code
What about **don't write -dont-**? How do we write <b>- and this is the key point -</b>?
It's not necessarily obvious to human authors and readers that (if I've interpreted you correctly) open tags cannot be followed by a space, close tags cannot be preceded by a space, and an empty open+close pair is not permitted.
Anything converting to kwim must strip internal spaces and remember to place them outside the tags (and perhaps remove double spaces?)
Cannot represent documents with genuine internal spaces, e.g. ins+del to indicate corrections (so<del> frankly</del>, => so <del>frankly</del>, => so , which has an unwanted space before the comma)
Cannot use spaces for readability [ [foo]bar ] = \[ [foo]bar \] not [[foo]bar]
Can tags with no content ever be represented, e.g. input?
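
To make the converter concern concrete, here is a minimal sketch of what a tool emitting kwim might have to do with a phrase that has leading or trailing spaces. The function name `hoist_spaces` and its `marker` parameter are hypothetical and not part of any Kwim tooling.

```python
def hoist_spaces(inner, marker="*"):
    """Wrap inner in marker, moving leading/trailing whitespace outside.

    Hypothetical helper: shows the extra work a converter would need
    if markers may not touch whitespace on their inner side.
    """
    core = inner.strip()
    if not core:
        # An empty or all-space phrase has no valid marked-up form,
        # so return the text unchanged.
        return inner
    lead = inner[: len(inner) - len(inner.lstrip())]
    trail = inner[len(inner.rstrip()):]
    return f"{lead}{marker}{core}{marker}{trail}"

print("so" + hoist_spaces(" frankly") + ",")   # so *frankly*,
print(hoist_spaces("key point "))              # *key point* (followed by a space)
```

Note that the hoisted space now belongs to the surrounding text, which is exactly the information loss described in the ins/del item above.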

I suppose the motivation for the proposed behaviour is that we assume that unescaped */-< etc. before/after a space when opening/closing (respectively) is likely to be intended to be literal in the vast majority of cases?

Is the approach a property of the parser or the grammar or the token? Could you declare some tokens like * that are context-sensitive and some like [ that are not?

NB: I take it the following are also to be considered literal asterisks?

x* y* z
x *y *z
x* y *z

ingydotnet commented 10 years ago

Hi @pdl. If you want to chat about this join #pegex or #kwim on irc.freenode.net. I'll try to give you my thoughts here.

I've done markups before where the phrase markup had to be /huggy/. That is, the opening marker had to have a space before and non-space after, and the close marker had to have the opposite (with exceptions for punctuation). It worked ok for a general case in English, but I think it was too strict. Wouldn't work for Chinese etc.
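
For comparison, the stricter "huggy" rule could look roughly like the sketch below. It is an assumption-laden illustration, not code from any earlier markup: the name `huggy_bold` and the exact punctuation set are invented here.

```python
import re

# Hypothetical "huggy" rule for '*' (illustration only).
# Opening '*': preceded by whitespace or start of text, followed by non-space.
# Closing '*': preceded by non-space, followed by whitespace, end of text,
# or one of a few punctuation characters (the set here is an assumption).
HUGGY = re.compile(r"(?<!\S)\*(?=\S)([^*]+?)(?<=\S)\*(?![^\s.,;:!?)])")

def huggy_bold(text):
    """Return the contents of every bold phrase under the huggy rule."""
    return HUGGY.findall(text)

print(huggy_bold("x *y* z"))           # ['y']
print(huggy_bold("say *hi*, ok"))      # ['hi']
print(huggy_bold("x*y*z"))             # []
print(huggy_bold("这是*粗体*文字"))      # []
```

The last case shows why the rule is too strict for text written without spaces between words: the markers are never surrounded by whitespace, so nothing is recognized.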

I decided that there shouldn't be any such rules. A pair of asterisks (inside a block context) should bold. But then I read somewhere that a Markdown opening marker cannot be followed by a space. It makes sense, because what is a bold space? By extension, ** should not be a bold nothing. That's what this ticket is about.

It is the intention that Kwim never fails. It always makes something out of anything. It might issue warnings on the command line or mark things in red in HTML, etc. I'm not worried about people having to try a couple of times to get what they want. That's how people learn. I do want to make sure there is a way to express anything (well, maybe not to an insane level).

Also Kwim is brand new, so there are a few basics not yet implemented, like escaping. A backslash will make any char be non-markup. There is also a /functional/ way to write all markup phrases:

<bold this text is bold>
*this text is bold*

So I hope we'll have enough rope.

I guess it all comes down to test cases and figuring out what makes the most sense. You bring up a lot of interesting points, but I think we need to turn them into tests and see what plays out best. I'm not afraid to implement it a few times. The code base is simple.

Have you seen the language grammar? https://github.com/ingydotnet/kwim-pgx/blob/master/kwim.pgx

Well I look forward to conversing with you more on this.

ingydotnet commented 10 years ago

I gave this a go in aeb465a19c9fb2782d59c09c157acfd0f1af3b51

@pdl, play around. Here is a good way to test:

echo 'a = b * c * d' | kwim --to=byte

ingydotnet commented 10 years ago

@pdl, thanks to your tests I decided to scrap -x- for --x--.

I'd rather not have to do that for:

chocolate/strawberry/vanilla

The workarounds are:

chocolate / strawberry / vanilla
chocolate/strawberry/vanilla

Thanks for the help. Closing this for now. Let's add more specific issues for remaining problems.

ingydotnet / swim-pm

Phrase markup should not allow space on the working side #2