hornc / Fountain-expts

Fountain CSL grammar experiments
1 stars 1 forks source link

production terminals that output empty strings will not be reversible.... #2

Open hornc opened 11 months ago

hornc commented 11 months ago

In my current experiments, I'm using a lot of rules to change values with "" (empty string / ε) as the output...

This is not reversible, and probably breaks Fountain's ability to make sense parsing the generated output...

I haven't tried running these examples' outputs through the parsing direction, but it's just occurred to me that I might be relying heavily on a "feature" that's likely to disappear

Terminals probably shouldn't be blank... looking at the EBNF at https://github.com/catseye/Fountain/blob/29fb426732c4f83abb54f39f1a247d9f9e348ce5/doc/Definition-of-Fountain.md

It looks like that's what is specified : <<">> <<any except ">>+ <<">> (one or more character) but 0.4 is letting me get away with <<">> <<any except ">>* <<">> i.e. zero or more.

Probably can get around this but outputting something, but it'll be very noisy with the current constructs I have.

hornc commented 11 months ago

Trying to parse a 2 PRNG iteration output with prng.fountain (but modified so the main loop post-condition on L.15 is <. steps > 1 .>)

fountain.exe parse prng.fountain s39.txt seed=39
Killed

after some time.

( same result with an incorrect seed. It might have been faster to reject with a different seed. .... this is a really terrible example to test parsing 😆 )

s39.txt is:

Seed   : Coin toss
0xBDAE : Tails
0xF19B : Tails

I haven't looked at the interpreter code yet, but I'm not sure it's even theoretically possible to parse this with all the blank terminals in my current .fountain source.

Is this a bug in fountain.exe v0.4 to even accept productions with empty terminals?

e.g.

<. a = 0 .> ""

The EBNF spec looks correct in that it disallows empty terminals.

I got swept up just trying to produce a result using Fountain like code. Now I'm trying to think through what is means in terms of the theory...

hornc commented 11 months ago

What about productions like:

https://github.com/hornc/Fountain-expts/blob/bc09a0a98876fa8d9bf8a60d4c8478596c518691/quantum-lunch.fountain#L44-L48

That's not trying to work around anything using productions as arbitrary state-changing code, it's simply trying to implement a correct "also" if two tracked things are the same, otherwise there's nothing to say about them.

It still seems to have the same (potential) problem (blank terminals), but seems to be a reasonable and simple use case?

To hopefully zero in on what I'm pondering: The potential problem I see is that between every identifiable terminal token in the output, there are potentially infinite ε tokens. How is a parser supposed to cope with that (and re-construct a potential context) without getting caught in infinite loops?

(it's late) but DogTurnAround

https://github.com/catseye/Fountain/blob/29fb426732c4f83abb54f39f1a247d9f9e348ce5/eg/kennel-story-0.fountain#L28-L32

looks like it has empty string (ε) terminals in a original Fountain example, so maybe this is not fundamentally a problem like I thought ... or at least it's not just my idiosyncratic use of Fountain that reveals it.

hornc commented 11 months ago

The simple tests in #3 suggest there is no problem with empty terminals at all, either explicitly written as "" or simply implied.

I'll try harder to come up with a specific pathological example, or pinpoint specifically what is pathological about the PRNG grammar.

I'm beginning to suspect that maybe Fountain is efficiently parsing the PRNG output, but even efficiently that's going to take lifetimes, and that's what I should expect? ... and it's not particularly empty terminals that cause the problem, any other choice of terminal could lead to the same time taken.

hornc commented 11 months ago

FWIW

fountain.exe parse prng.fountain <(echo -en "Seed   : Coin tossX\n") seed=100

Gives a fast Failure, as expected.

  fountain.exe parse prng.fountain <(echo -en "Seed   : Coin toss\nXXX") seed=100

Sets it down the path of checking all the other rules, even though there is no production or combinations of terminals that could append XXX to that output.

Again, I'm not really sure what to expect (an eventual Failure, but on what timescale?). There could be an optimization to fast fail if no combination of terminals could ever match. I suppose it could generate empty strings, and another valid (edit: I don't think it is valid ... but how would you know without running it?) result would be:

Remaining: "XXX"

and that will take as much processing and time as a Failure result... so I've answered my own concern again. It's all good.

further edit: The parse process is Killed when all memory and swap is used up, which happens relatively promptly.

time fountain.exe parse prng.fountain <(echo -en "Seed   : Coin toss\nXXX") seed=100
Killed

real    0m55.718s
user    0m41.327s
sys 0m11.990s
cpressey commented 11 months ago

Not sure what all might be going on here, but just for the record: "" (if indeed Fountain allows it) should be a no-op. If it has some behaviour other than that, that'd be a bug.

I don't know if Fountain should allow it. Sometimes you do need to fill a slot somwhere with something that explicitly does nothing. I think I've used <. a += 0 .> (or something similar) for that in one grammar, but that does require you to know that a already has a value.

hornc commented 11 months ago

@cpressey I realise and agree now that the explicit empty string "" is a no-op and exactly equivalent to the implicit empty string.

Originally I thought the explicit "" was clearer, but that was just me learning and getting familiar with the syntax.

I then jumped down a hole involving reversible computing and not being able to reverse the operations which generated output string containing empty strings because you can't tell the difference between 0, 1, or many instances of ε.

After thinking about it a bit more, I realised we're not talking about reversible computing, and the grammar seems to work itself out so I thought there is no problem.

Now, I've just checked some references for whether there is any special place for ε in CSLs because it seemed likely that someone has written about it, and found some definitions which makes me suspect that these RHS empty strings move Fountain into type 0 / unrestricted grammar and TC territory, and that I was on to something by stumbling around them.

Introduction to Automata Theory, Languages and Computation, 1979 , Hopcroft and Ullman:

Defines CSLs in terms of unrestricted grammars:

Suppose we place the restriction on productions α → β of a phrase structure grammar that β be at least as long as α. Then we call the resulting grammar context-sensitive and its language a context-sensitive language

(9.3, p.223)

And although Wikipedia is often not the best source for this sort of thing: https://en.wikipedia.org/wiki/Context-sensitive_grammar

αAβ → αγβ ... The string γ is not allowed to be empty. Without this restriction, the resulting grammars become equal in power to unrestricted grammars.

Also suggests that the RHS of a production cannot be an empty string (unless it is start symbol -> ε).

I can't find a single ultimate definition of a CSL that I'm happy with, but they all seem to imply the same thing, which is that these empty strings on the RHS give the power needed to make this an unrestricted grammar.

Am I missing something? I started using explicit empty strings to "get things done", but I was writing Fountain like I'd write Thue (which is TC, as you know). I felt like I wasn't abusing the language after I noticed an implicit empty string in your example Kennel story:

https://github.com/catseye/Fountain/blob/29fb426732c4f83abb54f39f1a247d9f9e348ce5/eg/kennel-story-0.fountain#L28-L32

Now I'm worried that it is exactly these empty strings which give Fountain enough power to be Turing complete.

cpressey commented 6 months ago

Now I'm worried that it is exactly these empty strings which give Fountain enough power to be Turing complete.

Yeah, they do. I sometimes think of it like this: if you have empty-string terminals, you can recurse without consuming any input, so you can recurse unboundedly and do an unlimited amount of work. If you needed to consume input on each recursive step, that would limit the work you could do to some factor of the length of the input.

But... it's also inconvenient to have to consume some input every time you recursively apply a production, and there are other ways to limit the work you can do. What I would like to try is to implement the following and see if it's more convenient: Every time a unit of input is consumed, the grammar accumulates some constant number of units of "fuel"; every time the grammar needs to store an additional unit of context, it expends a unit of "fuel". This also limits the amount of work you can do to a factor of the length of the input, but does not tie it so directly to consuming input.

I don't know, maybe it doesn't actually make it much more convenient in practice. But at any rate, I haven't implemented it yet, and Fountain's approach is still "voluntary compliance" - it's supposed to be for expressing CSLs so if you're really interesting in writing something more powerful than a CSL you should really look for something else.

btw, I released Fountain version 0.5 recently, which doesn't address this of course, but it does implement random choice of alternatives during generation. Unfortunately I didn't think to check the sources in this repo at the time, but checking them now, the prng example seems to still work. Also wanted to let you know that I might just make another syntax change in the next version of Fountain (...because idea...) which would unfortunately break these again.