Exponential grow for case insensitive input.

ArthurSonzogni commented 3 years ago

Hi, Thanks for this tool. This is awesome!

I noticed some generator produce all the possibles combinaisons for case incentive input. It means this generate a list of item growing exponentially relative to the input length.

For instance if you use: (input: abnf)

test = "test"

Then you get your get. (output: "rrta")

add('test', Diagram(
  Choice(0,
    Terminal("test"),
    Terminal("Test"),
    Terminal("tEst"),
    Terminal("TEst"),
    Terminal("teSt"),
    Terminal("TeSt"),
    Terminal("tESt"),
    Terminal("TESt"),
    Terminal("tesT"),
    Terminal("TesT"),
    Terminal("tEsT"),
    Terminal("TEsT"),
    Terminal("teST"),
    Terminal("TeST"),
    Terminal("tEST"),
    Terminal("TEST"))));

I was curious to know if this is something you want to keep or fix later. If you want to fix it later, would you prefer I try to submit PR?

I tried all the generator online using this tool https://arthursonzogni.com/Diagon/#code_area

exponential output:

rrparcon
rrl
rrta

non exponential output:

rrutf8
rrtext
svg
html5
xhtml5
ebnfhtml5
ebnfxhtml5
blab
abnf
dot
rdot
rrdump
rrtdump

Can't say (reached unimplemented)

bnf
wsn
iso-ebnf
rbnf
sid

katef commented 3 years ago

This is pretty terrible, I agree. I was trying to think about better ways to handle this, when I wrote it. Do you have any suggestions please?

The problem is that those dialects don't support case insensitive terminals, so we have to expand out every spelling.

ArthurSonzogni commented 3 years ago

I was thinking: instead of outputting every combination, to rewrite it using a sequence of alternatives:

Input:

http

output (current)

| http
| Http
| hTtp
| HTtp
| htTp
| HtTp
| hTTp
| HTTp
| httP
| HttP
| hTtP
| HTtP
| htTP
| HtTP
| hTTP
| HTTP

output (proposed;)

 (h | H) (t | T) (t | T) (p | P)

Obviously, this is a very naive idea from me. I guess there are reasons this can't be done. I just found out this is implemented by permute_case. I will take a look.

katef commented 3 years ago

Yes, unfortunately that has a different meaning. For a grammar where there could be whitespace between tokens (depending on how lexing is defined), here you're introducing the possibility for space between each letter, and to produce multiple tokens for output. The original grammar there has only one token for the entire string.

katef / kgt

Exponential grow for case insensitive input. #37