knrafto / language-bash

Parse and pretty-print Bash shell scripts
BSD 3-Clause "New" or "Revised" License

Brace expansion edge case #17

Open knrafto opened 7 years ago

knrafto commented 7 years ago

QuickCheck found this failure in this job:

Tests
  Properties
    brace expansion:     FAIL (0.08s)
      *** Failed! Assertion failed (after 56 tests): 
      "\\}\\,,{y}{\\{\\ }\\}{}\\}s\\ p\\{\\},\\,lhhiq\\,qv}{}\\},}\\}\\ \\,{\\}\\ \\{a\\ \\,\\,z\\,t"
      Use --quickcheck-replay '55 TFGenR 0000000309618D3500000000000F4240000000000000E1390000000056F692C0 0 72057594037927935 56 0' to reproduce.

pbiggar commented 7 years ago

Found another one, in this job: https://travis-ci.org/pbiggar/language-bash/jobs/161995727

 brace expansion:     FAIL (0.10s)
      *** Failed! Assertion failed (after 69 tests): 
      "a,\\}}{a\\{,w\\}}{}l}{{xaky\\{\\ \\,\\,tuqmg}}z\\{,}asik{ng\\,"
      Use --quickcheck-replay '68 TFGenR 1BCE489E03A1DF8763BEBE7128281A981FBC02700490D72F1B17CB17703ECB89 0 31 5 0' to reproduce.

mmhat commented 4 years ago

From a failed test case I worked out a minimal example: "{a,b}{},c}". Bash expands this to a{},c} b{},c}, whereas our expansion is a} ac b} bc.

I suspect the following: Bash scans the expression and expands {a,b}. Since {} is not a valid brace expansion, it is left untouched, and since the remaining ,c} lacks an opening brace, it is treated as part of the postscript too. Hence Bash expands {a,b} with the postscript {},c}, whereas we interpret {},c} as a brace expansion. Interestingly, {},c} on its own (without the {a,b}) yields the same result in both systems.

Can anyone confirm that?

mmhat commented 4 years ago

After skimming the source code, I think Bash works the way I described in my last comment:

  1. It reads the preamble: the part up to an opening brace that has a matching closing brace.
  2. It reads up to the matching closing brace.
  3. It looks at the text between the braces (the amble): if it contains a ',', it is expanded with "normal" brace expansion; if not, sequence expansion is tried.
  4. If neither succeeds, the amble is treated as normal text and processing continues with the string following it (if there is any).

The interesting part of the source.
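
The four steps above can be sketched roughly as follows. This is only an illustration of the steps as described here, not a port of Bash's braces.c and not the language-bash implementation. The function names (find_braced, split_top_level, expand_sequence, expand) are hypothetical, sequence expansion is assumed to cover only integer ranges such as {1..3}, and quoting and backslash escapes are ignored entirely.

```python
import re
from typing import List, Optional, Tuple


def find_braced(text: str) -> Optional[Tuple[int, int]]:
    """Steps 1 and 2: locate the first '{' that has a matching '}'.

    Returns the indices (open, close), or None if no such pair exists.
    Assumption: matching is plain nesting-depth counting, no quoting.
    """
    for i, ch in enumerate(text):
        if ch != "{":
            continue
        depth = 0
        for j in range(i, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return i, j
    return None


def split_top_level(amble: str) -> List[str]:
    """Split the amble at commas that are not inside nested braces."""
    parts, depth, current = [], 0, ""
    for ch in amble:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        current += ch
    parts.append(current)
    return parts


def expand_sequence(amble: str) -> Optional[List[str]]:
    """Simplified sequence expansion: integer ranges like '1..3' only."""
    m = re.fullmatch(r"(-?\d+)\.\.(-?\d+)", amble)
    if m is None:
        return None
    lo, hi = int(m.group(1)), int(m.group(2))
    step = 1 if lo <= hi else -1
    return [str(n) for n in range(lo, hi + step, step)]


def expand(text: str) -> List[str]:
    span = find_braced(text)
    if span is None:
        return [text]  # no brace pair: the whole text is the preamble
    open_i, close_i = span
    preamble = text[:open_i]
    amble = text[open_i + 1:close_i]
    rest = text[close_i + 1:]

    parts = split_top_level(amble)
    if len(parts) > 1:
        # Step 3: "normal" brace expansion; alternatives may nest.
        middles = [m for part in parts for m in expand(part)]
    else:
        # Step 3 (no comma): try sequence expansion.
        seq = expand_sequence(amble)
        # Step 4: otherwise keep the braces and amble as literal text.
        middles = seq if seq is not None else ["{" + amble + "}"]

    tails = expand(rest)  # continue with the string after the braces
    return [preamble + m + tail for m in middles for tail in tails]


print(expand("{a,b}{},c}"))
```

On the minimal example "{a,b}{},c}" this sketch returns a{},c} and b{},c}, the result reported for Bash above, because {} fails both kinds of expansion and is kept as literal text. It makes no attempt to match Bash on other inputs.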

knrafto commented 4 years ago

That seems right. However, Bash expands a{},c} to a} ac (which we do as well), so there's more to it.

I think this is the rule we don't implement correctly: https://git.savannah.gnu.org/cgit/bash.git/tree/braces.c#n710 To get this to work in parsec, we may need to keep state so that we can look back at the last character.

knrafto commented 4 years ago

Actually, maybe not: in {a,b}{},c} there is no whitespace preceding the {}.