grncdr / js-shell-parse

parse bash, with javascript (UNMAINTAINED)
MIT License

Don't have a good solution for parsing nested backticks #2

Closed: grncdr closed this issue 10 years ago

grncdr commented 10 years ago

You can nest backtick command substitution by escaping the inner backticks:

echo `echo \`echo ok\``

The number of backslashes grows exponentially as you nest further. At each level you have to double every existing backslash and add one more to escape the backtick itself, because each enclosing substitution strips one layer of escaping before the inner command is run:

echo `echo \`echo \\\`echo \\\\\\\`echo ok\\\\\\\`\\\`\``
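A small POSIX sh sketch of the escaping rule just described. The helper `nest_backticks` is hypothetical (not part of js-shell-parse): it wraps `echo ok` in DEPTH levels of backtick substitution, and before each new wrap it escapes the current command once more by doubling every backslash and then putting a backslash in front of every backtick. Depth 2 and depth 4 should reproduce the two commands above.

```sh
#!/bin/sh
# nest_backticks DEPTH  (hypothetical helper, for illustration only)
# Prints `echo ok` wrapped in DEPTH levels of backtick command substitution.
nest_backticks() {
  cmd='echo ok'
  i=0
  while [ "$i" -lt "$1" ]; do
    # Escape the current command one more time: double every backslash
    # first, then put a backslash in front of every backtick.
    esc=$(printf '%s\n' "$cmd" | sed -e 's/\\/\\\\/g' -e 's/`/\\`/g')
    cmd="echo \`$esc\`"
    i=$((i + 1))
  done
  printf '%s\n' "$cmd"
}

nest_backticks 2            # echo `echo \`echo ok\``
nest_backticks 4            # the four-level command above
eval "$(nest_backticks 4)"  # the shell unwraps every level and prints: ok
```

Running the generated string through `eval` is a quick way to check that the escaping is right: the shell removes one layer of backslashes per level and ends up running `echo ok`.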

Currently I can't think of a way to express this in a PEG grammar without a separate set of duplicated rules for each level of nesting. Honestly, I'd like to punt on it and just say "sorry, we won't parse that", but I'm not sure how long that would remain an acceptable solution.
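One way to avoid per-level rules, sketched here in POSIX sh rather than as a PEG grammar and not taken from js-shell-parse: treat a backtick substitution as a raw span, strip one level of escaping from its body, and parse the body again with the same rules. POSIX specifies that inside backquotes a backslash keeps its literal meaning except before `$`, a backtick, or another backslash, so stripping one level is a single substitution. The helper names `backtick_body` and `unescape_level` are hypothetical, and the scanner ignores quoting and every other shell construct, so this is only an illustration of the idea.

```sh
#!/bin/sh
bt='`'

# backtick_body STRING  (hypothetical helper)
# Prints the raw text between the first backtick in STRING and the next
# backtick that is not escaped by a backslash; fails if there is none.
backtick_body() {
  case $1 in *"$bt"*) ;; *) return 1 ;; esac
  rest=${1#*"$bt"}              # drop everything up to the opening backtick
  body=''
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}       # first character of $rest
    rest=${rest#?}
    case $c in
      \\) body=$body$c${rest%"${rest#?}"}   # keep "\x" pairs verbatim
          rest=${rest#?} ;;
      "$bt") printf '%s\n' "$body"; return 0 ;;
      *) body=$body$c ;;
    esac
  done
  return 1                      # unterminated substitution
}

# unescape_level  (hypothetical helper)
# Removes one level of escaping: \\ -> \, \` -> `, \$ -> $.
unescape_level() {
  sed 's/\\\([\\`$]\)/\1/g'
}

# Peel the four-level example one level at a time using the same two steps.
s='echo `echo \`echo \\\`echo \\\\\\\`echo ok\\\\\\\`\\\`\``'
while body=$(backtick_body "$s"); do
  s=$(printf '%s\n' "$body" | unescape_level)
  printf '%s\n' "$s"
done
```

The loop prints each level in turn, ending with `echo ok`. In a PEG setting, one possible analogue would be a single rule that matches the raw backquoted span plus an action that unescapes the body and parses it again, instead of one rule per depth.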

grncdr commented 10 years ago

On the other hand, this is just one of several issues I've run into with PEG.js, so it might be time to abandon that plan entirely...

geoff-nixon commented 10 years ago

I don't know the particulars of PEG grammars, nor am I nearly good enough at math to have any right to know this, but the backslash counts follow Hibbard's Shellsort sequence, i.e. 2ⁿ - 1: 1, 3, 7, 15, 31, 63, 127, 255, 511, ...

Bizarrely, shellsort does not refer to a shell. Shell just happens to be the guy's name.
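A short derivation of why the counts come out as 2ⁿ - 1. Write b_k for the number of backslashes in front of a backtick that opens a substitution at depth k, with k = 0 for the outermost one (this notation is introduced here for the derivation). Pushing a command one level deeper escapes it once more, which doubles every existing backslash and adds one to protect the backtick itself:

```latex
\begin{aligned}
b_0 &= 0, \qquad b_{k+1} = 2\,b_k + 1 \\
b_{k+1} + 1 &= 2\,(b_k + 1) \;\Rightarrow\; b_k + 1 = 2^k\,(b_0 + 1) = 2^k \\
b_k &= 2^k - 1 \qquad (0,\ 1,\ 3,\ 7,\ 15,\ \ldots)
\end{aligned}
```

This matches the four-level example in the first comment, where the opening backticks carry 0, 1, 3 and 7 backslashes.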

geoff-nixon commented 10 years ago

It's also possible to just convert them to the other form, $( ). This is a little script I wrote a while back to do that, using only the shell itself, grep, sed, and tr, although it currently doesn't handle backticks inside single quotes properly.

#!/usr/bin/env sh
# dash|mksh|zsh
# tickoff - Geoff Nixon, 2013. Public domain.
# Replace backticks with POSIX $() style command substitutions.

tickoff(){ o="$(tr '\n' '␤')"                                 # Thank you, Unicode.
  while printf '%s\n' "$o" | grep -q '\\`'                       # Nested backticks.
  do o="$(printf '%s\n' "$o" | sed -e 's/\\`/`/' -e 's/`/$(/')"; done    # Unescaped.
  while printf '%s\n' "$o" | sed -e 's/`/$(/' | grep -q '`'      # Matched backticks.
  do o="$(printf '%s\n' "$o" | sed -e 's/`/$(/' -e 's/`/)/' | tickoff)"; done  # Yep.
  while printf '%s\n' "$o" | grep -q '\$()'   # Fix any extra $( we've now introduced.
  do o="$(printf '%s\n' "$o" | sed -e 's/\$()/))/g')"; done
  while printf '%s\n' "$o" | grep -q '\$("'     # While trying to respect "quotes".
  do o="$(printf '%s\n' "$o" | sed -e 's/\$("/)"/g')"; done
  o="$(printf '%s\n' "$o" | tr '␤' '\n')"  # You couldn't do this with sed without this.
  if printf '%s\n' "$o" | grep -q '`'; then     # Any mismatched backticks left over?
    printf ' Mismatch:\n   Line %s\n' "$(printf '%s\n' "$o" | grep -n --color=always '`')" >&2
    exit 1
  fi
  printf '%s\n' "$o" ; }                                        # No? Good. Print.

help(){ printf 'Usage: tickoff [script] | tickoff < script\n
        Processes a script, replacing `backticks` with $(substitutions).\n' ; }

case "$1" in
  help|-h*|--h*) help &&                                              exit 0 ;;
              *) if [ "$#" -eq 0 ]; then tickoff; else tickoff < "$1"; fi ;;
esac
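The reason converting helps: with $( ), everything up to the matching closing parenthesis is parsed as a fresh command, so nesting needs no extra escaping at all. A small illustration, with made-up function names, of the four-level command from the top of this issue written both ways:

```sh
#!/bin/sh
# The same four levels of command substitution in both notations.
backtick_form() { echo `echo \`echo \\\`echo \\\\\\\`echo ok\\\\\\\`\\\`\``; }
dollar_form()   { echo $(echo $(echo $(echo $(echo ok)))); }

backtick_form   # prints: ok
dollar_form     # prints: ok
```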
grncdr commented 10 years ago

This has been working for a while.