mrbaseman / parse_yaml

a simple yaml parser implemented in bash
GNU General Public License v3.0
165 stars 37 forks source link

possible bug in anchor parsing? #6

Open danix800 opened 4 years ago

danix800 commented 4 years ago

Test example:

---
MainSourceFile:  '/test/4.9.c'
Diagnostics:
  - DiagnosticName:  bf-4.9
    DiagnosticMessage:
      Message:         '''\0'' is assigned to a pointer. Probably meant: *ptr = ''\0''.'
      FilePath:        '/test/4.9.c'
      FileOffset:      429
      Replacements:
        - FilePath:        '/test/4.9.c'
          Offset:          421
          Length:          7
          ReplacementText: '*(tmp.ptr)'
  - DiagnosticName:  bf-4.9
    DiagnosticMessage:
      Message:         '''\0'' is assigned to a pointer. Probably meant: *ptr = ''\0''.'
      FilePath:        '/test/4.9.c'
      FileOffset:      499
      Replacements:
        - FilePath:        '/test/4.9.c'
          Offset:          488
          Length:          10
          ReplacementText: '*(param->ptr)'
...

The last sed would strip off quotes before passing into awk for anchor parsing, so ReplacementText's value becomes *(param->ptr) (no quote). I think the unquoting should be done in the anchor parsing stage, right?

mrbaseman commented 4 years ago

yes. this makes the processing in the anchor parsing stage even more complicated. Also I have seen that the multiple single quotes aren't processed properly 'something ''inside'' a string' becomes something ''inside'' a string instead of something 'inside' a string

mrbaseman commented 4 years ago

I admit that the unquoting still needs improvement. With 780ab94 I have committed a partial fix: At least it doesn't run into an endless loop anymore if the string starts with a '*' and the following characters are not an anchor. If the string is enclosed in quotes and starts with a '*' and an anchor with the same name as the rest of the string is found, it is still treated as a reference to that anchor. I might improve this later on.