mvnmgrx / kiutils

Simple and SCM-friendly KiCad file parser based on Python dataclasses for KiCad 6.0 and up.
GNU General Public License v3.0
78 stars 25 forks source link

Fix issues identified when parsing several hundred kicad 6 and 7 schematics #73

Closed sAlexander closed 1 year ago

sAlexander commented 1 year ago

In parsing several hundred schematics, I identified the following issues. I tried to keep my changes as clean and simple as possible, and I welcome any suggestions for improvement. For all of the changes, I included an example file that previously failed to parse and now successfully parses.

Also, thank you for sharing the kiutils library! It is a very well written and impactful library. I'm planning on using it more for my projects.

sAlexander commented 1 year ago

Full disclosure: I'm least happy with the Fix issue with escaped backslash before string termination change. While I fixed the case where "\\" parsed incorrectly, I suspect some other things will still parse incorrectly (e.g. "\\\"). I encourage you to give that change extra scrutiny! :)

mvnmgrx commented 1 year ago

Hello @sAlexander

Thanks for taking the time to test kiutils so thoroughly. I've marked some minor changes in my review. Happy to merge your PR afterwards!

mvnmgrx commented 1 year ago

After some digging, i found out that the updated regex does not work correctly in every case (it even does not work correctly for the error case "|\\"). Here's what i came up with:

I've tried expanding the regex for quoted strings to the following:

(?P<sq>"(?:[^"]|(?<=\\)")*"(?:(?=\))|(?=\s)))

The term_regex would then look like this:

term_regex = r'''(?mx)
    \s*(?:
        (?P<brackl>\()|
        (?P<brackr>\))|
        (?P<num>[+-]?\d+\.\d+(?=[\ \)])|\-?\d+(?=[\ \)]))|
        (?P<sq>"(?:[^"]|(?<=\\)")*"(?:(?=\))|(?=\s)))|
        (?P<s>[^(^)\s]+)
       )'''

It now checks for the closing quote correctly, regardless if the line ends or a closing tag is present. This works without disturbing the other parts of the S-Expression regex as it uses lookahead statements which will not be captured.

I've updated my review comments with this new version of the regex.

I've also added a unittest for this (which currently fails as expected on the master branch), which you may want to merge into your branch for testing, here: https://github.com/mvnmgrx/kiutils/blob/master/tests/test_misc.py#L25-L35 https://github.com/mvnmgrx/kiutils/blob/master/tests/testdata/misc/test_quotesAndBackslashInSexpr

Happy to hear your thoughts about this.

mvnmgrx commented 1 year ago

I've also added a simple unittest for the symbol ID edge cases: https://github.com/mvnmgrx/kiutils/blob/master/tests/test_symbol.py#L124-L134 https://github.com/mvnmgrx/kiutils/blob/master/tests/testdata/symbol/test_symbolIdParser

mvnmgrx commented 1 year ago

I'm merging your PR and doing further work on your changes. Thanks for your contribution!

sAlexander commented 1 year ago

Sorry for going radio silent -- I don't check my github messages nearly often enough! And thanks for merging in the changes (with all of your fixes and tests).

I'm looking forward to using this project more!