joyfullservice / msaccess-vcs-addin

Synchronize your Access Forms, Macros, Modules, Queries, Reports, and more with a version control system.
Other
211 stars 41 forks source link

SqlFormatter: Endless loop if the SQL text contains "\" - "Select 'xxx\' & FileName ...." #442

Closed josef-poetzl closed 11 months ago

josef-poetzl commented 1 year ago

Test code:

Const SqlString = "Select 'xxx\' & FileName As FullFileName From FileTable"

With New clsSqlFormatter
   Debug.Print .FormatSQL(SqlString)
End With

The loop is produced in GetQuotedString, because Regexp does not return a match.

Pattern:

            ' (4) single quoted string using '' or \' to escape
            .Add "((\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*(\'|$))+)"      ' sx',

Why is escape with \ required?

joyfullservice commented 1 year ago

Thanks for reporting this! This code was ported from a PHP library that is probably most often dealing with the MySQL dialect, which uses the backslash to escape a literal character.

There are two observations that I am making here... First, I should add a check to ensure that the tokenizing is advancing at each loop iteration. This would prevent the endless loop in this or any other similar situations.

Secondly, I think I probably should provide a dialect hint as an optional parameter to the public formatter function. Most of the time when we are parsing source code we should be aware of the dialect in use. (For example, native Access queries are going to use Access query syntax, while the schema classes should prefer their own respective dialects.)

I don't want to overcomplicate the SQL formatter class, but I think this is a legitimate case where a dialect hint could be integrated into a case statement so that the backslash escape is only applied to MySQL.

joyfullservice commented 1 year ago

I have pushed a couple changes that seems to have resolved this issue. If you build the dev branch from source, you can try this in your environment. I am not a RegEx expert, but hopefully I made these tweaks correctly... 😄 (If not, feel free to point out any adjustments that should be made.)

josef-poetzl commented 1 year ago

Thanks, tested with "Select 'xxx\' & FileName As FullFileName, 'a''bc' as T From FileTable" and works well.

josef-poetzl commented 1 year ago

A suggestion for a small refactoring that simplifies a later extension with SQL dialects. (GetQuotedStringPatterns could be outsourced to a SqlDialect class if needed):

Private Function GetQuotedString(Optional lngStartOffset As Long = 0) As String
[...]
   With New clsConcat
        ' Build out RegEx expression
        .Add "^("
            .Add Join(GetQuotedStringPatterns(), "|")   '<--- (SRP)
            ' Suggestion: extension for clsConcat:
            '.AddArray GetQuotedStringPatterns(), "|"
        .Add ")"
        strExp = .GetStr
    End With

Details: https://github.com/josef-poetzl/msaccess-vcs-addin/commit/062b8f7ce514cc71e6054dd27cf68d7e132cbe25 Note: I set the pattern for ` only for MySql in this modification. I think this is only MySQL specific.

joyfullservice commented 1 year ago

Thanks for the suggestions! It looks like you are correct that the backtick is only used with MySQL. It also appears that the square bracket is not used in MySQL, so I have reorganized the function a bit. I reviewed the GetQuotedStringPatterns approach, and while I like the concept (and using Join to link them together) it does add a bit of complexity that might not be needed at this point. (More lines of code, additional function to understand) I decided to just keep it simple for now and leave the Select Case in the original function. (Just personal preference, nothing wrong with your example.) 😄

Thanks again for the input! 👍

hecon5 commented 1 year ago

related to #184

hecon5 commented 11 months ago

@joyfullservice @josef-poetzl is this still a bug, or has it been resolved?

josef-poetzl commented 11 months ago

For me it's resolved - see comment from September 27.