pydiverse / pydiverse.pipedag

A data pipeline orchestration library for rapid, iterative development with automatic cache invalidation, allowing users to focus on writing their tasks in pandas, polars, sqlalchemy, ibis, and the like.
https://pydiversepipedag.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Improve insert_into_in_query to treat comments in TSQL select statements #178

Open windiana42 opened 6 months ago

windiana42 commented 6 months ago

I don't think we need a full-blown TSQL parser. Having one might even be a downside, because it may need to change over time. However, it would be nice to have rudimentary support for comments in the SELECT statement strings passed to insert_into_in_query.

The quick route would be to use regular expressions to replace /*...*/ comments (spanning multiple lines, with minimal, non-greedy matching) and -- comments (--.* until end of line) with an empty string. However, pyparsing might help make this more robust against mixed comments such as -- /* or /* --.
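A minimal sketch of the quick regex route, using only Python's `re` module. The helper names `strip_comments_naive` and `strip_comments` are hypothetical and not part of pipedag. The two-pass version reproduces the mixed-comment problem: with `-- /*`, the block pattern starts inside the line comment and swallows the following clause. The single-pass version puts everything into one alternation, so whichever construct starts first wins, and it also matches string literals and quoted or bracketed identifiers so that comment markers inside them stay untouched. It does not handle nested `/* */` comments, which T-SQL allows.

```python
import re

# Naive two-pass approach: non-greedy /*...*/ across lines, then -- to end of line.
_BLOCK = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE = re.compile(r"--[^\n]*")


def strip_comments_naive(sql: str) -> str:
    """Hypothetical helper: block comments first, then line comments."""
    return _LINE.sub("", _BLOCK.sub("", sql))


# Single-pass approach: one alternation scanned left to right, so whichever
# construct starts first wins. Literals and identifiers are matched only so
# that comment markers inside them are left alone.
_TOKEN = re.compile(
    r"""
    (?P<keep>
        '(?:[^']|'')*'        # 'string literal', '' is an escaped quote
      | "(?:[^"]|"")*"        # "quoted identifier"
      | \[(?:[^\]]|\]\])*\]   # [bracketed identifier], ]] is an escaped ]
    )
    | (?P<block>/\*.*?\*/)    # /* block comment */ (nesting NOT handled)
    | (?P<line>--[^\n]*)      # -- comment until end of line
    """,
    re.DOTALL | re.VERBOSE,
)


def strip_comments(sql: str) -> str:
    """Hypothetical helper: remove comments, keep literals untouched."""

    def repl(m: re.Match) -> str:
        if m.group("keep") is not None:
            return m.group("keep")
        # Replace a comment with a space so "SELECT/*x*/1" stays two tokens.
        return " "

    return _TOKEN.sub(repl, sql)


sql = "SELECT a -- see /* below\nFROM t /* note */ WHERE b = 1"
print(repr(strip_comments_naive(sql)))  # 'SELECT a ' (FROM clause lost)
print(" ".join(strip_comments(sql).split()))  # SELECT a FROM t WHERE b = 1
```

Replacing comments with a single space rather than an empty string is a deliberate choice: it keeps adjacent tokens separated without otherwise changing the statement.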

Existing pyparsing-based SQL parsers are linked below. As stated before, I would not suggest extending them until they fully understand TSQL. I would rather simplify them so that they only detect the start of specific keywords and skip bracketed blocks (subqueries and quoting), while always knowing whether the current position is inside a comment.

https://stackoverflow.com/questions/16909380/sql-parsing-using-pyparsing
https://github.com/pyparsing/pyparsing/blob/master/examples/select_parser.py
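The simplified approach described above (detect the start of a keyword, skip bracketed blocks and quoting, always track comment state) can also be sketched without pyparsing, as a small hand-written scanner. This is a swapped-in technique, not pipedag code: `find_top_level_keyword` is a hypothetical name, and the assumption that insert_into_in_query needs the position of the top-level FROM (where T-SQL's `SELECT ... INTO` clause goes) is inferred only from the function's name. Unlike the regex sketch, the scanner counts nested block comments and parenthesis depth, so a FROM inside a subquery, string, identifier, or comment is ignored.

```python
def find_top_level_keyword(sql: str, keyword: str) -> int:
    """Hypothetical helper: index of the first top-level `keyword`, or -1.

    Skips -- comments, nested /* */ comments, 'strings', "quoted" and
    [bracketed] identifiers, and everything inside parentheses.
    """
    kw = keyword.upper()
    n = len(sql)
    i = 0
    depth = 0  # current parenthesis nesting level
    while i < n:
        ch = sql[i]
        if sql.startswith("--", i):
            # Line comment: jump past the next newline (or to the end).
            nl = sql.find("\n", i)
            i = n if nl == -1 else nl + 1
        elif sql.startswith("/*", i):
            # Block comment: count "/*" and "*/" so nested comments close correctly.
            level = 0
            while i < n:
                if sql.startswith("/*", i):
                    level += 1
                    i += 2
                elif sql.startswith("*/", i):
                    level -= 1
                    i += 2
                    if level == 0:
                        break
                else:
                    i += 1
        elif ch in "'\"[":
            # Quoted text: a doubled closing character is an escape, not the end.
            close = "]" if ch == "[" else ch
            i += 1
            while i < n:
                if sql.startswith(close * 2, i):
                    i += 2
                elif sql[i] == close:
                    i += 1
                    break
                else:
                    i += 1
        elif ch == "(":
            depth += 1
            i += 1
        elif ch == ")":
            depth -= 1
            i += 1
        elif ch.isalnum() or ch in "_@#":
            # Read a whole word so that e.g. FROMAGE or @from never matches FROM.
            j = i
            while j < n and (sql[j].isalnum() or sql[j] in "_@#$"):
                j += 1
            if depth == 0 and sql[i:j].upper() == kw:
                return i
            i = j
        else:
            i += 1
    return -1


query = """SELECT a, (SELECT MAX(x) FROM u) AS m -- FROM fake
/* outer /* nested FROM */ still comment FROM */
FROM [from] WHERE s = 'FROM'"""
pos = find_top_level_keyword(query, "FROM")
# Hypothetical splice of a T-SQL INTO clause before the top-level FROM.
print(query[:pos] + "INTO new_table " + query[pos:])
```

Counting `/*` and `*/` is what distinguishes this from the regex sketch: a non-greedy `/\*.*?\*/` would end the nested comment at the first `*/` and expose "still comment FROM" as if it were code.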

windiana42 commented 6 months ago

A good start might also be to extend test_sql_ddl.py to cover multiline statements with various comments.