kayak / pypika

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
http://pypika.readthedocs.io/en/latest/
Apache License 2.0
2.43k stars 293 forks source link

feat: Support limiting row count and offsets in Oracle queries #754

Closed danielenricocahall closed 2 months ago

danielenricocahall commented 9 months ago

Add support for Oracle's syntax for limiting the row count returned in a query by overriding the _limit_sql method in OracleQueryBuilder, as well the offset syntax (_offset_sql).

As these are identical to the MSSQL syntax, I created a new subclass of QueryBuilder called FetchNextAndOffsetRowsQueryBuilder (admittedly not the most clever name) which overrides the two methods, and now have OracleQueryBuilder and MSSQLQueryBuilder subclass from them.

I also overrode the _apply_pagination method in the OracleQueryBuilder to ensure that offset and fetch next are arranged in the correct order for Oracle queries. That override is similar to MSSQLQueryBuilder's implementation, but as Oracle doesn't have the same constraint of a FETCH NEXT requiring an OFFSET _ ROWS, it's not identical, I opted to keep them separated.

coveralls commented 9 months ago

Coverage Status

coverage: 98.396% (+0.006%) from 98.39% when pulling 27d3de97db23d92591e7db5015a5555d53fbc398 on danielenricocahall:feature/support-fetch-first-oracle into 8841520e906970d76c5ed81c7dd5d154f0d5259d on kayak:master.

danielenricocahall commented 9 months ago

Looks good. Thanks for working on this

Absolutely! My pleasure. One observation is that MSSQLQueryBuilder has a similar override:

    @builder
    def fetch_next(self, limit: int) -> "MSSQLQueryBuilder":
        # Overridden to provide a more domain-specific API for T-SQL users
        self._limit = limit

...

    def _limit_sql(self) -> str:
        return " FETCH NEXT {limit} ROWS ONLY".format(limit=self._limit)

Would we want to use that with OracleQueryBuilder for consistency? As in, should I add a fetch_next builder method instead of overriding the limit method?

AzisK commented 9 months ago

To be honest, I am not very well familiar with the code and have myself not used this library so far

Having said that, I see a difference of "FETCH FIRST" and "FETCH NEXT" between these. How will that be handled?

I believe it is not very clear theoretically if OracleQueryBuilder inherits MSSQLQueryBuilder. If they use the same syntax, we could have an abstract class for this but I am not sure how to call it. Probably best to call it by some standard. Or it could be a mixin, for example, FetchLimitMixin

danielenricocahall commented 9 months ago

To be honest, I am not very well familiar with the code and have myself not used this library so far

Having said that, I see a difference of "FETCH FIRST" and "FETCH NEXT" between these. How will that be handled?

I believe it is not very clear theoretically if OracleQueryBuilder inherits MSSQLQueryBuilder. If the use the same syntax, we could have an abstract class for this but I am not sure how to call it. Probably best to call it by some standard. Or it could be a mixin, for example, FetchLimitMixin

Good question - FETCH FIRST and FETCH NEXT are functionally equivalent, at least in Oracle it is ("For the semantic clarity purpose, you can use the keyword ROW instead of ROWS, FIRST instead of NEXT."), so that would be a simple change on my end to the query string.

And yes, a mixin is also exactly what I was thinking (composability > inheritance for this IMO unless MSSQL and Oracle have more in common than we know). I considered adding it, but I wasn't sure if that was overkill or violated any design guidelines within the library. I am open to adding it though - something like:

class FetchNextMixin:
    @builder
    def fetch_next(self, limit: int) -> "MSSQLQueryBuilder":
        self._limit = limit

    def _limit_sql(self) -> str:
        return " FETCH NEXT {limit} ROWS ONLY".format(limit=self._limit)

I think my follow-up question would be if we went that route, how to approach the limit method in these two classes, since they are defined in the base Query class. We could raise a NotImplemented exception but I feel like that's not super developer friendly and violates Liskov Substitution Principle. We could alternatively have the limit method call fetch_next, but at that point, I'm not sure if the change is worth the additional complexity.

danielenricocahall commented 9 months ago

Hey @AzisK , just wanted to follow-up on this. I think we have a few options for a path forward:

I think I'm in favor of the third design.

AzisK commented 8 months ago

I am in favor of 3rd design as well. My arguments that this is beneficial are following. First of all, in this way users can easily switch between data sources and the API will work correctly and accordingly to the source of data or dialect in this sense. Secondly, this library simplifies building SQL queries with the API, so let's keep it simple. Supporting another API keyword for the same functionality but for a different platform makes it more complicated. Thirdly, word "limit" expresses clearly what is being done.

danielenricocahall commented 8 months ago

Hello @AzisK ! I hope all is going well. I made the changes and wound up making a few more around supporting the offset syntax, as detailed in the updated PR description. Please take a look when you get the chance. :smile:

danielenricocahall commented 8 months ago

Hi @AzisK ! Just wanted to follow-up on this.

AzisK commented 8 months ago

Hi @danielenricocahall I will look into it shortly

danielenricocahall commented 7 months ago

Hi @AzisK ! I hope all is going well! Any chance we could merge this soon?

danielenricocahall commented 6 months ago

Hi @AzisK ! I hope all is going well! Any chance we could merge this soon?

AzisK commented 6 months ago

Hi @danielenricocahall. Happy New Year! 😅

I will ask one more person to take a look into it and he is much more active than I am and then things will hopefully go smooth. Sorry for taking so long

@wd60622 could you take a look?

danielenricocahall commented 6 months ago

@wd60622 addressed your comments!

danielenricocahall commented 5 months ago

@wd60622 @AzisK any chance I can get a re-review with the formatting + deprecation warning added? Plus having CI re-run 😅

wd60622 commented 5 months ago

Seems like the tests that ran were good. Is there an issue with the coveralls version @AzisK ?

AzisK commented 5 months ago

I could see this error before as a hiccup of Github Actions. For some reason it does not succeed after a re-run

AzisK commented 5 months ago

Yeah, I believe we can bump the coveralls library and hopefully that will fix it

danielenricocahall commented 5 months ago

@wd60622 @AzisK wanted to follow-up on this one to see if we can re-run now

danielenricocahall commented 4 months ago

@AzisK @wd60622 just following up - was the issue in CI related to coveralls resolved?

danielenricocahall commented 3 months ago

@AzisK @wd60622 just following-up on this inquiry

wd60622 commented 3 months ago

Out of my wheelhouse, sorry. @AzisK will have to take care of it

AzisK commented 3 months ago

For some reason, I don't see an option to rerun...

danielenricocahall commented 3 months ago

Hi @AzisK I just merged master into my branch and it's pending re-running CI. Could you approve?

danielenricocahall commented 2 months ago

@AzisK @wd60622 woohoo it's passing! Can it be merged?

wd60622 commented 2 months ago

@AzisK @wd60622 woohoo it's passing! Can it be merged?

Looks good. Need @AzisK to do the merge 😅