lpsmith / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell
Other
206 stars 71 forks source link

Split Cursor code into separate module #213

Closed BardurArantsson closed 7 years ago

BardurArantsson commented 7 years ago

Ok, here's the cleaned up "add cursor support" patch set.

Any comments/suggestions welcome.

EDIT: The ordering on these commits looks pretty weird, but I think it's just because the timestamps got messed up by rebasing. I shouldn't matter too much -- they are entirely self-contained and should work individually.

BardurArantsson commented 7 years ago

Build failure seems to be caused by a new release of aeson (which I noticed has just shipped) -- nothing to do with me! :)

Maybe there should be some upper bounds -- or maybe support for GHC 7.6 should just be dropped. Given the popularity of Aeson, I doubt people will be happy being restricted to an old Aeson because they're using postgresql-simple...

Anyway: Just wanted to mention that it would probably best to rebase-merge this PR if you decide to go ahead and merge without changes. (The branch name is pretty ugly...)

BardurArantsson commented 7 years ago

Added a couple of fixes to partial pattern matches.

lpsmith commented 7 years ago

At first glance, this looks like a good start, though there are a few things I'd like to tweak. Incidentally, you are already listed in CONTRIBUTORS :) (You could probably find out why by looking in CHANGES.)

Personally, I think that newTempName should be made part of the public API, and it should return an Identifier. We can dispose of some of the manual query generation, as this code was originally written when postgresql-simple wasn't as feature-complete as it is today.

Also, I think the API should be tweaked as follows:

declareCursor :: Connection -> Query -> IO Cursor

declareCursorWithName :: Connection -> Identifier -> Query -> IO Cursor
BardurArantsson commented 7 years ago

Heh, not sure why I didn't notice myself in CONTRIBUTORS :). I'll drop that commit.

I agree about the Identifier vs. Query thing for newTempName -- I considered it, but went with a "minimal" implementation.

I'll do the API tweak too, shouldn't be much of an issue.

BardurArantsson commented 7 years ago

Hum, looking into it a bit more: Changing newTempName to return an Identifier actually looks quite messy. It's also used to generate Savepoint names, and changing that type from Savepoint Query to Savepoint Identifier would break API. (Savepoint constructor is exported from P.S.Types.)

Therefore, I suggest adding a new newTempIdent(ifier) function and not changing the existing code, i.e. only adopting newTempIdentifier in the Cursor module for now.

(It also seems a bit of a detour to start changing Savepoint, etc. for this PR. I mean, I'll happily do it, but perhaps we could do that as follow-up post-merge?)

I also note that there's a potential issue of encoding: Query is a ByteString whereas Identifier is backed by Text -- in this case we know that the identifier is pure ASCII, so we can probably just assume a UTF-8 encoding.

BardurArantsson commented 7 years ago

Actually, thinking a little more about it: How about just leaving "declareCursorWithName" as an internal function for now? That way the ByteString-ness doesn't leak into the API in any way since Cursor is already an opaque type.

That'll leave it completely open for a future change.

I've just now looked into changing Cursor to use Identifier internally, but AFAICT that would necessitate using buildQuery or formatQuery to do the substitution if we want to avoid accidentally using unescaped cursor names. However, trying to use {build,format}Query leads to circular module imports. Dropping the "*withName" variant also neatly avoids since we know that the cursor name is safe to use directly in the SQL.

What are your thoughts?

lpsmith commented 7 years ago

Your two comments make sense to me, and I'm on board with that plan. (And yeah, now you start to see reasons why the codebase has evolved into a mess of circular dependencies...)

lpsmith commented 7 years ago

Also, there probably are legit reasons for providing declareCursorWithName, but at the moment I would have no idea what they would actually be.

BardurArantsson commented 7 years ago

Cool, I'll do what I've described.

Actually, it struck me that it might actually be neater to also just use "Cursor Query Connection" (just like Savepoint does) instead of "Cursor ByteString Connection" -- that should at least keep the internals of the Cursor module a little less cluttered with spurious "Query" constructors/pattern matches.

(And by keeping that fact hidden as suggested by 'hiding' declareWithName', it's still open to fixing later... along with Savepoint which should be fixed exactly the same way.)

BardurArantsson commented 7 years ago

Ok, I think this should be the final form of this PR.

lpsmith commented 7 years ago

did you forget to push some commits? I'm not seeing any new commits at the moment.

BardurArantsson commented 7 years ago

I did it as a fixup to the existing commit.

BardurArantsson commented 7 years ago

Anything else need to happen before this can be merged?

BardurArantsson commented 7 years ago

Ping?

(Sorry if I'm being a nuisance. I'm kind of waiting a bit on this to be upstreamed/released for some of my own stuff.)

lpsmith commented 7 years ago

Sorry, I've been dealing with some complicated and almost unbelievable issues, so life's been extraordinarily maddening lately.

I'll try to take a closer look here soon-ish, though I feel fairly comfortable that this is more or less correct, and that the API is appropriate. You seem to do good work. So please do ping me if you want in the next day or two.

BardurArantsson commented 7 years ago

NP, it can wait a few days :).

(I think I can "vendor" my way through for now while I'm doing pre-release refactoring.)

lpsmith commented 7 years ago

A few comments:

  1. Is there really a reason to make the Internal.PQResultUtils a public module? I can't say I'm overly enthusiastic about that.

  2. The addition of the PQ.CopyBoth and PQ.SingleTuple error codes either needs to bump the minimum version of postgresql-libpq, or add some CPP magic.

  3. Finally, I'm not sure whether or not fetchForward should default to a fold-based interface. I would imagine a lot of people would find having fetchForward return a list or a vector to be more convenient, most of the time. Also, returning a list via the current interface of fetchForward is less efficient than I'd wish, effectively requiring a double reversal when we could just iterate over the results in reverse order.

lpsmith commented 7 years ago

Not sure how to deal with point (3) without overcomplicating the interface, as we probably do need a fold based interface that doesn't allocate an intermediate list or vector.

lpsmith commented 7 years ago

Here's an idea: rename fetchForward* to foldForward*, publish this, and then maybe somebody implements fetchForward as a list or vector result later. And we forget about (3). 1 and 2 should be pretty easy though.

BardurArantsson commented 7 years ago

Ok, so starting from the top:

  1. It's an "internal" module. That's why it's called "*.Internal.*" :) I think this convention is well-enough established at this point that it shouldn't be a surprise to any people who are maintaining upstream packages.

  2. Ok, so either we bump the version, or -- at your leisure -- you can just drop those commits, If you want me to remove them from this PR, I can do that too. Just say the word.

  3. The thing is: I'm not looking to redesign the library quite yet(!) :) . I agree that this is quite a strange/inconvenient interface, but as of this point I'm just moving code about so that it can be called from other places. Ideally, I think the "foldX" family of functions would a) be strict, and b) let the fold terminate early,[1] but I don't think this is the place to make those changes. (We might want to look into making the signatures comapatible with the "foldl" library, at least.)

I'd actually be happy to elaborate even further on point 3 above, but I think it would take us into "new design" space... but, honestly, right now I just want this very small bit of functionality landed. :)

(... and don't get me started on the whole escaping mechanism, etc. :) )

[1] Sort of like 'enumeratees', but maybe we could make do with a little 'foldl'?

BardurArantsson commented 7 years ago

Btw, I'll happily rename fetchForward to foldForward if you want me to. I literally just reused the existing (local) names, I think.

Just let me know what changes you want :).

lpsmith commented 7 years ago

I'd be happy to fix up (1) and (2) myself, and renaming functions is no big deal so I guess I can take care of (3) as well. fetchForward did originally return a list, but then somebody noticed it wasn't really necessary, and got rid of that a while back.

Regarding (1), right, it's clearly an internal module, but it's still publicly exported even if it's not considered part of the public API. I'm just curious if you are actually thinking of using some of that yourself, or that there's another justification for exporting that module.

BardurArantsson commented 7 years ago

I'm not sure I understand, so I'll just ask directly :) : Would you like the module to be non-Internal?

I'm going to be using it in a 3rd party package, so if we can make it an officially supported module then I'm all for that :)

(I'll follow up with some revisions in a day or two.)

lpsmith commented 7 years ago

No, I'm saying why is it exposed, even if it is marked "internal"? Why isn't it in "other-modules?", so you can't import it from outside the package at all?

If you really want that stuff available, I'd probably be somewhat more inclined to throw it into Internal. (ugh, but...)

BardurArantsson commented 7 years ago

Ah. It's just a convention that's arisen in the community about clearly sign-posting that "this API may change at any time -- don't rely on it"... but if you really need to -- then you can. I can't recall the originator of the pattern, but I do know that Edward Kmett is a proponent. (The idea is: Ok, so this may change in the next minor release, but at least I can use it to get my job done today. With other-modules you don't have that recourse.)

The reason I put it in a separate file is simply that "Internal.hs" has grown into something of a beast. (Plus, separate files will keep everybody honest about circular deps :) )

EDIT: See also e.g. https://hackage.haskell.org/package/pipes-4.3.2 and https://hackage.haskell.org/package/conduit for the "Internal" convention. I just picked stream libraries because streaming I/O has been on my mind lately :).

BardurArantsson commented 7 years ago

I've updated the name of the function as requested.

I can see that we might have been talking about past each other: I'd actually like the Cursor module specifically to be exposed (with a stable API). I'd like to use it in one of my own packages, and I note that it could very likely also be used in postgresql-simple-streams (thus avoiding relying on internal modules).

lpsmith commented 7 years ago

We are definitely talking past each other to some extent. The cursor module definitely should be exposed publicly; actually postgresql-simple-streams is close to being able to use the public interface (the obvious exception being the use of temporary names, which should be exposed publicly somehow... but I'm not in a hurry on that count.)

My issue is with Internal.PQResultUtils. My first preference would be to not export it at all, my second preference would be to export those functions from the existing Internal module, and my third preference would be to keep things as they are in this PR. However I can't say my preference is all that strong, so my third preference would ultimately be acceptable.

I am aware of various conventions surrounding Internal modules. Ultimately, I'm ok with exporting that as an internal module, if I am inferring correctly that you want that module available for some of your code. I can't say I'm that enthusiastic about introducing yet another module, yet I can accept it.

BardurArantsson commented 7 years ago

Oh, right -- PQResultUtils. I have no particular reason to want it exposed -- I just did that because it seems to be the convention these days ("consenting adults" and all that). I'm actually fine with moving it to "Other-modules". I think one of us must have mistakenly mentioned PQResultUtils.hs when meaning Cursor... or vice versa :). All I need is access to the Cursor functions.

The problem with putting the finishQueryWith + getRowWith functions in Internal.hs -- I think this is what I tried first, actually -- is that it'll introduce yet further cyclic dependencies since the code originally came from Simple.hs (and is used from there). That's what I wanted to avoid by using a separate module. (I think this should actually probably be done with most of the stuff in Internal.hs, but that's a whole other kettle of PR.)

So maybe the resolution here is just to move `PGit to the "other-modules" list?

BardurArantsson commented 7 years ago

Added a commit to move PQResultUtils to other-modules. Feel free to incorporate that, or not.

lpsmith commented 7 years ago

Thanks!