lpsmith / postgresql-libpq

Low-level Haskell bindings for libpq
BSD 3-Clause "New" or "Revised" License
19 stars 18 forks source link

Add notice receiver/processor bindings #25

Closed timmytofu closed 8 years ago

timmytofu commented 10 years ago

By default postgres writes NOTICEs to stderr, this exposes the functions to set a receiver and processor function (per the docs, most people will only overwrite one and do everything in that).

If you merge this, I think postgresql-simple should call disableNoticeReporting by default when connecting - most libraries seem to do this. Right now my test logs are getting inundated with NOTICEs from TRUNCATE… CASCADE statements running in between tests. I can make a pull request for that as well.

I'm not sure if I should be calling freeHaskellFunPtr after importing the old processor/receiver functions, or if returning those is valuable at all. Advice?

lpsmith commented 10 years ago

Thank you, this has been very low in my todo list for some time now. But I've definitely wanted to fix this, and this is certainly in the ballpark of what I had in mind.

I wonder if we shouldn't have a cbits function to use as a no-op, instead of calling back into the Haskell runtime for a no-op. It might be better, but on the other hand, it might not be important.

All of my FFI work with GHC has been calling C from Haskell, not dealing with calling Haskell from C. So I can't really advise at this point in time, but I should probably take a little time to learn about that so I can review this code.

Alternatively, if @joeyadams has done this, and wants to review this patch, that would be fine by me.

lpsmith commented 10 years ago

Ok, one problem with this implementation is that any C function that can potentially result in a notice will have to be marked as a safe FFI call, which would necessitate a fairly careful review of the source code of libpq.

So take getvalue for example, accessing a field that's out of range will, IIUC, result in a notice, though not from the backend. It's marked unsafe at the moment, and since the difference in overhead is fairly substantial for such a small yet fundamental function, I'd hate to be forced to mark it as safe, especially because calling it on out-of-range indexes should never happen when using higher-level bindings. But on the other hand, I think it's unacceptable that calling it on out-of-range indexes could cause a crash of the whole process.

Thus, this approach would force marking some of the existing functions as safe calls when it would be highly undesirable to do so.

lpsmith commented 10 years ago

Come to think of it, the implementation that I believe I had in mind some time ago was actually quite different, in order to avoid calling back into Haskell and avoid issues like the one I mentioned above.

So we do need a no-op implemented in C, and if we want the ability to receive notices in Haskell code, we would have another C function that would store the result pointer in a buffer. We can then implement a Haskell function that will fetch the next notice out of the buffer, if there is one there.

Of course, there's several semantic issues to consider: one might imagine a fetchNotice :: Connection -> IO Result function that blocks and can operate on a connection concurrently, not unlike the postgresql-simple's getNotification, but it seems impossible to efficiently implement such a function at the level of postgresql-libpq using the buffer strategy, unless we allow callbacks into Haskell. (We can't really use the getNotification strategy of polling after data is received on the connection, because libpq itself can trigger the callback. Alternatively, we could set up a self-pipe, and write to the pipe from C and read from the pipe in Haskell. I don't know how well this would work on Windows though, and it seems silly to use another 2 file descriptors per connection.)

So with the buffer strategy, it seems the most reasonable semantics for the lowest-level Haskell function that fetches notices is a stateful interaction, returning an IO (Maybe Result).

We could potentially even not export this function from postgresql-libpq, if all the bindings would check the buffer and say, run a Haskell call-back on any notices. Or we could expose this function and punt this issue up a level to postgresql-simple and other high-level libraries. The first approach would have the advantage that one could use this new functionality with any unmodified higher-level library that allows raw libpq calls. The second approach could avoid the need to turn the connection from a newtyped ForeignPtr into a ForeignPtr, Buffer pair.

The next question would be whether or not we use a fixed-size or growable buffer, and if we use a fixed-size buffer, how big should it be and what happens if the buffer is full? Should we throw out the newest notice, the oldest notice, or something else entirely? I guess I'd be inclined to go with a fixed-size circular buffer, that would throw away the oldest unclaimed notice.

If all you really want is the ability to suppress the notices, implementing that alone would be very easy with some cbits code. But as it stands, I don't think I can accept this pull request.

lpsmith commented 10 years ago

Ok, to further elaborate on the potential API, I'm thinking the simplest thing to do would be to have a few functions:

ignoreNotices :: Connection -> IO ()
receiveNotices :: Connection -> IO NoticeBuffer
getNotice :: NoticeBuffer -> IO (Maybe Result)

Alternatively, we could make a connection a pair:

data Connection = Connection { fp :: !(ForeignPtr PGConn),  noticeBuffer :: !NoticeBuffer }

ignoreNotices :: Connection -> IO ()
receiveNotices :: Connection -> IO ()
getNotice :: Connection -> IO (Maybe Result)

I don't really like this approach, because IIRC a ForeignPtr can't be truly unpacked, resulting in another level of indirection. (But please correct me if I'm wrong.) However, it would allow us to mimick the callback API by modifying postgresql-libpq to call getNotice after any call to libpq that might result in a notice; we wouldn't even need to export NoticeBuffer and getNotice, which in turn wouldn't force higher-level bindings to specifically support this feature.

lpsmith commented 10 years ago

Actually, apparently unpacking a ForeignPtr does remove a layer of indirection... so I'm not necessarily so opposed to the emulated callback approach.

lpsmith commented 10 years ago

Ok, I hacked up a quick proof of concept in 62322e1b31a4f84adbdca86a86b4a6067370d2b6, and smoke tested it:

GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
> import Database.PostgreSQL.LibPQ 
> c <- connectdb ""

> exec c "select 1 + 1" >>= \(Just r) -> getvalue r 0 1
column number 1 is out of range 0..0
Nothing

> disableNoticeReporting c
> exec c "select 1 + 1" >>= \(Just r) -> getvalue r 0 1
Nothing

> nb <- enableNoticeReporting c
> exec c "select 1 + 1" >>= \(Just r) -> getvalue r 0 1
Nothing

> getNotice nb
Just "column number 1 is out of range 0..0\n"

It's definitely not release-worthy quite yet, but it's a start. While I implemented my first concrete interface proposal above, mostly because it involved the fewest changes, I think I'm inclined to go with the second proposal. I'm still undecided on whether or not postgresql-libpq should implement the callback interface.

timmytofu commented 10 years ago

I won't have time to work on this again until this weekend at the earliest, I'm afraid.

The cbits function seems reasonable.

If you want to go the second route and implement the callback interface, what kind of visible API would you support for notice callbacks? ignoreNotices, which short circuits the notices as far upstream as possible, setNoticeReceiver which gets the bytestrings , and receiveNotices and getNotice are internal?

If it didn't implement the callback interface, users of postgresql-libpq would be expected to call getNotice periodically - on each execution, I guess, where execution is either a call to exec or some larger concept that library has of what its "execution" is. Is there a benefit to getting notices one at a time? Notices are attached to a connection, not a result, so it seems like the higher library would only want to get all or nothing (with the understanding that if you don't call this, the notices in there might not be relevant to the most recent query. That may in fact be a caveat regardless.)

lpsmith commented 10 years ago

Ok, in 2b1e1e674d06040ed78536da272c33f256a3d3fa I went ahead and implemented my second API proposal.

If you want to go the second route and implement the callback interface, what kind of visible API would you support for notice callbacks? ignoreNotices, which short circuits the notices as far upstream as possible, setNoticeReceiver which gets the bytestrings , and receiveNotices and getNotice are internal?

Yup, that's basically what I had in mind. I do think special-casing ignoreNotices would be a good idea, as perhaps would be writing a cbits callback that would write notices to some arbitrary file descriptor. (dprintf seems appropriate, although it's originally a GNU extension that's now part of the POSIX.1-2008 standard. So I guess we'd have to do a little bit more research into where exactly it is and isn't supported, and then decide whether or not to go with it or use write and select or some other implementation, or maybe even use a bit of conditional compilation.)

As it turns out, the PGresult * passed to the notice receiver is no longer valid after the C callback completes, so the store_notices callback does have to malloc, in which case the circular buffer approach I suggested and implemented seems far less attractive. Perhaps we should use a linked list of notices, something along the lines of

typedef struct PGnotice {
   struct PGnotice * next;
   size_t len;
   char str[];
} PGnotice;

I'm still undecided about whether or not reimplementing the callback interface is a good idea. Yes, notices are not explicitly attached to a result via libpq, but they (usually, always?) are the result of a particular call to libpq or message to the backend, which the callback interface hides. (One exception to the rule may be when an administrator forces a shutdown of the backend, which I think results in a notice but I'm not sure.)

Also, I'm not sure that having postgresql-libpq check for notices after every call to getvalue is a good idea either. Higher level bindings may be able to do a little better job with this, and I also think my assertions that higher-level bindings would need to explictly support notice processing with the getNotice interface are a little bit suspect. Now I'm tending towards believing that getNotice would be pretty usable with postgresql-simple as-is, via the Internal module.

lpsmith commented 9 years ago

Ok, I did finally release what I have so far at the moment to Hackage, what do you think?

lpsmith commented 8 years ago

Closing this as resolved. Thank you for providing some impetus to get this resolved!