Closed chy-causer closed 8 years ago
Pretty sure this breaks JSON decoding for non-blocking queries.
I have just spotted a mistake in the bench.pl script. Corrected in the gist, but will need to rerun to regenerate numbers. It should only affect absolute numbers, not ratios, but I'd like to be 100% confident in my results.
If this breaks non-blocking queries, then there are missing tests as all existing tests pass.
No, you forgot to add new tests. Until now there was no difference between blocking and non-blocking queries for JSON decoding.
Changing from Mojo::JSON
to JSON::XS
is a non-issue btw., since Mojo::JSON::MaybeXS
exists.
I think a pretty strong argument against this functionality might be that the way DBD::Pg
handles different data types is very sketchy. Types like _json
and _jsonb
are not even mentioned in the documentation, and bigint
seems like something DBD::Pg
might start handling itself at any time.
Like i said on IRC before, the patch for this feature has to make the expand code cleaner to be worth it for me, and i just don't think that's the case here. So i'm afraid i'll have to give it a 👎 vote.
I haven't given this a ton of review admittedly, since I've been busy at QAH. I do worry that it seems quite complicated, especially given that most of this functionality can be achieved by coercion in the db query itself. Some performance gain of things like a bool expander is nice but perhaps not worth the additional complexity (especially given that other existing expanders see a performance hit). (Note that this is a very common consideration in Mojo-land and not just a comment on this individual PR). As such I'm 👎 at least until the complexity can be minimized.
Thank you for looking past the code sloppiness to get to what I am trying to achieve. I do not have time at the moment to prepare a full response so this is an email to say I share some concerns raised, and there are some that I feel can be defended. I will try to reply point by point within the next 24 hours.
@chy-causer note that I've updated my comment a little (to clarify). The sentence is now:
"Some performance gain of things like a bool expander is nice but perhaps not worth the additional complexity (especially given that other existing expanders see a performance hit). "
If I have not explicitly addressed any points raised in previous comments, please take that as me not having any response of note.
The only system I have available to do benchmarking is hosted on a fairly busy VI, so the results are rather noisy. Topping that, there was a fairly howling error in the original benchmarking script and the results were skewed. Here is a new benchmark, with the numbers of iterations and rows boosted, and the script niced to -10 to reduce noise further. I have made no modifications suggested in the comments about wasteful calls.
Benchmark | Before | After |
---|---|---|
expand->hashes | 52.4/s | 52.8/s |
expand->hash | 36.2/s | 36.3/s |
hashes | 349.1/s | 353.3/s |
bool::json | 95.5/s | 102.2/s |
Extra expander | - | 102.4/s |
bool expander | - | 193.8/s |
Rows: 2000 Iterations: 5000 (10000 for raw hashes) Before: 290.6s After: 353.7s
As an aside, I now have much more sympathy for the original author of _expand as I can now see that much of its design was to boost performance.
You are welcome to rerun these benchmarks yourself to verify.
https://gist.github.com/chy-causer/66b15f14b16b06d3d9da0e36cd36ca04
They would be required for the following functionality:
$db->query($QUERY)->expand(column_name => \&expander);
You're not mind-readers, I didn't explain it, and I was getting ahead of myself. I will not pull something like that again.
I wholeheartedly agree that return types are sketchy and unintuitive at times. DBD::Pg's handling can also be fairly inconsistent. Consider the two stanzas and see if you can guess what they return:
$pg->db->query(
'select array_agg(distinct state) from minion_jobs'
)->array;
$pg->db->query(
'select array_agg(distinct state)::text[] from minion_jobs'
)->array;
However, on the flip side, the types that do exist, undocumented though they are, have been stable for a number of years now.
https://github.com/autarch/DBD-Pg/commits/master/types.c
int8
is the internal representation of bigint by PostgreSQL and is most likely here to stay. Most other important types are fairly intuitive and I still think there is benefit to be had in expanding them (xml, booleans, timestamps and the one I actually care about, cidr.)
The others like _json I can happily ignore. Arrays of json aren't expanded currently, so I don't see any reason to start.
As a compromise, Mojo::Pg could limit allowed expansion types to a well defined subset of DBD::Pg types (even aliasing the non-intuitive ones like int8.)
How would it handle it? By returning a Math::BigInt object (the only core module I can think of that would be appropriate)? For arguments sake if they do do that, then Math::BigInt can take Math::BigInt objects as an argument, so nothing will break should that happen. I cannot speak for the developers of DBD::Pg, but I think the likelihood of that happening is small.
That's absolutely not the case.
$db->query(
"select '<hello>world</hello>'::xml"
)->expand->array->[0]->at('hello');
This cannot be done via coercion within the database.
Similarly, even worse, something that you think works, but gives the wrong answer:
$db->query('SELECT 9223372036854775807::bigint')->array->[0] / 10;
I'm afraid I cannot write a test that works on the original code but not the new one so I will need some assistance here.
Complexity for whom? If it's the Mojo::Pg codebase then I don't have an answer for you other than a promise to go back and look at the code. If it's the end user, then I would argue that
$db->('SELECT true')->expand->array;
Is much less complex than
$db->('SELECT json_agg(true)')->expand->array->[0];
Obviously you wouldn't do these queries in real life, but hopefully you get the picture.
On top of that, the new benchmark shows around 100% performance improvement for the former over the latter (please do verify this for yourself though.)
Just about to go to bed here in England as the QAH wraps up and I fly home. Just before I do go, yes I did mean for the Mojo::Pg codebase. While we certainly care about the end users, we do value the cleanliness of the codebase too. This is not to say that the idea is dead, just that it would need tightening up before it could really be considered. As you have looked through the Mojo code I'm sure you've noticed the style :-P
One other quick thing, this isn't very easy to immediately understand:
$db->('SELECT json_agg(true)')->expand->array->[0];
but I think this reasonably is
$db->('SELECT to_json(true)')->expand->array;
Perhaps you would like to join #mojo on irc.perl.org to discuss further? PRs are nice for reviewing code but to discuss things like api and utility irc is really nice. I'm usually on and around (travel notwithstanding). My nick is jberger. @kraih is sri.
I appreciate you want to keep the code clean. As I said earlier, this is my first PR, pretty much for any project, not just Mojo::Pg, so I'm well aware there is much tightening up that can be done and I'm very grateful you and @kraih took the time to read the code and discuss its merits, rather than dismiss it out of hand.
I am on IRC as well, but am only able to respond during UK work hours. My nick is CHYC.
$db->('SELECT to_json(true)')->expand->array;
With such a plethora of json functions, I overlooked the most obvious. Thanks
After https://github.com/kraih/mojo-pg/commit/f854bdd4dd9cba3f676eb4a9b5189633e5d1e67c it should not be possible for this patch to be faster than a JSON only solution.
I'm afraid this feature did not get the required votes.
Expanding expand functionality
The expand functionality as discussed in a gist has been implemented and is submitted for your consideration. It allows custom expansion of postgresql column types, as well as changing the default Mojo::JSON::from_json, to JSON::XS for example.
Performance
Benchmarking script can be found at https://gist.github.com/chy-causer/66b15f14b16b06d3d9da0e36cd36ca04
These numbers can be verified using the linked script.
From these benchmarks, it's clear there's very little difference in performance using the new code. However, for the not uncommon use case of wanting booleans represented as objects, there is a 30% speed increase as well as a more pleasing syntax.
Backwards compatibility
All existing tests passed without modification
Pluggability
Functionality is documented and shows how to convert bigints and booleans.
Implementation: An expand mapper
The code makes use of an expand mapper (cached for performance), which maps columns to expand coderefs. The rationale for using this to achieve the desired functionality is two-fold:
I did not pursue the latter as it's not something I need, but the option is there.
Addendum
As it's my first submission to the Mojo project, I am well aware there may be issues in it, in terms of coding style and general best practice. However, there are a few things I would like to bring up in advance:
_expand_mapper mapping both hashes and arrays
The _expand_mapper caches its result for performance. However, it caches both hash and array mappings, whether they're used or not.
This is to match the documentation that once the first row is returned, then you cannot alter the expanders. Consider the following:
$pg->db->query('select * from _table')->expand->hash->expanders({})->array;
This scenario is tested.
What types are available
It turns out the DBD::Pg hard codes the list of row types in its source code. This means that extensions like ip4r return a type of 'unknown' and DBD::Pg will need to be patched and recompiled to deal with new types as they are introduced to your database.
Furthermore, I cannot seem to find a nicely formatted list of available types DBD::Pg returns from $sth->{pg_type}. The best I can do is the DBD::Pg types.c source code!