mojolicious / mojo-pg

Mojolicious :heart: PostgreSQL
https://metacpan.org/release/Mojo-Pg
Artistic License 2.0
101 stars 46 forks source link

Add expander functionality #18

Closed chy-causer closed 8 years ago

chy-causer commented 8 years ago

Expanding expand functionality

The expand functionality as discussed in a gist has been implemented and is submitted for your consideration. It allows custom expansion of postgresql column types, as well as changing the default Mojo::JSON::from_json, to JSON::XS for example.

Performance

Benchmarking script can be found at https://gist.github.com/chy-causer/66b15f14b16b06d3d9da0e36cd36ca04

Benchmark Before After
expand->hashes 53/s 52/s
expand->hash 42/s 41/s
hashes 90/s 91/s
bool::json 50/s 51/s
Extra expander - 51/s
bool expander - 67/s

These numbers can be verified using the linked script.

From these benchmarks, it's clear there's very little difference in performance using the new code. However, for the not uncommon use case of wanting booleans represented as objects, there is a 30% speed increase as well as a more pleasing syntax.

# One possible method:
$db->query("SELECT row_to_json(t) r from ( select is_bool from _table) t" )->expand->hash->{r};

vs

$db->query("SELECT is_bool from _table")->expand->hash;

Backwards compatibility

All existing tests passed without modification

Pluggability

Functionality is documented and shows how to convert bigints and booleans.

Implementation: An expand mapper

The code makes use of an expand mapper (cached for performance), which maps columns to expand coderefs. The rationale for using this to achieve the desired functionality is two-fold:

  1. Performance
  2. It allows future expansion of expand $results->expand(column_name => sub { ... } )->hash;

I did not pursue the latter as it's not something I need, but the option is there.

Addendum

As it's my first submission to the Mojo project, I am well aware there may be issues in it, in terms of coding style and general best practice. However, there are a few things I would like to bring up in advance:

_expand_mapper mapping both hashes and arrays

The _expand_mapper caches its result for performance. However, it caches both hash and array mappings, whether they're used or not.

This is to match the documentation that once the first row is returned, then you cannot alter the expanders. Consider the following:

$pg->db->query('select * from _table')->expand->hash->expanders({})->array;

This scenario is tested.

What types are available

It turns out the DBD::Pg hard codes the list of row types in its source code. This means that extensions like ip4r return a type of 'unknown' and DBD::Pg will need to be patched and recompiled to deal with new types as they are introduced to your database.

Furthermore, I cannot seem to find a nicely formatted list of available types DBD::Pg returns from $sth->{pg_type}. The best I can do is the DBD::Pg types.c source code!

kraih commented 8 years ago

Pretty sure this breaks JSON decoding for non-blocking queries.

chy-causer commented 8 years ago

I have just spotted a mistake in the bench.pl script. Corrected in the gist, but will need to rerun to regenerate numbers. It should only affect absolute numbers, not ratios, but I'd like to be 100% confident in my results.

chy-causer commented 8 years ago

If this breaks non-blocking queries, then there are missing tests as all existing tests pass.

kraih commented 8 years ago

No, you forgot to add new tests. Until now there was no difference between blocking and non-blocking queries for JSON decoding.

kraih commented 8 years ago

Changing from Mojo::JSON to JSON::XS is a non-issue btw., since Mojo::JSON::MaybeXS exists.

kraih commented 8 years ago

I think a pretty strong argument against this functionality might be that the way DBD::Pg handles different data types is very sketchy. Types like _json and _jsonb are not even mentioned in the documentation, and bigint seems like something DBD::Pg might start handling itself at any time.

kraih commented 8 years ago

Like i said on IRC before, the patch for this feature has to make the expand code cleaner to be worth it for me, and i just don't think that's the case here. So i'm afraid i'll have to give it a 👎 vote.

jberger commented 8 years ago

I haven't given this a ton of review admittedly, since I've been busy at QAH. I do worry that it seems quite complicated, especially given that most of this functionality can be achieved by coercion in the db query itself. Some performance gain of things like a bool expander is nice but perhaps not worth the additional complexity (especially given that other existing expanders see a performance hit). (Note that this is a very common consideration in Mojo-land and not just a comment on this individual PR). As such I'm 👎 at least until the complexity can be minimized.

chy-causer commented 8 years ago
                                                                                  Thank you for looking past the code sloppiness to get to what I am trying to achieve. I do not have time at the moment to prepare a full response so this is an email to say I share some concerns raised, and there are some that I feel can be defended. I will try to reply point by point within the next 24 hours.
jberger commented 8 years ago

@chy-causer note that I've updated my comment a little (to clarify). The sentence is now:

"Some performance gain of things like a bool expander is nice but perhaps not worth the additional complexity (especially given that other existing expanders see a performance hit). "

chy-causer commented 8 years ago

If I have not explicitly addressed any points raised in previous comments, please take that as me not having any response of note.

Benchmarking revisited

The only system I have available to do benchmarking is hosted on a fairly busy VI, so the results are rather noisy. Topping that, there was a fairly howling error in the original benchmarking script and the results were skewed. Here is a new benchmark, with the numbers of iterations and rows boosted, and the script niced to -10 to reduce noise further. I have made no modifications suggested in the comments about wasteful calls.

Benchmark Before After
expand->hashes 52.4/s 52.8/s
expand->hash 36.2/s 36.3/s
hashes 349.1/s 353.3/s
bool::json 95.5/s 102.2/s
Extra expander - 102.4/s
bool expander - 193.8/s

Rows: 2000 Iterations: 5000 (10000 for raw hashes) Before: 290.6s After: 353.7s

As an aside, I now have much more sympathy for the original author of _expand as I can now see that much of its design was to boost performance.

You are welcome to rerun these benchmarks yourself to verify.

https://gist.github.com/chy-causer/66b15f14b16b06d3d9da0e36cd36ca04

_column_types and _columns_by_types being needless

They would be required for the following functionality:

$db->query($QUERY)->expand(column_name => \&expander);

You're not mind-readers, I didn't explain it, and I was getting ahead of myself. I will not pull something like that again.

Of DBD::Pg return types

I wholeheartedly agree that return types are sketchy and unintuitive at times. DBD::Pg's handling can also be fairly inconsistent. Consider the two stanzas and see if you can guess what they return:

$pg->db->query(
  'select array_agg(distinct state) from minion_jobs'
)->array;

$pg->db->query(
  'select array_agg(distinct state)::text[] from minion_jobs'
)->array;

However, on the flip side, the types that do exist, undocumented though they are, have been stable for a number of years now.

https://github.com/autarch/DBD-Pg/commits/master/types.c

int8 is the internal representation of bigint by PostgreSQL and is most likely here to stay. Most other important types are fairly intuitive and I still think there is benefit to be had in expanding them (xml, booleans, timestamps and the one I actually care about, cidr.)

The others like _json I can happily ignore. Arrays of json aren't expanded currently, so I don't see any reason to start.

As a compromise, Mojo::Pg could limit allowed expansion types to a well defined subset of DBD::Pg types (even aliasing the non-intuitive ones like int8.)

Bigint may be handled by DBD::Pg any time

How would it handle it? By returning a Math::BigInt object (the only core module I can think of that would be appropriate)? For arguments sake if they do do that, then Math::BigInt can take Math::BigInt objects as an argument, so nothing will break should that happen. I cannot speak for the developers of DBD::Pg, but I think the likelihood of that happening is small.

What's proposed can be achieved by coercion within the database itself

That's absolutely not the case.

$db->query(
  "select '<hello>world</hello>'::xml"
)->expand->array->[0]->at('hello');

This cannot be done via coercion within the database.

Similarly, even worse, something that you think works, but gives the wrong answer:

$db->query('SELECT 9223372036854775807::bigint')->array->[0] / 10;

The change breaks non-blocking

I'm afraid I cannot write a test that works on the original code but not the new one so I will need some assistance here.

Speed increase of boolean expansion not worth the complexity

Complexity for whom? If it's the Mojo::Pg codebase then I don't have an answer for you other than a promise to go back and look at the code. If it's the end user, then I would argue that

$db->('SELECT true')->expand->array;

Is much less complex than

$db->('SELECT json_agg(true)')->expand->array->[0];

Obviously you wouldn't do these queries in real life, but hopefully you get the picture.

On top of that, the new benchmark shows around 100% performance improvement for the former over the latter (please do verify this for yourself though.)

jberger commented 8 years ago

Just about to go to bed here in England as the QAH wraps up and I fly home. Just before I do go, yes I did mean for the Mojo::Pg codebase. While we certainly care about the end users, we do value the cleanliness of the codebase too. This is not to say that the idea is dead, just that it would need tightening up before it could really be considered. As you have looked through the Mojo code I'm sure you've noticed the style :-P

One other quick thing, this isn't very easy to immediately understand:

$db->('SELECT json_agg(true)')->expand->array->[0];

but I think this reasonably is

$db->('SELECT to_json(true)')->expand->array;

Perhaps you would like to join #mojo on irc.perl.org to discuss further? PRs are nice for reviewing code but to discuss things like api and utility irc is really nice. I'm usually on and around (travel notwithstanding). My nick is jberger. @kraih is sri.

chy-causer commented 8 years ago

I appreciate you want to keep the code clean. As I said earlier, this is my first PR, pretty much for any project, not just Mojo::Pg, so I'm well aware there is much tightening up that can be done and I'm very grateful you and @kraih took the time to read the code and discuss its merits, rather than dismiss it out of hand.

I am on IRC as well, but am only able to respond during UK work hours. My nick is CHYC.

chy-causer commented 8 years ago

$db->('SELECT to_json(true)')->expand->array;

With such a plethora of json functions, I overlooked the most obvious. Thanks

kraih commented 8 years ago

After https://github.com/kraih/mojo-pg/commit/f854bdd4dd9cba3f676eb4a9b5189633e5d1e67c it should not be possible for this patch to be faster than a JSON only solution.

kraih commented 8 years ago

I'm afraid this feature did not get the required votes.