Argument parsing for lists

joelcox commented 11 years ago

There are a few inconsistencies when performing operation on lists:

Non-associative arrays aren't stripped from their indexes, causing the indexes to also be included in the list.
Range commands on lists are not parsed correctly.

Originally reported by Benjamin MA.

joelcox commented 11 years ago

See https://travis-ci.org/joelcox/codeigniter-redis/jobs/3338152

tinkertim commented 11 years ago

The best fix here is to just implement the list related commands. It could be done by storing an array of commands that tells us when we need to treat input differently, but that's .. fragile.

joelcox commented 11 years ago

Correct. Another idea I had was checking whether the keys in the arrayare numerical sequential or not, but that isn't robust either.

tinkertim commented 11 years ago

Well, we're going to have the same problem with sets and sorted sets, as well as pubsub. I'm looking at _encode_request() again now, and it would not be hard to treat certain methods differently. There's about ~15 or so that would need this treatment.

joelcox commented 11 years ago

So adding an array with these edge cases would be the way to go, I guess. I'll take a look at this on monday and add a bunch of unit tests for specific commands. Thanks for your input.

tinkertim commented 11 years ago

I should have a chance to do it tomorrow or Monday if you don't beat me to it.

jbrower commented 11 years ago

Was this ever resolved? I'm wanting to use redis with CI and this looks like a nice clean implementation overall.

joelcox commented 11 years ago

Hey Joseph, @tinkertim has a fix in the works, but I'm not sure if he made any progress due to the holidays.

tinkertim commented 11 years ago

I should have this pushed in the next few days. Sorry about the delays, my job ended just prior to the holidays, and then there were the holidays, so things have been rather frantic.

joelcox commented 11 years ago

No problem Tim!

danhunsaker commented 11 years ago

I think the changes I made in my pull request (Issue #20) have fixed the second half of this issue - that is, the list range commands being parsed incorrectly. That should leave just the first half, where indexes need to be stripped in certain cases. That sound right?

joelcox commented 11 years ago

That is correct Dan. I just pushed some new unit tests to confirm this.

On Jan 3, 2013, at 4:56 PM, Daniel Hunsaker notifications@github.com wrote:

I think the changes I made in my pull request (Issue #20) have fixed the second half of this issue - that is, the list range commands being parsed incorrectly. That should leave just the first half, where indexes need to be stripped in certain cases. That sound right?

— Reply to this email directly or view it on GitHub.

danhunsaker commented 11 years ago

Is there a decent sanity check we can use to determine when to strip indexes (as opposed to passing keys through)? An approach that doesn't require a list of commands that need stripping performed?

danhunsaker commented 11 years ago

(Looking at _encode_request(), it looks like passing these values directly as opposed to in an array would work around this issue in the meantime...)

joelcox commented 11 years ago

One thing that was discussed is treating arrays which are indexed from 0..n as lists, while others are treated as dictionaries (associative arrays). This would be the only way to determine it from the passed in data structure. I'm just worried that people shoot themselves in the foot if we implement it this way.

On Jan 3, 2013, at 10:20 PM, Daniel Hunsaker notifications@github.com wrote:

(Looking at _encode_request(), it looks like passing these values directly as opposed to in an array would work around this issue in the meantime...)

— Reply to this email directly or view it on GitHub.

danhunsaker commented 11 years ago

I see essentially four options, then. Another would be to do nothing, but I'm sure that has already been ruled out, so I'll ignore that one. :smile_cat:

The most robust option is to have an internal list of which Redis commands need to be handled which way. This is difficult for maintenance purposes, though, since the Redis devs can add/remove/change these commands at any time without prior notice, so it probably isn't the best option.
The second option is to assert that the current functionality (passing all keys/indexes to Redis) is the correct one. This would require documentation that forbids the use of arrays for lists, making them only valid for dictionaries. This would also probably break a number of existing applications relying on this library; probably also not what we want.
The third option is to strip numeric indexes, as proposed in previous comments. This is probably the expected operation, but would require the documentation to state that this library only supports dictionaries that require at least one alpha character (I'd recommend [_a-zA-Z], at a minimum) in every key, because numeric indexes will not be sent to Redis. If it's documented, the foot-shooting will be minimized.
The fourth option, and probably the best one, is to provide a configuration option that controls which of these behaviors is used. I'll give a quick starting point of the setup I envision below as a discussion-starter - this is just a rough idea at the moment.
- This value, possibly the default, tells the library to use the third option, above:

$config['redis_dictionaries'] = false;

This value tells the library to use the second option, above (This would also be the fallback value for when this configuration option isn't actually set, since it maximizes backward compatibility.):

$config['redis_dictionaries'] = true;

This value provides a list of Redis commands for which to treat arrays as dictionaries; all commands not listed would then have their keys/indexes stripped, in accordance with the first option, above:

$config['redis_dictionaries'] = array('hmset', 'mset', 'msetnx', ...);

Thoughts?

joelcox commented 11 years ago

I prefer to keep the configurable options down, but I can see its value in this situation. One thing I'm wondering is how many people would actually pass in a numeric (0..n) indexed array into a Redis hash data structure. You could argue that 99% percent are actually using a wrong data structure for their data. What are valid use cases for this?

Another (crazy) option would be to introduce two new classes to represent the different data structures (RedisHash and RedisList, which we can inspect when they are passed in), but that wouldn't be viable, IMHO.

danhunsaker commented 11 years ago

I can't say that I'm aware of any, but I can do some poking around to see if I can find one...

danhunsaker commented 11 years ago

Huh. Well, apparently there's at least one: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs

joelcox commented 11 years ago

I guess that wouldn't count. Only arrays which start at index 0 and have as many elements as the highest index - 1 should be treated as sets.

Joël Sent from my phone

On Jan 5, 2013, at 2:31 PM, Daniel Hunsaker notifications@github.com wrote:

Huh. Well, apparently there's at least one: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs

— Reply to this email directly or view it on GitHub.

danhunsaker commented 11 years ago

Hrm. So

$strip_keys = false;
reset($args);
if (key($args) == 0)
{
    end($args);
    if (key($args) == count($args) - 1)
    {
        $strip_keys = true;
    }
}

which relies on the sort order being unchanged, or even

$strip_keys = false;
$keys = array_keys($args);
if (min($keys) == 0 && max($keys) == count($args) - 1)
{
    $strip_keys = true;
}

which is sort-tolerant? I can see that. I was, in my mind, treating all numeric keys as indices, regardless of order or value distance. I'd still prefer that people use something like "_155315" instead, though.

On the other hand, would it be more robust to check whether the keys were set as integers versus strings? As opposed to simply checking whether they evaluate numerically? Maybe something crazy like

$strip_keys = false;
if(is_int(key($args)))
{
    $strip_keys = true;
}

or similar (maybe add a reset($args); at the top...)? That ought to differentiate between "155315" and 155315, which would help weed out the intentional uses of numeric keys from the automatic assignment of numeric indices, and also have the benefit of being sort-tolerant.

If we're really paranoid about this, we could even combine approaches:

$strip_keys = false;
$keys = array_keys($args);
if (is_int(key($args)) && min($keys) == 0 && max($keys) == count($args) - 1)
{
    $strip_keys = true;
}

Of course, I could just be over-thinking this. :smiley:

danhunsaker commented 11 years ago

(Regarding that last code fragment, I realized after hitting Comment [I meant to hit Preview!] that

$strip_keys = false;
$keys = array_keys($args);
if (min($keys) === 0 && max($keys) === count($args) - 1)
{
    $strip_keys = true;
}

would perform the same checks with one less condition to process...)

joelcox commented 11 years ago

The last implementation would have my preference. Checking all the keys wouldn't really be feasible, so checking the first and last would be OK.

I suggest holding off on making a decision just now and wait for @tinkertim $0.02. (No rush Tim!)

joelcox commented 11 years ago

I haven't heard from @tinkertim, so I'm proposing to go with solution 3 as suggested by @danhunsaker. If anyone disagrees, please speak up now, or forever hold your peace. :)

I'll let this sit for a few days before implementing.

danhunsaker commented 11 years ago

As far as my $0.02, after stepping away from it for a couple of weeks, I like option 3 a lot, and would vote option 4 as a close second, even if only to keep option 3 around.

joelcox commented 11 years ago

Hey everyone. I pushed the changes we discussed in this thread to the fix-18 branch and it now passes all the units tests again.

I ended up using @danhunsaker last suggestion to check whether we should consider something an associative array or not. I'm not too pleased with how the _encode_request function looks right now and I haven't updated the README yet, but the code is functional.

Let me know if you find any edge cases so the test suite can be expanded if necessary.

joelcox commented 11 years ago

Merged. Thanks all. 81d2a03993dced5a7a5692b0b101523fbbf107c4

joelcox / codeigniter-redis

Argument parsing for lists #18