limweb / sabreamf

Automatically exported from code.google.com/p/sabreamf
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Bug in handling sparse associative arrays. #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We use a lot of "sparse" arrays in our project, for example:

$data = array( 
              10 => 'Val1',
              20 => 'Val2',
              100 => 'Val3',
);

Unfortunately when trying to pass this back to Flex through SabreAMF, it
gets mangled into an array that looks like this:

$data = array(
             0 => NULL,
             1 => NULL,
             ...
             10 => 'Val1',
             11 => NULL
             ...
             99 => NULL,
             100 => 'Val3'
);

I've narrowed it down to the for() loop in AMF0/Serializer.php, as its
looping based on the value of the last key found in the array instead of
the actual number of elements in the array. 

Original issue reported on code.google.com by mike.ben...@gmail.com on 8 Sep 2009 at 4:51

GoogleCodeExporter commented 9 years ago
Here is a small patch to apply to AMF0/Serializer.php that should fix the 
issue, I
would imagine it would be marginally faster too:

Index: Serializer.php
===================================================================
--- Serializer.php      (revision 2789)
+++ Serializer.php      (working copy)
@@ -145,13 +145,10 @@
             if (!count($data)) {
                 $this->stream->writeLong(0);
             } else {
-                end($data);
-                $last = key($data);
-                $this->stream->writeLong($last+1);
-                for($i=0;$i<=$last;$i++) {
-                    $item = isset($data[$i])?$data[$i]:NULL;
-                    $this->writeAMFData($item);
-                }
+                $this->stream->writeLong(count($data));
+               foreach( $data as $k=>$v ) {
+                    $this->writeAMFData($v);
+               }
             }

         }

Original comment by mike.ben...@gmail.com on 8 Sep 2009 at 5:01

GoogleCodeExporter commented 9 years ago
Hi Mike,

In your previous bug report you were using AMF0, but this fix is for the AMF3
serializer. Is this correct? (just making sure).

The behaviour you're describing is unfortunately a design choice. This is how 
PHP
receives sparse arrays from Flash, so I wanted to send it back in the exact 
same way.
If I were to implement your patch, the array keys would be lost. (e.g. in your
example 10 becomes 0, 20 becomes 1). 

You can fix this on PHP's end, by making sure you are calling array_values() on 
the
arrays you're returning. This should also keep your code compatible with 
AMFPHP, if
that's one of your design goals.

Original comment by evert...@gmail.com on 8 Sep 2009 at 2:31

GoogleCodeExporter commented 9 years ago
This fix is for the AMF0/Serializer.php

AMFPHP handles sparse arrays without any issue or additional work whatsoever, 
as we
are currently in the process of migrating from AMFPHP to SabreAMF in our 
project. So
any issues that arise are strictly differences between the two.

I do see that SabreAMF appears to be receiving sparse arrays incorrectly as 
well as
sending them incorrectly now. Which again is something that AMFPHP handles 
without
problems, so we will need to figure out a patch to fix that as well.

Original comment by mike.ben...@gmail.com on 8 Sep 2009 at 3:31

GoogleCodeExporter commented 9 years ago
I just read AMF0 spec again, and you are partly right. Sparse arrays must still 
be
encoded in the same way, but instead of encoding NULL, I'm supposed to encode
'undefined'. 

This _should_ result in the sparse array you're expecting.

I committed a fix as r240. Would you mind giving that a shot?

Original comment by evert...@gmail.com on 8 Sep 2009 at 5:28

GoogleCodeExporter commented 9 years ago
Sorry, I'm still learning the SabreAMF code, but yes my patch is completely 
wrong
upon further testing.

Do you have some ideas as to what would be the proper way to preserve the exact 
data
passed through SabreAMF? The original behavior still seems really strange to me.

Original comment by mike.ben...@gmail.com on 8 Sep 2009 at 5:52

GoogleCodeExporter commented 9 years ago
There is an error in your patch, after the isset() you reference $item, but 
$item is
no longer set. I changed it to $data[$i], but that doesn't seem to change the 
end
result, which is an array appearing in Flex with 100 elements, 96 of which are 
NULL.

Original comment by mike.ben...@gmail.com on 8 Sep 2009 at 6:09

GoogleCodeExporter commented 9 years ago
Looking through the AMFPHP code, it appears to detect such sparse arrays and 
handle
them as mixed arrays instead:

Here is snippet of their writeArray() function:

               $numeric = array(); // holder to store the numeric keys
                $string = array(); // holder to store the string keys
                $len = count($d); // get the total number of entries for the array
                $largestKey = -1;
                foreach($d as $key => $data) { // loop over each element
                        if (is_int($key) && ($key >= 0)) { // make sure the keys are
numeric
                                $numeric[$key] = $data; // The key is an index in an
array
                                $largestKey = max($largestKey, $key);
                        } else {
                                $string[$key] = $data; // The key is a property of an
object
                        }
                }
                $num_count = count($numeric); // get the number of numeric keys
                $str_count = count($string); // get the number of string keys

                if ( ($num_count > 0 && $str_count > 0) ||
                         ($num_count > 0 && $largestKey != $num_count - 1)) { // this
is a mixed array

                        $this->writeByte(8); // write the mixed array code
                        $this->writeLong($num_count); // write  the count of items in
the array
                        $this->writeObjectFromArray($numeric + $string); // write the
numeric and string keys in the mixed array
                } else if ($num_count > 0) { // this is just an array

                        $num_count = count($numeric); // get the new count

                        $this->writeByte(10); // write  the array code
                        $this->writeLong($num_count); // write  the count of items in
the array
                        for($i = 0 ; $i < $num_count ; $i++) { // write all of the
array elements
                                $this->writeData($numeric[$i]);
                        }

Original comment by mike.ben...@gmail.com on 8 Sep 2009 at 6:53

GoogleCodeExporter commented 9 years ago
I fixed the problem to now pass undefined.

The AMFPHP solution makes sense, but I feel this shouldn't be changed for 
SabreAMF.
This will otherwise break compatibility for existing deployments, which is a 
non-option.

AMF0 spec, for your reference:

http://download.macromedia.com/pub/labs/amf/amf0_spec_121207.pdf

This includes some information about how to deal with sparse arrays.

Original comment by evert...@gmail.com on 8 Sep 2009 at 7:58

GoogleCodeExporter commented 9 years ago
What purpose does having UNDEFINED elements in an array serve? Do people really 
make
use of them somehow? 

Would you be willing to add an option that can enable/disable this 
functionality,
with it defaulting to off of course?

The AMF3 serializer has the same issue.

It seems to me if someone wanted UNDEFINED elements in their array they would 
add
them, and if they don't, they would want to use a mixed array automatically.

Original comment by mike.ben...@gmail.com on 8 Sep 2009 at 11:33

GoogleCodeExporter commented 9 years ago
Just to add more to the discussion, Zend_AMF behaves the same as AMFPHP, but it
appears to be take a slightly better approach codewise:

               case (is_array($data)):
                    // check if it is an associative array
                    $i = 0;
                    foreach (array_keys($data) as $key) {
                        // check if it contains non-integer keys
                        if (!is_numeric($key) || intval($key) != $key) {
                            $markerType = Zend_Amf_Constants::AMF0_OBJECT;
                            break;
                            // check if it is a sparse indexed array
                         } else if ($key != $i) {
                             $markerType = Zend_Amf_Constants::AMF0_MIXEDARRAY;
                             break;
                         }
                         $i++;
                    }
                    // Dealing with a standard numeric array
                    if(!$markerType){
                        $markerType = Zend_Amf_Constants::AMF0_ARRAY;
                        break;
                    }
                    break;

If two of the more (most?) popular AMF frameworks handle sparse arrays as mixed
arrays in AMF, it would seem to me that the behavior would work quite well, no?

Original comment by mike.ben...@gmail.com on 9 Sep 2009 at 3:48

GoogleCodeExporter commented 9 years ago
Hey Mike,

In reply to:

What purpose does having UNDEFINED elements in an array serve? Do people really 
make
use of them somehow? 

Please refer to the official AMF0 specification on how to encode sparse arrays.
You'll see that I'm following the spec by using undefined.

That being said, if both implementations use a mixed array instead, I'm cool 
with
accepting a patch that mimics this behaviour.

Are you able to provide this?

Original comment by evert...@gmail.com on 9 Sep 2009 at 10:23

GoogleCodeExporter commented 9 years ago
I'm not suggesting that we break the spec at all, but from what I read the spec
doesn't explicitly state that sparse arrays are required to be encoded as a 
regular
array, instead of a mixed array. 

It definitely adds a performance regression by trying to detect sparse arrays, 
but
I'm curious what the downside would be to perhaps always using a mixed array 
instead
of a regular array? It looks like there would be slightly more network bandwidth
used, but what regressions would it cause from a Flex/ActionScript point of 
view?

Nonetheless, I will work on providing a proper patch to implement 
AMFPHP/Zend_AMF
equivalent behavior.  

Original comment by mike.ben...@gmail.com on 9 Sep 2009 at 3:42

GoogleCodeExporter commented 9 years ago
I'm not terribly worried about the performance impact here. There's already a 
loop
that checks for non-numeric keys, which could be extended to also check for 
gaps in
the array indexes. 

Looking forward to the patch! Sorry for giving you a bit of a hassle, but 
backwards
compatibility is important to me, so I'm glad you did the research.

Original comment by evert...@gmail.com on 9 Sep 2009 at 3:57

GoogleCodeExporter commented 9 years ago
How does this look for the AMF0 serializer? Its basically a similar method to 
how
Zend_AMF does it. 

Index: AMF0/Serializer.php
===================================================================
--- AMF0/Serializer.php (revision 2799)
+++ AMF0/Serializer.php (working copy)
@@ -54,16 +54,16 @@

                 // Checking if its an array
                 if (!$type && is_array($data))   {
-
                     // Looping through the array to see if there are any
                     // non-numeric keys
+                    $i = 0;
                     foreach(array_keys($data) as $key) {
-                        if (!is_numeric($key)) {
-                            // There's a non-numeric key.. we'll make it a 
mixed
-                            // array
+                        if (!is_numeric($key) || $key != $i || intval($key) != 
$key) {
+                            // There's a non-numeric key or sparse array... 
we'll
make it a mixed array
                             $type = SabreAMF_AMF0_Const::DT_MIXEDARRAY;
                             break;
                         }
+                        $i++;
                     }

                     // Pure array

Original comment by mike.ben...@gmail.com on 9 Sep 2009 at 6:03

GoogleCodeExporter commented 9 years ago
Here is the same change to the AMF3 serializer:

--- AMF3/Serializer.php (revision 2789)
+++ AMF3/Serializer.php (working copy)
@@ -238,9 +238,15 @@
             // We need to split up strings an numeric array keys
             $num = array();
             $string = array();
+            $i = 0;
             foreach($arr as $k=>$v) {
-                if (is_int($k)) $num[] = $v; else $string[$k] = $v;
+                if (!is_numeric($k) || $k != $i || intval($k) != $k) {
+                    $string[$k] = $v;
+                } else {
+                    $num[] = $v;
             }
+                $i++;
+            }

             unset($arr);

Original comment by mike.ben...@gmail.com on 9 Sep 2009 at 6:32

GoogleCodeExporter commented 9 years ago
I did some quick benchmarking of just this block of code (AMF3), and 
unfortunately it
takes my quad core 2.5Ghz development box over 320ms to split a 100,000 element 
array
into string/numeric keys when all the keys are numeric, and about 280ms when 
they are
all strings and about 290-300 when there is a 50/50 split. 

To put that into perspective it only takes about 60ms to generate the array for 
this
test to begin with. (that time is not included in the numbers above of course)

In the worst case scenario my above patch is about twice as slow as the 
existing code. 

Because we deal with fairly large datasets, I'm inclined to run some further 
tests as
to the down sides of forcing all arrays to be mixed, eliminating this entire 
loop
completely.

Original comment by mike.ben...@gmail.com on 9 Sep 2009 at 6:45

GoogleCodeExporter commented 9 years ago
After some more benchmarking and research, I think the attached patch for the 
AMF3
serializer is the fastest method I can come up with while supporting both mixed 
and
pure arrays.

Its much bigger as it adds an additional function, and handling pure arrays is 
the
slowest code path (I don't think there is anyway around this), but overall it 
should
actually be at least twice as fast as the original code even in the worst case
scenario, and best case many times faster. It also uses less memory, because it
doesn't need to copy elements into separate string/numeric arrays then loop 
through
them again.

Original comment by mike.ben...@gmail.com on 9 Sep 2009 at 11:24

Attachments:

GoogleCodeExporter commented 9 years ago
I added the patch for AMF0. 

http://code.google.com/p/sabreamf/source/detail?r=242

Still reviewing the patch for AMF3. 

Original comment by evert...@gmail.com on 10 Sep 2009 at 3:46

GoogleCodeExporter commented 9 years ago
Thanks, if you approve the AMF3 method, I can submit a refactored and cleaned up
patch to allow AMF0 and AMF3 to both use isAssocArray, keeping things 
consistent and
fast across both protocol versions.

BTW: There is a small bug in isAssocArray where it should use OR instead of AND 
in
its IF statement, it also speeds it up a few percentage points.

Original comment by mike.ben...@gmail.com on 10 Sep 2009 at 5:34

GoogleCodeExporter commented 9 years ago
This is the fastest way on average to determine for numbered, associative, and 
sparse arrays:

$isAssoc = array_keys($data) !== range(0, count($data) - 1);

Original comment by msthorn...@gmail.com on 17 Sep 2009 at 7:23

GoogleCodeExporter commented 9 years ago
AMF3 doesn't support mixed arrays, so we changed it to use objects. See 
attached.

Original comment by msthorn...@gmail.com on 17 Sep 2009 at 7:27

Attachments:

GoogleCodeExporter commented 9 years ago
I tried this line of code in my benchmark:

$isAssoc = array_keys($data) !== range(0, count($data) - 1);

Its 4x slower then the patch I sent you for an array of 100,000 entries, about 
20%
slower for an array with 10 entries:

This is how I produced the array that I tested in the benchmark:

for($i=0; $i < 100000; $i++ ) {
    if ( $i%2 == 0 ) {
//    if ( $i > -1 ) {
        //String
        $arr[$i.'A'] = 1;
    } else {
        //Numeric
        $arr[$i] = 1;
    }
}

That creates a 50/50 split of numeric based keys and string based keys, then 
you can
comment out the first IF statement and use the 2nd one to create 100% numeric 
or 100%
string based for additional testing.

I'm curious what sort of benchmarks you ran that showed the opposite results?

Original comment by mike.ben...@gmail.com on 17 Sep 2009 at 9:10

GoogleCodeExporter commented 9 years ago
Regarding the Serializer.php file "msthornton" provided, I fail to see how it
improves things at all. 

1. You say AMF3 doesn't support mixed arrays, yet SabreAMF currently provides 
that
functionality, along with AMFPHP and Zend_AMF, and it seems to work fine.

2. You changed it so associative arrays are converted into objects which both 
slows
it down and breaks backwards compatibility.

3. You left writeArray() untouched, where it still tries to determine if arrays 
are
associative or not and handles them differently if they are.

Original comment by mike.ben...@gmail.com on 17 Sep 2009 at 9:25

GoogleCodeExporter commented 9 years ago
Mike, I also committed the patch you provided for AMF3. 

Let me know how that works for you. I fully agree with Mike related to encoding 
mixed
arrays as objects, AMF3 indeed supports mixed arrays, and although both your
solutions break backwards compatibility to a degree, mike's solution is less 
intrusive.

Encoding sparse arrays as associated arrays means that the elements within the 
array
are now encoded as strings, rather than numeric keys. I'm hoping the effects of 
this
are minimal, but since AS3 employs strict typing, this still worries me quite a 
bit.

I hope you can the AMF3 patch, and make sure the regular looping constructs 
still
work as expected. 

Original comment by evert...@gmail.com on 17 Sep 2009 at 10:59

GoogleCodeExporter commented 9 years ago
Mike, I also committed the patch you provided for AMF3. 

Let me know how that works for you. I fully agree with Mike related to encoding 
mixed
arrays as objects, AMF3 indeed supports mixed arrays, and although both your
solutions break backwards compatibility to a degree, mike's solution is less 
intrusive.

Encoding sparse arrays as associated arrays means that the elements within the 
array
are now encoded as strings, rather than numeric keys. I'm hoping the effects of 
this
are minimal, but since AS3 employs strict typing, this still worries me quite a 
bit.

I hope you can the AMF3 patch, and make sure the regular looping constructs 
still
work as expected. 

Worth noting, this is how AMF3 generates it's arrays:

        foreach ($array as $key => $value) {
            if (is_int($key)) {
                $numeric[] = $value;
            } else {
                $string[$key] = $value;
            }
        }

Original comment by evert...@gmail.com on 17 Sep 2009 at 11:01

GoogleCodeExporter commented 9 years ago
Mistake in my last sentence. Should have been: This is how Zend_AMF generates 
arrays
in AMF3

Original comment by evert...@gmail.com on 17 Sep 2009 at 11:02

GoogleCodeExporter commented 9 years ago
Also OMG can I just express my brief frustration on how Zend_AMF is just a 
big-ass
copy-paste of SabreAMF. Surely it's a bit cleaner, and naming conventions are a 
bit
nicer, but the entire overall structure is a blatant copy.

First time I really looked at the code, I never realized it was that bad.. 

Original comment by evert...@gmail.com on 17 Sep 2009 at 11:07

GoogleCodeExporter commented 9 years ago
Yeah, I noticed that Zend_AMF's code looked extremely similar to SabreAMF, I 
found
Zend_AMF to be a nightmare to work with though. If you don't start your project 
from
scratch using Zend_AMF then good luck ever getting it to work. 

I'll checkout the latest revision and do some more testing. 

Original comment by mike.ben...@gmail.com on 17 Sep 2009 at 11:21

GoogleCodeExporter commented 9 years ago
evertpot: I see the changes you made to AMF3 are pretty much identical to AMF0. 
Did
you get a chance to look at my patch from comment #17 at all? 

Performance is a huge issue for us, so its several times faster than the 
original
AMF0 patch (and existing stable versions of SabreAMF) I submitted with the only 
side
affect of treating mixed arrays as 100% mixed and pure arrays as 100% pure. 

I think it could be argued that the existing behavior is relatively undefined 
when it
comes to arrays with both integer and string keys, especially when it comes to
ordering. Treating them as 100% mixed arrays should in theory make things more
consistent.

Original comment by mike.ben...@gmail.com on 17 Sep 2009 at 11:33

GoogleCodeExporter commented 9 years ago
Mike: I also checked the last patch; looks good to me .. I committed it, 
slightly
modified (conventions and naming)

I'll test it during the day tomorrow. It's getting a bit late here :)

Original comment by evert...@gmail.com on 17 Sep 2009 at 11:49

GoogleCodeExporter commented 9 years ago
Excellent, I'll submit a refactoring patch next week hopefully to clean things 
up
just a slight bit more and so AMF0 shares more code with AMF3. It shouldn't 
change
the behavior at all.

Original comment by mike.ben...@gmail.com on 18 Sep 2009 at 2:53

GoogleCodeExporter commented 9 years ago
Sounds good!

Original comment by evert...@gmail.com on 18 Sep 2009 at 10:01

GoogleCodeExporter commented 9 years ago
Attached is the patch to factor out isPureArray to the main serializer class so 
it
can be used by both AMF0 and AMF3 in the same way. It also includes a minor 
change
(okay, I admit I can be picky sometimes) to the AMF3 writeArray function so the 
IF
statement is a positive rather then negative check.

No behavior should change due to this patch, other than making AMF0 be 
consistent
with the way AMF3 works.

Original comment by mike.ben...@gmail.com on 18 Sep 2009 at 6:01

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry, I jumped the gun and was fixing a symptom of a different problem. Your 
fix
works much better, thanks!

Original comment by msthorn...@gmail.com on 18 Sep 2009 at 7:12

GoogleCodeExporter commented 9 years ago
Mike: Committed the latest patch

Original comment by evert...@gmail.com on 22 Sep 2009 at 7:10

GoogleCodeExporter commented 9 years ago
So I was talking about different issues. I'm mainly focusing on AMF3. The 
isPureArray
has a bug for detecting sparse arrays, and it can be quite slow with large sets 
of
int arrays. I've attached a fix and tests with several approaches to 
performance. The
Hybrid approach turns out to be the best in all cases. Here are my results:

Running tests for isPureArray...
  - isPureArray          with MixedArray          : 1.0467
  - isPureArray          with IntArray            : 6.7537
  - isPureArray          with SparseIntArray      : 7.6607
  - isPureArray          with StringArray         : 1.0184
  - isPureArray          with MixedAfterHalfArray : 3.9293
  - isPureArray          with SparseAfterHalfArray: 5.3279
 Running validity tests...
  - isPureArray          with MixedArray           validity test PASSED
  - isPureArray          with IntArray             validity test PASSED
  - isPureArray          with SparseIntArray       validity test FAILED
  - isPureArray          with StringArray          validity test PASSED
  - isPureArray          with MixedAfterHalfArray  validity test PASSED
  - isPureArray          with SparseAfterHalfArray validity test FAILED
AVERAGE: 4.2895, min: 1.0184, max: 7.6607
----------------------

Running tests for Fixed_isPureArray...
  - Fixed_isPureArray    with MixedArray          : 0.9917
  - Fixed_isPureArray    with IntArray            : 6.5822
  - Fixed_isPureArray    with SparseIntArray      : 0.7397
  - Fixed_isPureArray    with StringArray         : 1.0283
  - Fixed_isPureArray    with MixedAfterHalfArray : 3.8250
  - Fixed_isPureArray    with SparseAfterHalfArray: 3.4929
 Running validity tests...
  - Fixed_isPureArray    with MixedArray           validity test PASSED
  - Fixed_isPureArray    with IntArray             validity test PASSED
  - Fixed_isPureArray    with SparseIntArray       validity test PASSED
  - Fixed_isPureArray    with StringArray          validity test PASSED
  - Fixed_isPureArray    with MixedAfterHalfArray  validity test PASSED
  - Fixed_isPureArray    with SparseAfterHalfArray validity test PASSED
AVERAGE: 2.7766, min: 0.7397, max: 6.5822
----------------------

Running tests for MT_isPureArray...
  - MT_isPureArray       with MixedArray          : 4.5757
  - MT_isPureArray       with IntArray            : 4.6140
  - MT_isPureArray       with SparseIntArray      : 4.5444
  - MT_isPureArray       with StringArray         : 5.5690
  - MT_isPureArray       with MixedAfterHalfArray : 5.0308
  - MT_isPureArray       with SparseAfterHalfArray: 2.8704
 Running validity tests...
  - MT_isPureArray       with MixedArray           validity test PASSED
  - MT_isPureArray       with IntArray             validity test PASSED
  - MT_isPureArray       with SparseIntArray       validity test PASSED
  - MT_isPureArray       with StringArray          validity test PASSED
  - MT_isPureArray       with MixedAfterHalfArray  validity test PASSED
  - MT_isPureArray       with SparseAfterHalfArray validity test PASSED
AVERAGE: 4.5341, min: 2.8704, max: 5.5690
----------------------

Running tests for For_isPureArray...
  - For_isPureArray      with MixedArray          : 2.2816
  - For_isPureArray      with IntArray            : 7.2355
  - For_isPureArray      with SparseIntArray      : 1.6775
  - For_isPureArray      with StringArray         : 2.5190
  - For_isPureArray      with MixedAfterHalfArray : 4.7890
  - For_isPureArray      with SparseAfterHalfArray: 3.8579
 Running validity tests...
  - For_isPureArray      with MixedArray           validity test PASSED
  - For_isPureArray      with IntArray             validity test PASSED
  - For_isPureArray      with SparseIntArray       validity test PASSED
  - For_isPureArray      with StringArray          validity test PASSED
  - For_isPureArray      with MixedAfterHalfArray  validity test PASSED
  - For_isPureArray      with SparseAfterHalfArray validity test PASSED
AVERAGE: 3.7267, min: 1.6775, max: 7.2355
----------------------

Running tests for Hybrid_isPureArray...
  - Hybrid_isPureArray   with MixedArray          : 0.9936
  - Hybrid_isPureArray   with IntArray            : 0.9236
  - Hybrid_isPureArray   with SparseIntArray      : 0.9323
  - Hybrid_isPureArray   with StringArray         : 1.0336
  - Hybrid_isPureArray   with MixedAfterHalfArray : 0.8976
  - Hybrid_isPureArray   with SparseAfterHalfArray: 0.6206
 Running validity tests...
  - Hybrid_isPureArray   with MixedArray           validity test PASSED
  - Hybrid_isPureArray   with IntArray             validity test FAILED
  - Hybrid_isPureArray   with SparseIntArray       validity test PASSED
  - Hybrid_isPureArray   with StringArray          validity test PASSED
  - Hybrid_isPureArray   with MixedAfterHalfArray  validity test PASSED
  - Hybrid_isPureArray   with SparseAfterHalfArray validity test PASSED
AVERAGE: 0.9002, min: 0.6206, max: 1.0336
----------------------

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 3:02

Attachments:

GoogleCodeExporter commented 9 years ago
isPureArray actually had a bug in it that was pointed out by me in comment #19, 
but
unfortunately never got fixed when the patch was committed to SVN. I have 
committed a
fix for it now.

I will look over your benchmark, something seems strange with it though, as in 
my
tests isPureArray never took anywhere near 2 seconds to run.

Original comment by mike.ben...@gmail.com on 24 Sep 2009 at 3:36

GoogleCodeExporter commented 9 years ago
I'm running it 10000 times with 1000 elements. Only running once with
1000 elements doesn't offer much time to judge by.

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 3:40

GoogleCodeExporter commented 9 years ago
I see comment #19 now. The or isn't actually needed since `if ( $k !== $i )` is
sufficient.

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 3:41

GoogleCodeExporter commented 9 years ago
Your benchmark is a little confusing, it runs each test 10,000 times with a 1000
element array. I don't see the point in running each test that often, so I 
changed it
to run each test 10x with a 100,000 element array instead. 

Hybrid_isPureArray fails the IntArray test, and I'm not convinced that your
Fixed_isPureArray is fully bug proof either, I will do more testing...

Original comment by mike.ben...@gmail.com on 24 Sep 2009 at 3:46

GoogleCodeExporter commented 9 years ago
The point was to get enough variation to eliminate CPU/RAM usage from the test.
Running it more times than 10 provides better timing results. Try 1000 times 
with
10,000 elements.

Sorry, didn't notice the FAILED on Hybrid, changed the text to "FAIL" to make 
it more
clear and investigating.

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 3:55

GoogleCodeExporter commented 9 years ago
I fixed the Hybrid test, was missing the ++$i when I pasted it in. Ran with 1000
tests and 10,000 elements, the Fixed_isPureArray runs best on average.

Running tests for isPureArray...
  - isPureArray          with MixedArray          : 1.5016
  - isPureArray          with IntArray            : 7.3355
  - isPureArray          with SparseIntArray      : 9.2348
  - isPureArray          with StringArray         : 2.0071
  - isPureArray          with MixedAfterHalfArray : 4.4503
  - isPureArray          with SparseAfterHalfArray: 5.5540
 Running validity tests...
  - isPureArray          with MixedArray           validity test PASSED
  - isPureArray          with IntArray             validity test PASSED
  - isPureArray          with SparseIntArray       validity test FAIL
  - isPureArray          with StringArray          validity test PASSED
  - isPureArray          with MixedAfterHalfArray  validity test PASSED
  - isPureArray          with SparseAfterHalfArray validity test FAIL
AVERAGE: 5.0139, min: 1.5016, max: 9.2348
----------------------

Running tests for Fixed_isPureArray...
  - Fixed_isPureArray    with MixedArray          : 1.5369
  - Fixed_isPureArray    with IntArray            : 7.5203
  - Fixed_isPureArray    with SparseIntArray      : 1.2974
  - Fixed_isPureArray    with StringArray         : 1.9026
  - Fixed_isPureArray    with MixedAfterHalfArray : 4.4443
  - Fixed_isPureArray    with SparseAfterHalfArray: 3.6789
 Running validity tests...
  - Fixed_isPureArray    with MixedArray           validity test PASSED
  - Fixed_isPureArray    with IntArray             validity test PASSED
  - Fixed_isPureArray    with SparseIntArray       validity test PASSED
  - Fixed_isPureArray    with StringArray          validity test PASSED
  - Fixed_isPureArray    with MixedAfterHalfArray  validity test PASSED
  - Fixed_isPureArray    with SparseAfterHalfArray validity test PASSED
AVERAGE: 3.3967, min: 1.2974, max: 7.5203
----------------------

Running tests for MT_isPureArray...
  - MT_isPureArray       with MixedArray          : 6.2092
  - MT_isPureArray       with IntArray            : 5.7315
  - MT_isPureArray       with SparseIntArray      : 5.3487
  - MT_isPureArray       with StringArray         : 6.7606
  - MT_isPureArray       with MixedAfterHalfArray : 6.3581
  - MT_isPureArray       with SparseAfterHalfArray: 3.6392
 Running validity tests...
  - MT_isPureArray       with MixedArray           validity test PASSED
  - MT_isPureArray       with IntArray             validity test PASSED
  - MT_isPureArray       with SparseIntArray       validity test PASSED
  - MT_isPureArray       with StringArray          validity test PASSED
  - MT_isPureArray       with MixedAfterHalfArray  validity test PASSED
  - MT_isPureArray       with SparseAfterHalfArray validity test PASSED
AVERAGE: 5.6745, min: 3.6392, max: 6.7606
----------------------

Running tests for For_isPureArray...
  - For_isPureArray      with MixedArray          : 2.9505
  - For_isPureArray      with IntArray            : 7.6639
  - For_isPureArray      with SparseIntArray      : 2.2478
  - For_isPureArray      with StringArray         : 3.3217
  - For_isPureArray      with MixedAfterHalfArray : 5.4210
  - For_isPureArray      with SparseAfterHalfArray: 4.3937
 Running validity tests...
  - For_isPureArray      with MixedArray           validity test PASSED
  - For_isPureArray      with IntArray             validity test PASSED
  - For_isPureArray      with SparseIntArray       validity test PASSED
  - For_isPureArray      with StringArray          validity test PASSED
  - For_isPureArray      with MixedAfterHalfArray  validity test PASSED
  - For_isPureArray      with SparseAfterHalfArray validity test PASSED
AVERAGE: 4.3331, min: 2.2478, max: 7.6639
----------------------

Running tests for Hybrid_isPureArray...
  - Hybrid_isPureArray   with MixedArray          : 1.4375
  - Hybrid_isPureArray   with IntArray            : 6.9443
  - Hybrid_isPureArray   with SparseIntArray      : 1.4479
  - Hybrid_isPureArray   with StringArray         : 1.8540
  - Hybrid_isPureArray   with MixedAfterHalfArray : 7.5628
  - Hybrid_isPureArray   with SparseAfterHalfArray: 4.5184
 Running validity tests...
  - Hybrid_isPureArray   with MixedArray           validity test PASSED
  - Hybrid_isPureArray   with IntArray             validity test PASSED
  - Hybrid_isPureArray   with SparseIntArray       validity test PASSED
  - Hybrid_isPureArray   with StringArray          validity test PASSED
  - Hybrid_isPureArray   with MixedAfterHalfArray  validity test PASSED
  - Hybrid_isPureArray   with SparseAfterHalfArray validity test PASSED
AVERAGE: 3.9608, min: 1.4375, max: 7.5628
----------------------

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 4:05

Attachments:

GoogleCodeExporter commented 9 years ago
Also, with 100,000 records and strict memory limits (16MB) the test runs out of
memory. Trying to be polite with memory usage.

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 4:09

GoogleCodeExporter commented 9 years ago
Running each test more times just helps if each iteration the times fluctuate 
wildly,
which they would if you test with a smaller array (ie: 10,000 elements vs 
100,000). 

But testing with a larger array will not only test the algorithms more for 
worst case
scenarios, but it will also smooth out those fluctuations. 

Testing with 500,000 element array 10x each, I get between 50 and 100ms 
fluctuation
on my average time between runs, so its pretty consistent.

Anyways, the Fixed_isPureArray will likely work, off the top of my head I can't 
come
up with a scenario where it fails, but I noticed the (int)$k !== $k check in 
Zend_AMF
or AMFPHP if I recall, so I figured it was worth keeping given the 5% 
performance
difference with such a huge array and all.

I think your MT_isPureArray() and Hybrid approach, while interesting, isn't 
going to
gain us anything, for the simple fact that MT_isPureArray() is slower than even
isPureArray() in every test I can throw at it. 

I think if you reduce the times each test is run, you will find there is almost 
more
overhead in calling the functions as the functions take to run themselves, again
especially if your test array is so small.

Original comment by mike.ben...@gmail.com on 24 Sep 2009 at 4:17

GoogleCodeExporter commented 9 years ago
Right, after running all the tests again (after fixing them), Fixed_isPureArray 
runs
best in most cases. The only advantage of MT_isPureArray is that with small 
arrays it
has a consistent time and usually a faster time on int arrays. However the 
average is
poor compared to Fixed_isPureArray.

The (int)$k !== $k isn't necessary because the $i !== $k checks the type to 
confirm
that they are both ints and the same value.

Original comment by msthorn...@gmail.com on 24 Sep 2009 at 4:49

GoogleCodeExporter commented 9 years ago
Okay, I committed the change to remove the unnecessary check in isPureArray(). 

Original comment by mike.ben...@gmail.com on 24 Sep 2009 at 7:10