tel8618217223380 / prado3

Automatically exported from code.google.com/p/prado3

Prado serialization optimizations #337

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Prado currently wastes a lot of space when serializing classes, for example 
when saving pagestate and caching output. This is because it saves all 
properties, even those that are unchanged and still at their default values. 
These overly large serialized strings not only exhaust memory cache space 
faster and worsen database cache performance, but also make PHP waste time on 
every deserialization (which adds up if you have something like a 1:100 or 
1:1000 cache hit ratio), by forcing unserialize() to process field-value pairs 
that in the end do not change the value of any field and are completely 
superfluous. The issue affects classes with many mostly unused properties the 
worst (such as classes related to SqlMap and ActiveRecord), but it can also 
make pagestate strings 40-100% larger. 

To solve the issue I've created this small patch to Prado, which

1. adds a __sleep() method to the TComponent base class, so that derived 
classes can safely override it and call parent::__sleep() (which would not be 
possible if TComponent did not define that method)

2. excludes unused fields still at their default values from serialization in 
the most heavily affected classes, thus saving a lot on the length of the 
serialized representation of objects based on those classes
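
Points 1 and 2 could look roughly like the sketch below. It is only an illustration of the technique, not the attached patch: the class names BaseComponent and Statement and the field $_cache are hypothetical stand-ins. It relies on the fact that casting an object to an array yields the property names in the form serialize() uses ("\0Class\0name" for private members), and that __sleep() accepts those names.

```php
<?php
// Hypothetical stand-in for TComponent; not the actual Prado source.
class BaseComponent
{
    private $_e = array(); // event handlers, usually empty

    public function __sleep()
    {
        // All property names in serialized form: "\0Class\0name" for
        // private, "\0*\0name" for protected, plain "name" for public.
        $props = array_keys((array) $this);
        $exclude = array();
        if ($this->_e === array()) {
            // Still at its default value, so leave it out.
            $exclude[] = "\0BaseComponent\0_e";
        }
        return array_diff($props, $exclude);
    }
}

// Hypothetical derived class with one rarely used field.
class Statement extends BaseComponent
{
    private $_id = null;
    private $_cache = null;

    public function __construct($id = null)
    {
        $this->_id = $id;
    }

    public function getId()
    {
        return $this->_id;
    }

    public function __sleep()
    {
        $exclude = array();
        if ($this->_cache === null) {
            $exclude[] = "\0Statement\0_cache";
        }
        // Safe only because the base class defines __sleep() as well.
        return array_diff(parent::__sleep(), $exclude);
    }
}

$s = serialize(new Statement('x'));
$o = unserialize($s);
```

Fields left out of the serialized string simply come back with their declared defaults on unserialize(), which is why skipping them loses no information.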

Some of the Prado classes that already implement __sleep() also make a false 
assumption: that when __sleep() is called the object is practically about to 
be destroyed, so they treat it as a pseudo-destructor. The most prominent case 
is TDbConnection, which closes the connection to the database when it is 
serialized and its __sleep() method is called. This is wrong, as serialization 
does not only occur at the end of the object's life or at the end of the page 
cycle; it can also happen mid-page or mid-life, when the object in question is 
still actively accessed afterwards. Therefore my patch

3. modifies these ill-behaved __sleep() methods so they no longer release 
resources, while still ensuring that when the objects are woken up they 
reinitialize, just as if they had been "shut down" prior to sleep.
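
A minimal sketch of the pattern in point 3, assuming a hypothetical LazyConnection class (it only mimics the idea and is not TDbConnection; the stdClass handle stands in for a real connection object): __sleep() leaves the live handle out of the serialized form without closing it, and __wakeup() puts the restored copy into the "closed" state.

```php
<?php
// Hypothetical connection wrapper, used only to illustrate the pattern.
class LazyConnection
{
    private $_dsn;
    private $_handle = null;  // live resource, never serialized
    private $_active = false;

    public function __construct($dsn)
    {
        $this->_dsn = $dsn;
    }

    public function open()
    {
        if ($this->_handle === null) {
            // Stand-in for a real connect call (assumption).
            $this->_handle = new stdClass();
        }
        $this->_active = true;
    }

    public function isActive()
    {
        return $this->_active;
    }

    public function __sleep()
    {
        // Do NOT close the connection here: the live object may still be
        // in use. Only keep the handle and flag out of the serialized form.
        return array("\0LazyConnection\0_dsn");
    }

    public function __wakeup()
    {
        // The restored copy starts as if it had been shut down before sleep.
        $this->_handle = null;
        $this->_active = false;
    }
}

$conn = new LazyConnection('sqlite::memory:');
$conn->open();
$copy = unserialize(serialize($conn));
var_dump($conn->isActive()); // the original stays open after serialization
var_dump($copy->isActive()); // the copy has to reopen on demand
```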

These few optimizations can reduce the average pagestate size and cache usage 
of Prado by up to 20-40%, and also noticeably speed up page serving in 
general, because fewer CPU cycles are needed to load cached classes/object 
structures, and because some resources no longer need to be reallocated (for 
example reopening the connection to the database) when the object is still 
referenced/used after being serialized (for example when stored in an object 
cache).

The sources are based on the latest Prado trunk.

Original issue reported on code.google.com by google...@pcforum.hu on 25 Jun 2011 at 12:49

Attachments:

GoogleCodeExporter commented 9 years ago
Patch with fixes attached.

Original comment by google...@pcforum.hu on 25 Jun 2011 at 1:19

Attachments:

GoogleCodeExporter commented 9 years ago
I've tried to give your patch a run, but the results were not that big. Some notes:
 - In TComponent I had to change the entry to $exprops[] = "\0TComponent\0_e", since it's a private member and not a protected one;
 - a var_export($this->_e) inside the __sleep() call always returned an empty array in my tests, so the gain from removing that element is probably not that much;
 - I briefly tested the AR patches; they seem correct and working.

Original comment by ctrlal...@gmail.com on 25 Jun 2011 at 11:52

GoogleCodeExporter commented 9 years ago
TComponent-instance serialized without my patch:
O:10:"TComponent":1:{s:14:"TComponent_e";a:0:{}} - 50 characters

TComponent-instance serialized with my patch:
O:10:"TComponent":0:{} - 22 characters

So with my patch you save 56% of the space on every TComponent-based object 
that has no event handlers attached. I wouldn't call 56% "not that much"; it's 
rather a lot. Of course you don't save 56% on every component, but you do save 
28 bytes on most of them, as most objects don't have any handlers attached to 
their events (even if they define any events in the first place, which many 
TComponent-derived classes don't). If you save those 28 bytes on something 
like 1000 objects (which you can easily have in a page hierarchy with a few 
repeaters) you already save 28 KB of page state storage, and a *LOT* of CPU 
cycles when deserializing. 
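
The two strings and the arithmetic above can be checked with a small sketch. The TComponent class here is a hypothetical minimal stand-in that only shares the name and the private $_e field with the real one, and the static $skipEmpty switch exists purely to compare both behaviours in one script. Note that the property name really is "\0TComponent\0_e"; the two NUL bytes are not visible in the strings quoted above, which is why the declared length is 14.

```php
<?php
// Hypothetical minimal stand-in for TComponent (assumption, not Prado code).
class TComponent
{
    // Static properties are never part of the serialized form.
    public static $skipEmpty = false;

    private $_e = array();

    public function __sleep()
    {
        if (self::$skipEmpty && $this->_e === array()) {
            return array(); // patched behaviour: omit the empty handler list
        }
        return array("\0TComponent\0_e"); // unpatched: always stored
    }
}

$before = serialize(new TComponent());
TComponent::$skipEmpty = true;
$after = serialize(new TComponent());

$saved = strlen($before) - strlen($after);
echo strlen($before), ' -> ', strlen($after), ' bytes, saved ', $saved, "\n";
echo 'per 1000 objects: ', $saved * 1000, " bytes\n";
echo 'relative saving: ', round(100 * $saved / strlen($before)), "%\n";
```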

Of course it all depends on what kind of objects you have in your hierarchy, 
but many relatively tiny savings add up to a lot in a complex hierarchy. The 
savings are most prominent with the SqlMap definitions, as those use a *LOT* 
of TComponent-derived objects (for example TParameterProperty and 
TResultProperty instances, most of which don't have any of their properties 
changed from the default before being stored in the sqlmap cache), but they 
are present on almost any page in any scenario.

Original comment by google...@pcforum.hu on 25 Jun 2011 at 6:51

GoogleCodeExporter commented 9 years ago
Oh, btw

"a var_export($this->_e) inside __sleep() call always returned an empty array 
in my tests.. "
That's the very reason you save with my patch. Most objects don't have any 
event handlers attached to them, so they have an empty _e array, and thus, 
when serialized, they all waste at least those twenty-odd bytes in each of 
their instances, on every page, in every field, map and statement definition, 
etc. And as already explained, small savings add up to a lot.

Original comment by google...@pcforum.hu on 25 Jun 2011 at 7:04

GoogleCodeExporter commented 9 years ago
OK, thank you for the further explanation. Committed as r3004.

Original comment by ctrlal...@gmail.com on 26 Jun 2011 at 9:57

GoogleCodeExporter commented 9 years ago
Even more optimizations. They reduced my cached/serialized SqlMap definitions 
from 100-120 KB to 60-80 KB, and that is per table. 

Also, the very fact that every possible local field is declared "private" in 
Prado classes increases the size of the serialized object hierarchies 
enormously, because every private field is saved prefixed with the full name 
of the class it is defined in. So for example in TSqlMapStatement definitions, 
instead of just "_ID" we will have "\0TSqlMapStatement\0_ID" in the serialized 
form, for every instance of TSqlMapStatement, and for every field in those 
instances (including all other fields, which will all be prefixed with the 
"TSqlMapStatement" identifier). This is a complete waste of cache storage, I/O 
and CPU cycles. 

However, if all those "private" fields in Prado component classes were 
declared "protected" instead, that alone would save a lot. For example, in the 
above case all references to the "_ID" field would look like "\0*\0_ID", which 
is 6 bytes instead of 21, under a third of the current (private) form. As 
similar savings (ranging from 10-40%) could be achieved for every single 
object saved, this change would reduce the footprint of serialized hierarchies 
considerably.
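
The difference in key length can be seen with a short sketch. TSqlMapStatement below is a hypothetical one-field stand-in for the real class, and TSqlMapStatementProtected is an invented name for the same class with the field declared protected. Casting an object to an array exposes the property names exactly as serialize() writes them.

```php
<?php
// Hypothetical stand-ins: only the visibility of $_ID differs.
class TSqlMapStatement
{
    private $_ID = 'select';
}

class TSqlMapStatementProtected
{
    protected $_ID = 'select';
}

// Private key:   "\0TSqlMapStatement\0_ID" (class name embedded)
// Protected key: "\0*\0_ID"                (class name replaced by "*")
$privKeys = array_keys((array) new TSqlMapStatement());
$protKeys = array_keys((array) new TSqlMapStatementProtected());

$privLen = strlen($privKeys[0]);
$protLen = strlen($protKeys[0]);
echo "private key: $privLen bytes, protected key: $protLen bytes\n";
```

The saving applies once per field per stored instance, so it grows with both the length of the class name and the number of fields.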

Original comment by google...@pcforum.hu on 13 Feb 2012 at 11:44

Attachments:

GoogleCodeExporter commented 9 years ago
Pushed the new patch as r3112. I held back the modification of 
Web/UI/TControl, since the new __sleep and __wakeup are commented out; was 
that intentional?

Original comment by ctrlal...@gmail.com on 14 Feb 2012 at 8:15

GoogleCodeExporter commented 9 years ago
Yes, because I did some benchmarking and there was no noticeable gain from not 
saving the unused properties in the composite literals. In those literals it 
is the actual (textual) data that makes up most of the serialized stream, not 
the field name definitions as with most classes. Therefore you can't save much 
in serialized size, but you lose a lot of CPU cycles when deserializing, 
because of the extra checks needed in the __wakeup method. It just didn't seem 
worth the hassle in the end.

Original comment by google...@pcforum.hu on 14 Feb 2012 at 8:32