leejw51 / protobuf-net

Automatically exported from code.google.com/p/protobuf-net

Enhancement suggestion for string tokenization #180

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I have a suggestion in two (related) parts:

1) All deserialized strings should be interned (but not using string.Intern!) 
to reduce memory consumption. This can be done completely transparently.

2) For .NET to .NET, use string tokenization so that each string is written 
into the stream only once and tokens are used thereafter. This is of course 
outside the protobuf spec but should be fine for .NET to .NET use. It just 
requires that both ends agree on whether they are using tokenization.
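
To make part 2 concrete, here is a minimal, language-neutral sketch of the idea, written in Python for brevity. It is not my proof-of-concept code and not protobuf-net's wire format: the marker bytes, the 4-byte length/token fields, and the names `write_strings` and `read_strings` are all assumptions made up for illustration.

```python
import io
import struct

LITERAL = b"\x00"  # hypothetical marker: a full string follows
TOKEN = b"\x01"    # hypothetical marker: a token for an earlier string follows


def write_strings(strings):
    """Encode strings so each distinct value is written in full only once."""
    out = io.BytesIO()
    tokens = {}  # string -> token assigned at its first occurrence
    for s in strings:
        token = tokens.get(s)
        if token is None:
            tokens[s] = len(tokens)
            data = s.encode("utf-8")
            out.write(LITERAL)
            out.write(struct.pack("<I", len(data)))
            out.write(data)
        else:
            out.write(TOKEN)
            out.write(struct.pack("<I", token))
    return out.getvalue()


def read_strings(payload):
    """Decode a payload produced by write_strings."""
    src = io.BytesIO(payload)
    seen = []    # token -> string, rebuilt in first-occurrence order
    result = []
    while True:
        marker = src.read(1)
        if not marker:
            break
        (value,) = struct.unpack("<I", src.read(4))
        if marker == LITERAL:
            text = src.read(value).decode("utf-8")
            seen.append(text)
            result.append(text)
        else:
            # Reuse the instance decoded earlier instead of allocating again.
            result.append(seen[value])
    return result
```

Both ends rebuild the same token table from the order of first occurrences, which is why they only need to agree on whether tokenization is switched on. In this sketch the reader also hands back the same string instance for every repeat, so it gets the memory benefit of part 1 as a side effect.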

I have tried this as a proof of concept using the Database/Orders data from 
Examples and the results are very good.

Without tokenization, the serialized size is 133,010 bytes.
With tokenization, this is reduced to 89,292 bytes and, as a bonus, is slightly 
faster too.

It works with the in-place compiler too and just requires setting a bool flag on 
RuntimeTypeModel.

Let me know if you are interested in pursuing this.

Original issue reported on code.google.com by simon.he...@simmotech.co.uk on 5 Jun 2011 at 2:13

GoogleCodeExporter commented 9 years ago
1: This has been in the v1 code for a long time. If it isn't in v2 yet, it will 
be re-added soon. As noted, it uses a custom interner, not the system-wide one.

2: This is supported in v2 if you use AsReference=true on the member; however, 
it needs to be optional to support full backwards compatibility.
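
For readers unfamiliar with the distinction in point 1, the following is a rough sketch of what a custom interner looks like, in Python rather than C# and not taken from the protobuf-net source. The class name `StringPool` and its methods are hypothetical. The point is that the pool is an ordinary object owned by whoever is deserializing, so it can be discarded afterwards, whereas a system-wide intern table keeps its entries for the life of the process.

```python
class StringPool:
    """A private pool of strings, as opposed to a process-wide intern table."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # Return the first instance stored for an equal string,
        # storing this one if no equal string has been seen yet.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)
```

A reader would pass each decoded string through `pool.intern(...)` before assigning it to a member, so equal values share one instance, and would let the pool go out of scope once deserialization is finished.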

Original comment by marc.gravell on 5 Jun 2011 at 6:30

GoogleCodeExporter commented 9 years ago
AsReference doesn't work for strings, as far as I can see.

Original comment by simon.he...@simmotech.co.uk on 6 Jun 2011 at 7:29

GoogleCodeExporter commented 9 years ago
I'll check.

Original comment by marc.gravell on 6 Jun 2011 at 9:55

GoogleCodeExporter commented 9 years ago
The interner had not been added back in. It is now. It also now works correctly 
with AsReference strings.

Original comment by marc.gravell on 13 Jun 2011 at 8:31