jsakamoto / ipaddressrange

.NET Class Library for range of IP address, both IPv4 and IPv6.
Mozilla Public License 2.0
370 stars 71 forks source link

Add static regexp defenition to avoid large memory usage #19

Closed llCorvinSll closed 9 years ago

llCorvinSll commented 9 years ago

In some cases (processing large amount of networks) using static Regex.Match gives large memory overhead and OutOfMemoryExceptions.

Putting patterns to static precompilet Regexp solve problem and reduce memory usage

jsakamoto commented 9 years ago

In some cases (processing large amount of networks) using static Regex.Match gives large memory overhead and OutOfMemoryExceptions.

Oh, really? It is hard to believe at first. What differences are there between static version Match method and instance version one?

Could you let me show any sample programs that cause OutOfMemoryExceptions, or, authority information resource?

llCorvinSll commented 9 years ago

ok. I'm play with VS profiling tools a bit. And get some results

oldApproach oldapproach

newApproach newapproach

test run creating new IpAddresRange in cycle (16383 string rerpesents uniq networks) 4 times

you can see what static Match method allocates more memory when regexp object and so on, i found one more problem: Byte operasions with Linq cause huge amount of copying and slow.

I can be wrong with interpreting profiling result, but modyfied object avoid OOM errors in my complex algorithm

jsakamoto commented 9 years ago

Thank you for your reply and detail report.
This reports are very interesting for me.

Ok, I'm going to merge your pull request, but I should work another task in this week.

Please give me one week.

jsakamoto commented 9 years ago

@llCorvinSll I'm sorry, what do you think about pull request#13?
https://github.com/jsakamoto/ipaddressrange/pull/13/files#diff-84e5184cbe9467997f4166539b02496eR114

That implementation use one shot Regex.Match call.

I'm lost merge this pull request#13. It is my misstake.

llCorvinSll commented 9 years ago

request#13 use very complex regex - this can slow down parsing process. And dont compile regex at first. Also, check #20 pull, which I'm avoid creating some of iterators in bit methods (some less memory and speed bump)

jsakamoto commented 9 years ago

(I started benchmarking...: https://github.com/jsakamoto/ipaddressrange-benchmark )

llCorvinSll commented 9 years ago

you can share you results?

jsakamoto commented 9 years ago

@llCorvinSll

you can share you results?

Yes. But .vsp files are to large (above 100MB), so I could not push to GitHub it.

I'm planing how to publish .vsp files. I'll upload it to OneDrive.
Please let me few hours or days.

llCorvinSll commented 9 years ago

Ok, you can make screenshots, instead of sharing memory dumps

llCorvinSll commented 9 years ago

memory overhead are simple: using static method returns new Regex object, inside constructor system try to find cached instance, but new object already created in memory

https://github.com/dotnet/corefx/blob/41e203011152581a6c65bb81ac44ec037140c1bb/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs#L512

jsakamoto commented 9 years ago

I did benchmarks.

Screen shots are here: http://1drv.ms/1PsWt7z

Summary of benchmarks score is: http://1drv.ms/1PsW9G0

These benchmarks may tell us "static Regex members approach without LINQ bit operations" is lowest memory usage, about 19MB less than "single Regex pattern approach with LINQ bit operations".

Also, running speed best score is "static Regex members approach with LINQ bit operations", about 80msec less than "single Regex pattern approach with LINQ bit operations".

We should remember these benchmarks are stocking 20thousand IPAddressRange objects, and difference of memory usage is only few MB.
I feel it is over spec...

I'm still not sure about current IPAddressRange implementation causes OutOfMemoryException.

But anyway, I'll planing about merging of this pull request.

Buy the way, I feel "NO LINQ bit operations" implementation (pull request #20) are not beautiful, and it save only about 2MB at treating ten thousand order IPAddressRange instances, so I would not merge pull request #20.

What should I do?

jsakamoto commented 9 years ago

@llCorvinSll

using static method returns new Regex object, inside constructor system try to find cached instance, but new object already created in memory

Thank you for the information.

I understand: Regex.Match() static method implementation cache only RegexCode internal objects, but does not cache Regex objects.

llCorvinSll commented 9 years ago

@jsakamoto 2Mb is 12% difference! And and i dont big fan of creating iterators without reason. (bit operations) And, for my taste, linq operations on BITs hard to understand, im prefer to use arrays in this case (: If you dont want to use old way style - reject #20

Hovewer, regex gives huge improvement, and it's ok to be merged.

jsakamoto commented 9 years ago

@llCorvinSll

I merged this pull request #19. Thank you for your contribution :+1:

P.S. see also https://github.com/jsakamoto/ipaddressrange/pull/20#issuecomment-148708579