Closed: markatosi closed this issue 5 years ago
Hi @markatosi, thanks for your suggestion. I will investigate this 👍
@totpero, any update on this matter? Should this suggestion be considered a viable option for faster parsing?
Hi @markatosi, I just pushed some changes:
You can now use your own regex implementation. I created an IRegexEngine interface; if no engine is set, MsRegexEngine is used by default, but you can replace it with your own implementation.
I have also implemented PcreRegexEngine in a separate project.
You can use it like this:
var deviceDetector = new DeviceDetector(ua);
deviceDetector.SetRegexEngine(new PcreRegexEngine());
Or on an individual parser, like this:
var botParser = new BotParser();
botParser.SetRegexEngine(new PcreRegexEngine());
Not all tests pass with my PcreRegexEngine implementation. If I missed something, or if you have something to add, feel free to do it.
Thanks
I replaced the Microsoft Regex calls with the library from https://github.com/ltrzesniewski/pcre-net
This resulted in a nearly 4x speed increase on my development machine with 1 thread and a 5.5x increase with 8 threads, which seems to be on par with the PHP version of Device Detector. This is an important performance improvement if one needs to parse large numbers of user agents on a regular basis.
I'm not advocating that you do this in your project; I'm just mentioning it in passing for anyone who requires faster performance.
I did not change any code other than replacing all MS regex calls with their equivalent Pcre calls. I performed this test on a 2019 iMac with a 3.6 GHz Core i9, using one thread. This particular source data file contains 3,576,720 unique agent strings.
Using standard regex
analyzing agents....
Lines Count: 3,576,720 thread: 0
1,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:16s:583ms
2,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:15s:772ms
3,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:15s:506ms
4,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:16s:081ms
5,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:15s:897ms
6,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:15s:416ms
Using Pcre regex
analyzing agents....
Lines Count: 3,576,720 thread: 0
1,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:04s:199ms
2,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:04s:231ms
3,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:04s:242ms
4,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:04s:198ms
5,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:04s:280ms
6,000 of 3,576,720 done Thread: 0 in 00d:00h:00m:04s:288ms
With 8 threads, the MS regex version processes 8,000 agents in 39 seconds, while the Pcre version processes 8,000 agents in 7 seconds.
Your mileage may vary, but I'm pretty darn happy with this improvement.
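For anyone who wants to sanity-check the speedup figures, here is a minimal sketch that recomputes them from the numbers posted above. It assumes the first single-thread log is the MS regex run and the second is the Pcre run; the class and method names (SpeedupCheck, MeanSeconds, Speedup) are made up for illustration and are not part of DeviceDetector.NET.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SpeedupCheck
{
    // Mean of a set of per-batch timings, in seconds.
    public static double MeanSeconds(double[] batchSeconds) => batchSeconds.Average();

    // How many times faster the candidate is than the baseline.
    public static double Speedup(double baselineSeconds, double candidateSeconds) =>
        baselineSeconds / candidateSeconds;

    public static void Main()
    {
        // Seconds per 1,000 agents, copied from the single-thread logs above.
        // Assumption: the first log is MS regex, the second is Pcre.
        double[] msRegex = { 16.583, 15.772, 15.506, 16.081, 15.897, 15.416 };
        double[] pcre = { 4.199, 4.231, 4.242, 4.198, 4.280, 4.288 };

        double single = Speedup(MeanSeconds(msRegex), MeanSeconds(pcre));

        // Seconds for 8,000 agents on 8 threads, as reported above.
        double multi = Speedup(39.0, 7.0);

        Console.WriteLine("1 thread:  " + single.ToString("F2", CultureInfo.InvariantCulture) + "x");
        Console.WriteLine("8 threads: " + multi.ToString("F2", CultureInfo.InvariantCulture) + "x");
    }
}
```

This works out to roughly 3.7x on one thread and 5.6x on eight, which lines up with the "nearly 4x" and "5.5x" figures quoted above.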