zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

Avoid creating new strings when parsing PcData #541

Closed 3bdNKocY closed 6 months ago

3bdNKocY commented 6 months ago

Avoid string concatenation and substring operations to reduce memory usage. This cuts memory usage by around 80 MB on a large local HTML file.
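
To illustrate the general idea (this is a minimal sketch, not the code from this pull request), the snippet below contrasts a `Substring`-based check with an in-place comparison. The class and method names (`PcDataSketch`, `MatchesWithSubstring`, `MatchesInPlace`) are hypothetical and do not exist in HAP; only `string.Substring` and the `string.Compare` overload taking indexes and a length are real .NET APIs.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Html Agility Pack.
static class PcDataSketch
{
    // Allocating version: Substring creates a new string on every check.
    public static bool MatchesWithSubstring(string text, int index, string tag)
    {
        if (index < 0 || index + tag.Length > text.Length) return false;
        return text.Substring(index, tag.Length)
                   .Equals(tag, StringComparison.OrdinalIgnoreCase);
    }

    // Allocation-free version: compares the region of `text` starting at
    // `index` against `tag` directly, without creating an intermediate string.
    public static bool MatchesInPlace(string text, int index, string tag)
    {
        if (index < 0 || index + tag.Length > text.Length) return false;
        return string.Compare(text, index, tag, 0, tag.Length,
                              StringComparison.OrdinalIgnoreCase) == 0;
    }
}
```

Both methods answer the same question; the second avoids one temporary string per call, which can add up when the check runs for many positions of a large document.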

JonathanMagnan commented 6 months ago

Hello @3bdNKocY ,

Thank you for your pull request. We will make a slight change to it, but otherwise everything looks great.

Best Regards,

Jon

JonathanMagnan commented 6 months ago

Here is the small improvement we made with your modification: https://github.com/zzzprojects/html-agility-pack/commit/43d513377c788611db67c5d277458871e9589f3a

There is no point in running the string.Compare logic if tagStartMatching has already returned false.
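
As a rough sketch of that ordering (not the actual change in the commit above), the cheap boolean can be placed first in an `&&` expression so that the comparison is skipped whenever it is false. Apart from `tagStartMatching`, every name here (`ShortCircuitSketch`, `RegionEquals`, `IsClosingTag`, `CompareCalls`) is hypothetical and only serves the illustration.

```csharp
using System;

// Hypothetical illustration; not the code from the linked commit.
static class ShortCircuitSketch
{
    // Counts how often the comparison actually runs.
    public static int CompareCalls;

    static bool RegionEquals(string text, int index, string tag)
    {
        CompareCalls++;
        return index >= 0
            && index + tag.Length <= text.Length
            && string.Compare(text, index, tag, 0, tag.Length,
                              StringComparison.OrdinalIgnoreCase) == 0;
    }

    public static bool IsClosingTag(bool tagStartMatching, string text, int index, string tag)
    {
        // && short-circuits: when tagStartMatching is false,
        // RegionEquals (and therefore string.Compare) is never called.
        return tagStartMatching && RegionEquals(text, index, tag);
    }
}
```

With this ordering, a call where `tagStartMatching` is false leaves `CompareCalls` unchanged, so the comparison cost is only paid for candidates that passed the cheap test.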

I'm not sure how much it will improve performance, but it certainly shouldn't hurt ;)

Best Regards,

Jon