mwaterfall / MWFeedParser

An Objective-C RSS / Atom Feed Parser for iOS

Parser spends a lot of time in [NSString stringByRemovingNewLinesAndWhitespace] #50

Open · Colourclash opened this issue 12 years ago

Colourclash commented 12 years ago

I have noticed a performance bottleneck. When parsing a large RSS feed such as the iTunes "300 new releases" feed (URL below), the parser can spend 60% of the total parsing time in [NSString stringByRemovingNewLinesAndWhitespace].

The entire parsing operation (not including downloading the data from the server) can take 3900 ms, and roughly 2200 ms of that can be spent in stringByRemovingNewLinesAndWhitespace.
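For reference, the share of parse time spent in the whitespace routine works out directly from these two figures:

$$\frac{2200\ \text{ms}}{3900\ \text{ms}} \approx 0.56$$

That is about 56%, which fits the "60%" estimate above.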

These figures were taken from a release build running on an iPhone 3GS, using the MWFeedParser sample code and the following RSS feed:

http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=300/rss.xml

[Note: is this whitespace stripping necessary? Could it be made an optional feature to save CPU time?]

sylverb commented 12 years ago

Hello Colourclash, maybe you could do a performance comparison with my fork, https://github.com/sylverb/MWFeedParser, which uses a different XML parser (based on libxml2 and GDataXMLNode). I don't know whether it performs better or not, though.

Colourclash commented 12 years ago

@sylverb

Hi, I checked out your fork and noticed that you are not calling [NSString stringByRemovingNewLinesAndWhitespace] at all. So it's not really a fair comparison, since my issue is about the time spent stripping whitespace.

Just out of interest, though: your code takes 900 ms to parse the same feed I used to benchmark the original MWFeedParser code, so yes, your code is faster. The NSXMLParser-based code takes 1600 ms if I remove the whitespace-stripping code. Does your code offer the same functionality as the original MWFeedParser, then?
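The two benchmarks can be checked against each other, since both are for the same feed. Subtracting the whitespace time from the first profile gives:

$$3900\ \text{ms} - 2200\ \text{ms} = 1700\ \text{ms} \approx 1600\ \text{ms}$$

Comparing the libxml2-based fork with the stripping-free NSXMLParser build gives:

$$\frac{1600\ \text{ms}}{900\ \text{ms}} \approx 1.8$$

The figure measured with stripping removed therefore roughly matches what the first profile leaves once the whitespace routine is taken out. On that basis, the fork is about 1.8 times faster, and neither number includes whitespace stripping.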

Regards

sylverb commented 12 years ago

Thanks for the test. It should provide the same functionality (at least, I made my changes so as to keep the same interface and produce the same results). The main reason for my fork is that NSXMLParser is very strict and will reject any feed that is not 100% XML-compliant. As my goal was to be compatible with as many RSS feeds as possible, it was important for me to make it less strict. Performance was not my main goal at all, but it's great if it's faster.

mwaterfall commented 12 years ago

The whitespace removal has two purposes. The first is to trim whitespace around fields such as dates, times and other values that need processing (which is important and has to be done). The second is simply to tidy up the pure text fields by removing extra spaces and new lines.

Thinking about it now, this could be optimised so that the more thorough whitespace cleanup only happens on the text fields. The other value fields (dates, links, etc.) could use a faster routine that just trims whitespace from the beginning and end of the string. I will try to implement this optimisation the next time I get around to updating the parser.

bluesuedesw commented 11 years ago

I optimized this function by replacing it with this one-line solution:

```objc
// Trims whitespace and newlines from the start and end of the string only
return [self stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
```