mwaterfall / MWFeedParser

An Objective-C RSS / Atom Feed Parser for iOS

stringByEncodingXMLEntities returns wrong values #30

Closed cliffjoyce closed 13 years ago

cliffjoyce commented 13 years ago

I realize that "NSString+XMLEntities" is deprecated (replaced by "NSString+HTML").

However, stringByEncodingHTMLEntities in "NSString+HTML" is not suitable for encoding entities for use in XML. The reason is that XML apparently predefines only these 5 named entities:

{ @"&quot;", 34 },
{ @"&amp;", 38 },
{ @"&apos;", 39 },
{ @"&lt;", 60 },
{ @"&gt;", 62 },

For XML, all other entities must apparently be mapped to the appropriate numeric &#xxx; value. Example character: Ø. The &Oslash; needs to be written as &#216; instead, or XML parsers reject it.

Do you know of a quick workaround? Thanks!

cliffjoyce commented 13 years ago

Update: I actually found a quick workaround solution. I checked out a fresh copy of the Google Toolbox from this page:

http://code.google.com/p/google-toolbox-for-mac/

Then I added this code to your NSString+HTML class:

#import "GTMNSString+XML.h"

// Encode XML entities (and strip characters illegal in XML) using GTM
- (NSString *)stringByEncodingXMLEntities {
    return [self gtm_stringBySanitizingAndEscapingForXML];
}

Also, it turns out that I was mistaken about one thing: only the 5 entities above need to be escaped. Other Unicode characters evidently don't need any escaping at all, provided the document's encoding can represent them. The nice thing about GTM's gtm_stringBySanitizingAndEscapingForXML call is that it also removes any characters that are illegal in XML.
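
For anyone who would rather not pull in GTM, here is a minimal sketch of escaping just the 5 predefined entities. The function name EscapeXMLPredefinedEntities is hypothetical (it is not part of MWFeedParser or GTM), and unlike the GTM call it does not strip characters that are illegal in XML:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for illustration; not part of MWFeedParser or GTM.
// Escapes only the 5 predefined XML entities and leaves every other
// character (including non-ASCII ones such as Ø) untouched.
// It does NOT remove characters that are illegal in XML.
static NSString *EscapeXMLPredefinedEntities(NSString *input) {
    NSMutableString *result = [input mutableCopy];

    // "&" must be replaced first, otherwise the ampersands introduced
    // by the later replacements would be escaped a second time.
    NSArray *pairs = @[
        @[@"&",  @"&amp;"],
        @[@"<",  @"&lt;"],
        @[@">",  @"&gt;"],
        @[@"\"", @"&quot;"],
        @[@"'",  @"&apos;"],
    ];

    for (NSArray *pair in pairs) {
        [result replaceOccurrencesOfString:pair[0]
                                withString:pair[1]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, result.length)];
    }
    return [result copy];
}
```

Calling EscapeXMLPredefinedEntities(@"\u00D8 < 5 & \"x\"") should give Ø &lt; 5 &amp; &quot;x&quot;, with the Ø passed through unchanged.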

mwaterfall commented 13 years ago

Glad you sorted it out! :-)