wordpress-mobile / WordPress-Editor-iOS

⛔️ [DEPRECATED] A reusable iOS rich text editor component.
GNU General Public License v2.0

Create an HTML to NSAttributedString converter. #833

Closed. diegoreymendez closed this issue 8 years ago

diegoreymendez commented 8 years ago

Discovery task.

Create an HTML to NSAttributedString converter to see if it's feasible for us to use that for a native editor.

diegoreymendez commented 8 years ago

For my first attempt I'm using the initializer:

NSAttributedString(HTMLData:documentAttributes)

The problem with this initializer is that even for a string with no style information at all, I already get these attributes:

(lldb) po attributes
▿ 3 elements
  ▿ [0] : 2 elements
    - .0 : "CTForegroundColor"
  ▿ [1] : 2 elements
    - .0 : "NSFont"
  ▿ [2] : 2 elements
    - .0 : "NSParagraphStyle"

This means we'll end up with extra tags when converting the NSAttributedString back to HTML. We can't use this mechanism unless we find a way around that limitation.
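
For anyone who wants to reproduce this, here is a minimal sketch of the check. It is not the code used above: it assumes the current Swift spelling of the HTML import API, `NSAttributedString(data:options:documentAttributes:)` with the `.html` document type, and the helper name `defaultAttributeNames(forHTML:)` is made up for illustration. The exact attribute keys it prints may differ between OS versions.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Hypothetical helper (not part of the editor): imports an HTML string and
// returns the names of the attributes applied to its first character.
// The HTML importer should be called from the main thread.
func defaultAttributeNames(forHTML html: String) throws -> [String] {
    let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
        .documentType: NSAttributedString.DocumentType.html,
        .characterEncoding: String.Encoding.utf8.rawValue
    ]
    let attributed = try NSAttributedString(data: Data(html.utf8),
                                            options: options,
                                            documentAttributes: nil)
    // Guard against an empty result before asking for attributes at index 0.
    guard attributed.length > 0 else { return [] }
    let attributes = attributed.attributes(at: 0, effectiveRange: nil)
    return attributes.keys.map { $0.rawValue }.sorted()
}

// A string with no style information at all.
let attributeNames = try defaultAttributeNames(forHTML: "Hello World!")
print(attributeNames)
```

If the list is non-empty for plain input like this, a naive attributes-to-tags mapping would emit markup the user never wrote, which is the limitation described above.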

diegoreymendez commented 8 years ago

I'm trying out HTML/XML parsers to see which one is best for our converter. I tried NSXMLParser first, since it's provided by Apple, but it seems to fail quite easily with rather cryptic error messages such as:

Error Domain=NSXMLParserErrorDomain Code=111 "(null)"

in the case of this rather simple test HTML:

<HTML style='a' bold face='123'>Hello World!</HTML>

I'm going to try Fuzi next.
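
A minimal sketch of the NSXMLParser failure, for reference. It uses the current Swift name of the class, `XMLParser`, and no delegate. The names `testHTML`, `parser`, and `succeeded` are only for illustration, and the exact error code reported may vary by platform.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on non-Apple platforms
#endif

// The same test HTML as above. The valueless `bold` attribute is fine in
// HTML but is not well-formed XML, so a strict XML parser rejects it.
let testHTML = "<HTML style='a' bold face='123'>Hello World!</HTML>"

let parser = XMLParser(data: Data(testHTML.utf8))
let succeeded = parser.parse()

if !succeeded, let error = parser.parserError {
    // The error carries little more than a numeric code.
    print("Parsing failed: \(error)")
}
```

Since XMLParser only accepts well-formed XML, it is probably not a good fit for arbitrary user-written HTML.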

diegoreymendez commented 8 years ago

With the same test HTML string, Fuzi fails too, but in a different way. When the parser reads the HTML tag it only recognizes the style attribute (and ignores the custom bold and face attributes):

(lldb) po root
<HTML style="a"/>

This Stack Overflow question has several answers pointing out that empty attributes are perfectly valid in HTML. On top of that, we need to make sure we preserve any custom additions made by the user, so we can't afford data loss.

Fuzi is thus ruled out for now.

diegoreymendez commented 8 years ago

TBXML doesn't seem to be producing correct Swift headers, unfortunately.

diegoreymendez commented 8 years ago

libxml2's HTMLParser seems to be the way to go. It's a bit complicated, since it basically means calling C functions from Swift code, but it looks like the easiest solution so far.

I'll be uploading some sample code tomorrow.
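
In the meantime, here is a rough sketch of what calling libxml2's HTML parser from Swift can look like. This is not the sample code mentioned above. It assumes the `libxml2` module is visible to the Swift compiler (on other setups a module map or bridging header may be needed), and the helper `rootElementInfo(ofHTML:)` is a made-up name for illustration.

```swift
import Foundation
import libxml2  // assumption: the libxml2 module is available to the compiler

let testHTML = "<HTML style='a' bold face='123'>Hello World!</HTML>"

// Hypothetical helper: parses an HTML string with libxml2's HTML parser and
// returns the root element's name plus the names of its attributes.
func rootElementInfo(ofHTML html: String) -> (name: String, attributes: [String])? {
    // Recover from malformed markup and keep libxml2 from logging to stderr.
    let options = Int32(HTML_PARSE_RECOVER.rawValue
        | HTML_PARSE_NOERROR.rawValue
        | HTML_PARSE_NOWARNING.rawValue)

    guard let doc = htmlReadMemory(html, Int32(html.utf8.count), nil, "UTF-8", options) else {
        return nil
    }
    defer { xmlFreeDoc(doc) }  // the document is owned by us and must be freed

    guard let root = xmlDocGetRootElement(doc), let rawName = root.pointee.name else {
        return nil
    }

    // Attributes form a C linked list hanging off the node.
    var names: [String] = []
    var attribute = root.pointee.properties
    while let current = attribute {
        if let attributeName = current.pointee.name {
            names.append(String(cString: attributeName))
        }
        attribute = current.pointee.next
    }
    return (String(cString: rawName), names)
}

if let info = rootElementInfo(ofHTML: testHTML) {
    print(info.name, info.attributes)
}
```

The manual pointer walking and `xmlFreeDoc` call are the "bit complicated" part. A real converter would likely wrap them in Swift types so the rest of the code never touches raw pointers.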

diegoreymendez commented 8 years ago

Closing. Continues here.