Closed GoogleCodeExporter closed 9 years ago
See issue #24 for one HtmlAgilityPack bug biting Fizzler.
Original comment by azizatif
on 6 May 2009 at 3:22
Two possibilities:
http://developer.mindtouch.com/SgmlReader
http://code.google.com/p/twintsam/
Original comment by info%colinramsay.co.uk@gtempaccount.com
on 6 May 2009 at 3:27
> twintsam
The project home page says, "The code is not usable yet." That leaves just
SgmlReader for now.
Original comment by azizatif
on 6 May 2009 at 3:34
I think dropping HtmlAgilityPack (at least as the default) is a
good idea. It isn't actively maintained and its developers don't
seem too eager to fix bugs in it either. It is an excellent library
for simple HTML parsing, and is afaik the only one exposing a full
DOM (which is very convenient), but because of its bugs and
inactivity, I think it's a wise plan to move away from it.
SgmlReader and Twintsam are both alternatives worth looking into. I know
Thomas Broyer, the project owner of Twintsam, and it is a very promising
project with the goal of being the reference implementation of the HTML5
parsing algorithm in C#. That's a noble goal, imho.
SgmlReader, on the other hand, is a nice but old and somewhat dated
implementation. I believe SgmlReader is the best of the three at the
moment, although the code quality of the project is, in my humble
opinion, not too great, which is why I'm not considering contributing
to it. I also don't think there's much testing to speak of in the
SgmlReader project, although it is being actively maintained and bugs
are fixed.
I would love to cooperate in implementing either of these (or others, if
there are any) alternatives. For the long term, I think Twintsam might be
the best project to bet on, but it does indeed need some work before it's
production ready, so it might be something worth investigating for version
2.0 of Fizzler.
Original comment by asbjornu
on 6 May 2009 at 7:11
@asbjornu: Thanks for your feedback on the various alternatives.
> It isn't actively maintained and its developers don't
> seem too eager to fix bugs in it either.
Wonder if it's time to fork?
> would love to cooperate in implementing either of these
Great! I've changed the summary of this issue so that it now points
specifically to SgmlReader, and you can initially submit your contribution
as a patch. If you need assistance with understanding any bits of Fizzler,
let us know!
We can open another issue for Twintsam when it makes sense.
Original comment by azizatif
on 8 May 2009 at 12:13
New Fizzler.Systems.XmlNodeQuery in r193 will support use of SgmlReader. All
tests pass, including an extra one to test the "form input" CSS selector,
which was the root reason for starting this issue.
Original comment by info%colinramsay.co.uk@gtempaccount.com
on 11 May 2009 at 11:46
Original issue reported on code.google.com by
azizatif
on 6 May 2009 at 3:22