Closed kostya closed 9 years ago
Hi,
Thanks for reporting this.
Can you provide the output from nokogiri -v so that we know what your environment looks like?
-m On May 10, 2015 11:56 AM, "kostya" notifications@github.com wrote:
1.6.6.2
require 'bundler/setup'
require 'nokogiri'

class Doc < Nokogiri::XML::SAX::Document
  def characters(chars)
    p chars
  end
end

str = "<meta name = 'bla
body"
parser = Nokogiri::HTML::SAX::Parser.new(Doc.new)
parser.parse_memory(str)
The output is empty.
But if I change the meta tag to (adding the closing '):
<meta name = 'bla'
the output is "\nbody\n".
I think Nokogiri should also fix the broken tag in the first example.
— Reply to this email directly or view it on GitHub https://github.com/sparklemotion/nokogiri/issues/1286.
# Nokogiri (1.6.6.2)
---
warnings: []
nokogiri: 1.6.6.2
ruby:
  version: 2.2.0
  platform: x86_64-darwin13
  description: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin13]
  engine: ruby
libxml:
  binding: extension
  source: packaged
  libxml2_path: "/Users/kostya/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/nokogiri-1.6.6.2/ports/x86_64-apple-darwin13.0.0/libxml2/2.9.2"
  libxslt_path: "/Users/kostya/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/nokogiri-1.6.6.2/ports/x86_64-apple-darwin13.0.0/libxslt/1.1.28"
  libxml2_patches:
  - 0001-Revert-Missing-initialization-for-the-catalog-module.patch
  - 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
  libxslt_patches:
  - 0001-Adding-doc-update-related-to-1.1.28.patch
  - 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
  - 0003-Initialize-pseudo-random-number-generator-with-curre.patch
  - 0004-EXSLT-function-str-replace-is-broken-as-is.patch
  - 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
  - 0007-Separate-function-for-predicate-matching-in-patterns.patch
  - 0008-Fix-direct-pattern-matching.patch
  - 0009-Fix-certain-patterns-with-predicates.patch
  - 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
  - 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
  - 0014-Fix-for-bug-436589.patch
  - 0015-Fix-mkdir-for-mingw.patch
  compiled: 2.9.2
  loaded: 2.9.2
Hi @kostya,
Apologies for not responding sooner. In the first example, libxml2 considers everything after the ' character to be an unclosed string, not an unclosed tag.
Nokogiri relies on its underlying libraries (libxml2 for MRI or xerces for JRuby) to correct broken markup, and unfortunately there's nothing we can do about this case without drastically invasive changes.
Sorry we can't help you in this situation.
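To illustrate the "unclosed string" explanation, here is a toy scanner in plain Ruby. It is only a sketch and not libxml2's actual algorithm: the method names text_outside_tags and skip_tag are made up for this example, and the two inputs are modeled on the report, with a closing > added as an assumption so that the tag can end at all. It shows how a quoted attribute value that never closes swallows the rest of the input, while a closed one lets the text after the tag through.

```ruby
require 'strscan'

# Toy illustration only, NOT libxml2's real parsing logic.
# Returns the character data found outside of tags.
def text_outside_tags(html)
  scanner = StringScanner.new(html)
  text = +""
  until scanner.eos?
    if scanner.scan(/</)
      skip_tag(scanner)        # consume the whole tag, emit nothing
    else
      text << scanner.getch    # ordinary character data
    end
  end
  text
end

# Consumes input up to and including the ">" that ends the tag.
# A quoted attribute value is consumed as one unit, so a missing
# closing quote makes the value run to the end of the input.
def skip_tag(scanner)
  until scanner.eos?
    ch = scanner.getch
    return if ch == ">"
    next unless ch == "'" || ch == '"'
    scanner.scan_until(/#{Regexp.escape(ch)}/) || scanner.terminate
  end
end

# Hypothetical inputs modeled on the report (the ">" is an assumption).
unclosed = "<meta name = 'bla>\nbody\n"
closed   = "<meta name = 'bla'>\nbody\n"

p text_outside_tags(unclosed)  # => ""
p text_outside_tags(closed)    # => "\nbody\n"
```

In the first input the scanner never leaves the attribute value, so there is no point at which it could decide the tag has ended and start reporting text.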