tid-kijyun / Kanna

Kanna(鉋) is an XML/HTML parser for Swift.
MIT License

Chained xpaths are searching from root level #237

Open anivaros opened 4 years ago

anivaros commented 4 years ago

Description:

XPath queries called on a node are evaluated from the document root, not relative to that node. For example, this test fails:

func testInnerXpath() {
    let input = """
                <html>
                <head>
                    <title>test title</title>
                </head>
                <body>
                    <div id="1"><div><h1>test header 1</h1></div></div>
                    <div id="2"><div><h1>test header 2</h1></div></div>
                </body>
                </html>
                """
    do {
        let doc = try HTML(html: input, encoding: .utf8)
        // All of these assertions fail:
        XCTAssertNil(doc.at_xpath("//head")?.at_xpath("//h1")?.toHTML)
        XCTAssertNil(doc.at_xpath("//head")?.at_xpath("//body")?.toHTML)
        XCTAssertNil(doc.at_xpath("//body")?.at_xpath("//title")?.toHTML)
        XCTAssertEqual(doc.at_xpath("//body/div[@id='2']")?.at_xpath("//h1")?.text, "test header 2")
        // Only this assertion passes:
        XCTAssertEqual(doc.at_xpath("//body/div[@id='2']//h1")?.text, "test header 2")
    } catch {
        XCTFail("Abnormal test data")
    }
}

Is this a bug or a feature?

I've started implementing a fix for this: I cast the xmlNodePtr to xmlDocPtr and initialize xmlXPathNewContext with the cast pointer, after which all the XPath queries work as expected.

tid-kijyun commented 4 years ago

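To define a path relative to the current node, you have to use dot notation (.//). A path that starts with // is always evaluated from the document root, even when it is called on a node.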

XCTAssertNil(doc.at_xpath("//head")?.at_xpath(".//h1")?.toHTML)
XCTAssertNil(doc.at_xpath("//head")?.at_xpath(".//body")?.toHTML)
XCTAssertNil(doc.at_xpath("//body")?.at_xpath(".//title")?.toHTML)
XCTAssertEqual(doc.at_xpath("//body/div[@id='2']")?.at_xpath(".//h1")?.text, "test header 2")

// This assertion passes either way, because it is a single query from the document root:
XCTAssertEqual(doc.at_xpath("//body/div[@id='2']//h1")?.text, "test header 2")

I understand this is confusing. In a future release, I plan to change it so that it works even if the dot notation is omitted.
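
In the meantime, the dot-notation rule above can be applied mechanically before a chained call. The following is only a minimal sketch: relativeXPath is a hypothetical helper and not part of Kanna, and it assumes the expression is a single location path.

```swift
// Hypothetical helper, NOT part of Kanna: rewrites an absolute XPath such as
// "//h1" or "/div" so that it is evaluated relative to the context node.
// Assumption: the input is a single location path. Only a leading "/" is
// rewritten, so in a union ("//a | //b") just the first branch changes, and
// a parenthesised expression ("(//h1)[1]") is returned unchanged.
func relativeXPath(_ xpath: String) -> String {
    // A leading "/" anchors the path at the document root; prefixing "."
    // anchors it at the current node instead.
    return xpath.hasPrefix("/") ? "." + xpath : xpath
}

// Intended use with the calls shown in this thread:
// doc.at_xpath("//body/div[@id='2']")?.at_xpath(relativeXPath("//h1"))?.text
```

Paths that are already relative, such as .//h1 or h1, pass through untouched, so the helper can be applied to every node-level query without checking the input first.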