tid-kijyun / Kanna

Kanna(鉋) is an XML/HTML parser for Swift.
MIT License

XPath and Childs, unknown behaviour. #236

iDevPro closed 4 years ago

iDevPro commented 4 years ago

Description:

Installation method:

Kanna version (or commit hash):

5.2.2 with fix from 4.0.0 (name of module)

swift --version

Apple Swift version 5.2 (swiftlang-1103.0.32.1 clang-1103.0.32.29) Target: x86_64-apple-darwin19.4.0

Xcode version (optional):

Version 11.4 (11E146)

I found a strange issue that shows up when you need to find several similar elements (or an element with children) and iterate over them. If you call .xpath() inside the loop to find items within the current element, you don't get the right result, because .xpath() returns the first matching sub-item from the root element.

For example:

// I try to find all books with this XPath, which returns a list
// of "brow-data" items:
static let userBookXPath = "//*[@id = 'booklist']//div[@class='brow-data']"

// This XPath searches for the book name:
static let browBookNameXPath = "//a[contains(@class, 'brow-book-name')]"

// Next I want to iterate over them.
// The iterator is declared with var because next() mutates it.
var books = try HTML(url: pageUrl, encoding: .utf8)
    .xpath(Constants.userBookXPath)
    .makeIterator()

while let book = books.next() {
    // Parse each book here, like this (this is an example).
    // What am I doing wrong here?
    print(book.xpath(Constants.browBookNameXPath).first?.content)

    // expected:
    // Optional("Book title one ")
    // Optional("Book title two ")
    // Optional("Book title three ")

    // actual:
    // Optional("Book title one ")
    // Optional("Book title one ")
    // Optional("Book title one ")

    // additional:
    // I have 20 books per page and want to iterate over all 20,
    // but the result here is strange.
    // I'd expect book.xpath(Constants.browBookNameXPath).count to be 1,
    // but I get 20 ))
}
tid-kijyun commented 4 years ago

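To define a path relative to the current node you have to use dot notation (.//). An XPath that starts with // is evaluated from the document root, even when you call it on a child element.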

static let browBookNameXPath = ".//a[contains(@class, 'brow-book-name')]"
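
For anyone landing here later, a minimal sketch of the difference, assuming Kanna 5.x is available as a dependency. The sample HTML and the bookTitles(in:nameXPath:) helper are made up for illustration and are not part of Kanna or of the original page; only the two XPath strings come from this thread.

```swift
import Kanna

// Hypothetical sample markup, invented for this sketch.
let sampleHTML = """
<html><body>
<div id="booklist">
  <div class="brow-data"><a class="brow-book-name" href="#1">Book title one</a></div>
  <div class="brow-data"><a class="brow-book-name" href="#2">Book title two</a></div>
</div>
</body></html>
"""

// XPath from the issue: selects every "brow-data" block in the book list.
let userBookXPath = "//*[@id = 'booklist']//div[@class='brow-data']"

// Hypothetical helper: for each book block, take the text of the first
// node matched by nameXPath, evaluated on that block.
func bookTitles(in html: String, nameXPath: String) throws -> [String] {
    let doc = try HTML(html: html, encoding: .utf8)
    var titles: [String] = []
    for book in doc.xpath(userBookXPath) {
        if let title = book.xpath(nameXPath).first?.text {
            titles.append(title)
        }
    }
    return titles
}

// Starts with "//": searched from the document root, so every book
// block sees the same first anchor.
let absolute = try bookTitles(
    in: sampleHTML,
    nameXPath: "//a[contains(@class, 'brow-book-name')]")

// Starts with ".//": searched only among descendants of the current block.
let relative = try bookTitles(
    in: sampleHTML,
    nameXPath: ".//a[contains(@class, 'brow-book-name')]")

print(absolute)
print(relative)
```

With the absolute path the first title should repeat once per book, which matches the behaviour reported above; with the relative path each block should yield its own title.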
iDevPro commented 4 years ago

I think this can be closed now, because I moved to .css("pattern") and that works perfectly :)