matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

selecting collections doesn't work in a way specified in docs #273

Closed marzelin closed 5 years ago

marzelin commented 6 years ago

Subject of the issue

Retrieving collections doesn't work as described in the docs.

Your environment

Reproduction here: https://runkit.com/marzelin/x-ray-bug

Given this example:

const x = require("x-ray")()
const html = `
<body>
    <ul>
      <li>1-1</li>
      <li>1-2</li>
      <li>1-3</li>
    </ul>
    <ul>
      <li>2-1</li>
      <li>2-2</li>
      <li>2-3</li>
    </ul>
    <ul>
      <li>3-1</li>
      <li>3-2</li>
      <li>3-3</li>
    </ul>
</body>`

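// Expected per the docs: only the items of the first list.
x(html, "body", x("ul", ["li"]))((err, result) => {
    if (err) throw err
    console.log(result) // observed: the items of all three lists
})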

Per the docs, x('ul', ['li']) should select all items in the first list, but it selects all items in all lists. Also per the docs, x(['ul'], ['li']) should return all items in all lists, but it returns nothing.

zamrq commented 6 years ago

Yeah, selecting collections doesn't work the way the docs specify.

jinnabaalu commented 6 years ago

I just replicated the above requirement and it is working fine; I am able to extract exactly as explained.

image

This issue can be closed.

marzelin commented 6 years ago

@JinnaBalu The selector: x('ul', ['li']) should only return items from the first list (["1-1", "1-2", "1-3"]). The result you've got is all items from all lists. The selector that should return all items from all lists is x(['ul'], ['li']).
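
To make the two expectations concrete, here is a plain-JavaScript sketch that does not call x-ray at all. The `lists` array is a hypothetical stand-in for the three `<ul>` elements in the example, and the "all lists" result is assumed to be flat, since the thread does not say whether it would be nested:

```javascript
// Hypothetical stand-in for the three <ul> elements in the example HTML.
const lists = [
  ["1-1", "1-2", "1-3"],
  ["2-1", "2-2", "2-3"],
  ["3-1", "3-2", "3-3"],
]

// What x('ul', ['li']) is expected to return: items of the first list only.
const firstListItems = lists[0]

// What x(['ul'], ['li']) is expected to return: items of every list
// (flattened here as an assumption).
const allListItems = lists.flat()

console.log(firstListItems)
console.log(allListItems)
```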

willisplummer commented 6 years ago

I ran into this issue too 😬

lathropd commented 5 years ago

The behavior you want would require you to explicitly use :nth-of-type(1). Using the array is going to give you all matches.
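
A minimal sketch of that suggestion, reusing the nested call from the original report. The helper name `selectFirstList` and its parameters are hypothetical. It assumes x-ray hands `:nth-of-type(1)` to its CSS selector engine unchanged, and it has not been checked against a specific x-ray version:

```javascript
// Hypothetical helper. `x` is an x-ray instance, e.g. require("x-ray")(),
// `html` is the markup to scrape, and `done` is a Node-style callback.
function selectFirstList(x, html, done) {
  // Assumption: ":nth-of-type(1)" narrows the scope to the first <ul>
  // among its siblings, so ["li"] only collects that list's items.
  x(html, "body", x("ul:nth-of-type(1)", ["li"]))(done)
}

// Usage (requires the x-ray package to be installed):
// selectFirstList(require("x-ray")(), html, (err, items) => {
//   if (err) throw err
//   console.log(items)
// })
```

Passing the x-ray instance in as an argument keeps the helper independent of how the instance was configured (drivers, filters, and so on).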