matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Scrape element based on data-attributes #255

Closed. jzarca01 closed this issue 7 years ago

jzarca01 commented 7 years ago

Subject of the issue

Hi, first of all, thanks for your hard work; x-ray is incredibly easy to use. However, I'm having a hard time scraping one type of element.

Here's the URL I'm trying to scrape: http://lookbook.nu/look/8651923-Kollaps-Noise-Music-Diy-Group-Sweat-Shoop-Pc

And here's my code:

return x(baseLookUrl+lookId, '.look_main', {
        brands: x('#side_col .spotlight-user', [{
            brandImage: '.avatar@data-page-track~="brand" a img@src'
        }])
})

It cannot find the element whose data attribute contains "brand". Is there something wrong?

Thanks in advance, Jeremie.

alexchantastic commented 7 years ago

I haven't tried this, but I think your selector needs to look something like this:

.avatar[data-page-track~="brand"] a img@src

jzarca01 commented 7 years ago

Hi @alexchantastic, thanks for the answer. I've figured out how to do it. Here's the answer for anyone who runs into the same problem:

return x(baseLookUrl+lookId, '.look_main', {
        brands: x('#side_col .spotlight-user', '.avatar[data-page-track*=brand]', [{
            brandImage: 'a img@src',
        }])
})

As defined in the documentation:

xray(html, scope, selector)

Instead of a url, you can also supply raw HTML and all the same semantics apply.

var html = "<body><h2>Pear</h2></body>";
x(html, 'body', 'h2')(function(err, header) {
  header // => Pear
})
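
A note on why the two selectors in this thread can behave differently: in CSS, [attr~="brand"] only matches when brand is a whole whitespace-separated word in the attribute value, while [attr*=brand] matches any substring. The sketch below mimics both rules in plain JavaScript. The helpers matchesWord and matchesSubstring and the sample attribute value are hypothetical, made up for illustration; they are not part of x-ray, and the actual data-page-track value on the page isn't shown in this thread.

```javascript
// Hypothetical helpers that mimic how CSS attribute selectors compare values.
// They are NOT part of x-ray; they only illustrate the matching rules.

// [attr~="word"]: true when `word` is one of the whitespace-separated tokens.
function matchesWord(attrValue, word) {
  // An empty word, or one containing whitespace, never matches.
  if (word === '' || /\s/.test(word)) return false;
  return attrValue.split(/\s+/).filter(Boolean).includes(word);
}

// [attr*="text"]: true when `text` appears anywhere inside the value.
function matchesSubstring(attrValue, text) {
  return text !== '' && attrValue.includes(text);
}

// Assumed attribute value, invented for this example; the real page may differ.
const value = 'look|brand_avatar';

console.log(matchesWord(value, 'brand'));        // false: "brand" is not a standalone word
console.log(matchesSubstring(value, 'brand'));   // true: "brand" occurs as a substring
console.log(matchesWord('look brand', 'brand')); // true: "brand" is its own token
```

If the attribute holds something like the assumed value above, that would explain why ~= found nothing while *= worked.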