matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Filters are being called multiple times per item #247

Closed luishdez closed 6 years ago

luishdez commented 7 years ago

Filters are being called multiple times per item, and in the extra calls the filter receives an empty string.

Steps to reproduce

var Xray = require('x-ray');

var counter = 0;

var xray = Xray({
  filters: {
    trim: function (value) {
      console.log(counter++); // print the 0-based call index, then increment
      return value.replace(/\n/g, '').trim();
    }
  }
});

xray(
  'https://dribbble.com/',
  '.dribbbles.group li',
  [{
    id: '@id',
    title: '.attribution-user a | trim'
  }]
)((err, value) => {
  console.log(value);
});

Expected behaviour

There are 12 items on the page, so the filter should be called 12 times, but it is called 48 times (the counter prints 0 through 47).

Actual behaviour

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
[ { id: 'screenshot-3257041', title: 'Rahul' },
  { id: 'screenshot-3257243', title: 'Gleb Kuznetsov✈' },
  { id: 'screenshot-3257206', title: 'Ghani Pradita' },
  { id: 'screenshot-3257379', title: 'Marina Matijaca' },
  { id: 'screenshot-3257417', title: 'Ramotion' },
  { id: 'screenshot-3257268', title: 'Anton Fritsler (kit8)' },
  { id: 'screenshot-3256957', title: 'Alexander Laguta' },
  { id: 'screenshot-3256712', title: 'Yuri Kartashev' },
  { id: 'screenshot-3257229', title: 'lluck' },
  { id: 'screenshot-3257063', title: 'Bilal Ck' },
  { id: 'screenshot-3257128', title: 'Artiom Piatrykin' },
  { id: 'screenshot-3257061', title: 'Saepul Rohman' } ]
JafarAkhondali commented 6 years ago

@luishdez Nope. There are some nested li tags inside .dribbbles.group, so the selector also matches them. You have to select only the direct li children of the ul, like this:

'.dribbbles.group > li'
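
To illustrate why the child combinator changes the count, here is a minimal, dependency-free sketch. It is not x-ray's implementation: the `node`, `selectDescendants`, `selectChildren` and `shot` helpers are hypothetical, and the markup shape (three nested li tags per item) is only an assumption, chosen because it would be consistent with 48 filter calls for 12 items.

```javascript
// Hypothetical in-memory model of a list whose items contain nested <li> tags.
// This only illustrates selector semantics; it is not how x-ray works internally.
function node(tag, children = [], text = '') {
  return { tag, children, text };
}

// Like "ul li": matches every <li> at any depth (descendant combinator).
function selectDescendants(root, tag) {
  const found = [];
  for (const child of root.children) {
    if (child.tag === tag) found.push(child);
    found.push(...selectDescendants(child, tag));
  }
  return found;
}

// Like "ul > li": matches only <li> direct children (child combinator).
function selectChildren(root, tag) {
  return root.children.filter((child) => child.tag === tag);
}

// Assumed shape: each item holds a link plus a nested list of three <li> tags.
const shot = (name) =>
  node('li', [
    node('a', [], name),
    node('ul', [node('li'), node('li'), node('li')]),
  ]);

const list = node('ul', [shot('Rahul'), shot('Ramotion')]);

const descendantCount = selectDescendants(list, 'li').length;
const childCount = selectChildren(list, 'li').length;
console.log(descendantCount, childCount); // prints "8 2"
```

In this model the nested li tags have no matching link inside them, which would explain why the extra filter calls receive an empty string.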

luishdez commented 6 years ago

I did solve it at the time, but I forgot to update the issue. Thanks anyway. I'm sure it will help others. 👍