matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License
5.86k stars, 350 forks

Selecting items and handling them sequentially #349

Open moelfassi opened 5 years ago

moelfassi commented 5 years ago

Subject of the issue

This is my page:

    var html = "<div class='time_head'>time_head content1</div>" +
      "<div class='blockfix'>blockfix1</div>" +
      "<div class='blockfix'>blockfix2</div>" +
      "<div class='time_head'>time_head content2</div>" +
      "<div class='blockfix'>blockfix3</div>" +
      "<div class='blockfix'>blockfix4</div>" +
      "<div class='blockfix'>blockfix5</div>";

I need to get the results in document order, with each group of blockfix items listed under its time_head, like this:

    TIME_HEAD CONTENT1
    ----blockfix1
    ----blockfix2
    TIME_HEAD CONTENT2
    ----blockfix3
    ----blockfix4

This is what I tried so far:

    x(html, {
      head: ['.time_head'],
      games: ['.blockfix']
    })(function (err, obj) {
      console.log(obj['head']);
      console.log(obj['games']);
    });

Actual behaviour

But the result is two separate arrays, with no indication of which blockfix belongs to which time_head:

    [ 'time_head content1', 'time_head content2' ]
    [ 'blockfix1', 'blockfix2', 'blockfix3', 'blockfix4', 'blockfix5' ]

lathropd commented 5 years ago

Is the number of time_head divs consistent?

moelfassi commented 5 years ago

Is the number of time_head divs consistent?

No, they are dates of events.

lathropd commented 4 years ago

I think the solution is to capture them non-sequentially, then sort them in a post-processing step.
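
A minimal sketch of what such a post-processing step might look like, in plain JavaScript. It assumes the scrape can be arranged to return one flat list of records in document order, each carrying the element's class and text (for example by matching both classes with a single combined selector; whether x-ray returns that exact shape is an assumption here, not something confirmed in this thread). The helper name `groupByHead` and the record fields `cls` and `text` are hypothetical.

```javascript
// Hypothetical helper: walks a flat, document-ordered list of records and
// attaches each non-heading record to the most recent heading before it.
// Each record is assumed to look like { cls: 'time_head' | 'blockfix', text: '...' }.
function groupByHead(records) {
  const groups = [];
  for (const rec of records) {
    if (rec.cls === 'time_head') {
      // A heading starts a new group.
      groups.push({ head: rec.text, games: [] });
    } else if (groups.length > 0) {
      // Any other record belongs to the group opened by the last heading.
      groups[groups.length - 1].games.push(rec.text);
    }
    // Records that appear before the first heading are skipped.
  }
  return groups;
}

// Sample input mirroring the page from the issue, in document order.
const records = [
  { cls: 'time_head', text: 'time_head content1' },
  { cls: 'blockfix', text: 'blockfix1' },
  { cls: 'blockfix', text: 'blockfix2' },
  { cls: 'time_head', text: 'time_head content2' },
  { cls: 'blockfix', text: 'blockfix3' },
  { cls: 'blockfix', text: 'blockfix4' },
  { cls: 'blockfix', text: 'blockfix5' }
];

// Print each heading in upper case, followed by its items.
for (const group of groupByHead(records)) {
  console.log(group.head.toUpperCase());
  for (const game of group.games) {
    console.log('----' + game);
  }
}
```

Because the grouping depends only on the order of the records, it works when the number of items under each time_head varies. It cannot be applied to the two separate `head` and `games` arrays from the original attempt, since those no longer record where each item sat relative to the headings.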

SacDin commented 4 years ago

+1