rkmax opened this issue 8 years ago. Status: Open
Seems to be similar to issue #65.
@adriantomic I don't think so; look at the example.
Looking at the logs it seems like x-ray doesn't wait for the inner responses
Got the same problem as @rkmax. As the docs indicate, a breadth-first crawling flow is recommended. So you'd basically finish the root level of pages, then manually iterate over the second level and extend the crawled data from the first level, then proceed with the third. Solving this issue would really confirm x-ray's claim to being the next web scraper. So far, great job; you're definitely on the right track!
It would be really great to have an example of something more complex than Google Images for crawling to the next site.
Just tried this with version 2.0.2 and it is working, if that helps anyone.
I have the same problem as @rkmax.
+1 should make it wait for inner response to fully resolve before continuing.
+1 it would be great to have this functionality working
+1, there's no point in using this library without it. Thanks for the effort, though; it's hard to use this library until this issue is fixed ;(
@albertpeiro totally agree
Just reiterating that this isn't working on x-ray version 2.3.0, definitely very frustrating. Also, so everyone knows, looks like there's a PR in the works here: https://github.com/lapwinglabs/x-ray/pull/181
Yeah, PR #181 works.
I think I'm still experiencing this issue, but only for arrays of collections. For example:
var Xray = require('x-ray');
var x = Xray();

x('https://dribbble.com', 'li.group', [{
  title: '.dribbble-img strong',
  image: '.dribbble-img img@src',
  url: '.dribbble-link@href',
  // Follow each shot's link and collect its tags as a collection of objects
  tags: x('.dribbble-link@href', '.tag', [{
    name: 'strong',
    url: 'a@href'
  }])
}])(function(err, obj) {
  console.log(obj);
});
When I run this I get something like:
[ { title: 'Wages App',
image: 'https://d13yacurqjgara.cloudfront.net/users/997070/screenshots/3404578/wages_teaser.gif',
url: 'https://dribbble.com/shots/3404578-Wages-App',
tags: [] },
{ title: 'Tokyo Gifathon Day 1',
image: 'https://d13yacurqjgara.cloudfront.net/users/566817/screenshots/3404873/tokyogifathon01_dribbble_teaser.gif',
url: 'https://dribbble.com/shots/3404873-Tokyo-Gifathon-Day-1',
tags: [] },
{ title: 'Designer_01',
image: 'https://d13yacurqjgara.cloudfront.net/users/1387536/screenshots/3404325/__-1_teaser.png',
url: 'https://dribbble.com/shots/3404325-Designer-01',
tags: [] },
{ title: 'Messenger - Redesign Sneak peek',
image: 'https://d13yacurqjgara.cloudfront.net/users/825808/screenshots/3404548/ui-sneak-peek_teaser.png',
url: 'https://dribbble.com/shots/3404548-Messenger-Redesign-Sneak-peek',
tags: [] },
{ title: 'Cube sliding of Room decoration assistant',
image: 'https://d13yacurqjgara.cloudfront.net/users/525747/screenshots/3404498/cube_teaser.gif',
url: 'https://dribbble.com/shots/3404498-Cube-sliding-of-Room-decoration-assistant',
tags: [] },
{ title: 'Poor Jelly Donut...',
image: 'https://d13yacurqjgara.cloudfront.net/users/1044993/screenshots/3404725/poor-donut_teaser.png',
url: 'https://dribbble.com/shots/3404725-Poor-Jelly-Donut',
tags: [] },
{ title: 'APP Data page design',
image: 'https://d13yacurqjgara.cloudfront.net/users/827126/screenshots/3404373/designbyzoeyshen_teaser.jpg',
url: 'https://dribbble.com/shots/3404373-APP-Data-page-design',
tags: [] },
{ title: 'Task Manager ',
image: 'https://d13yacurqjgara.cloudfront.net/users/257709/screenshots/3404733/task_manager_shot_3_50__teaser.png',
url: 'https://dribbble.com/shots/3404733-Task-Manager',
tags: [] },
{ title: 'Galaxy of goop',
image: 'https://d13yacurqjgara.cloudfront.net/users/671617/screenshots/3404102/dribbble-06_teaser.jpg',
url: 'https://dribbble.com/shots/3404102-Galaxy-of-goop',
tags: [] },
{ title: 'Sticker Mule Now',
image: 'https://d13yacurqjgara.cloudfront.net/users/24974/screenshots/3404796/sticker-mule-now_teaser.png',
url: 'https://dribbble.com/shots/3404796-Sticker-Mule-Now',
tags: [] },
{ title: 'LUV App',
image: 'https://d13yacurqjgara.cloudfront.net/users/311820/screenshots/3404736/cover_teaser.png',
url: 'https://dribbble.com/shots/3404736-LUV-App',
tags: [] },
{ title: 'Esc icon',
image: 'https://d13yacurqjgara.cloudfront.net/users/164417/screenshots/3404557/esc_teaser.jpg',
url: 'https://dribbble.com/shots/3404557-Esc-icon',
tags: [] }]
Note that the tags array is empty.
This seems to work:
x('https://dribbble.com', 'li.group', [{
  title: '.dribbble-img strong',
  image: '.dribbble-img img@src',
  url: '.dribbble-link@href',
  tags: x('.dribbble-link@href', ['.tag'])
}])(function(err, obj) {
  console.log(obj);
});
But I would like to produce a collection.
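One possible workaround, following the breadth-first flow suggested earlier in the thread, is to scrape the first level on its own and then run a second x-ray query per item, merging the results by hand. The sketch below is not part of x-ray: `run` and `scrapeShotsWithTags` are hypothetical helper names, and it assumes only that `x` is an x-ray instance whose queries are invoked with a Node-style callback, as in the snippets above. Note that `Promise.all` starts every second-level request at once, so a real crawl would probably want throttling.

```javascript
// Hypothetical helper: wrap an x-ray query (a function that accepts a
// Node-style callback) in a Promise.
function run(query) {
  return new Promise(function (resolve, reject) {
    query(function (err, result) {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

// Hypothetical helper: scrape the list page first, then visit each shot's
// URL and attach the tag collection to the first-level object.
function scrapeShotsWithTags(x, rootUrl) {
  var firstLevel = x(rootUrl, 'li.group', [{
    title: '.dribbble-img strong',
    image: '.dribbble-img img@src',
    url: '.dribbble-link@href'
  }]);

  return run(firstLevel).then(function (shots) {
    return Promise.all(shots.map(function (shot) {
      var secondLevel = x(shot.url, '.tag', [{
        name: 'strong',
        url: 'a@href'
      }]);
      return run(secondLevel).then(function (tags) {
        // Copy the first-level fields and add the second-level collection
        return Object.assign({}, shot, { tags: tags });
      });
    }));
  });
}

// Usage (assumes the x-ray package is installed):
//   var Xray = require('x-ray');
//   scrapeShotsWithTags(Xray(), 'https://dribbble.com')
//     .then(console.log, console.error);
```

Passing `x` in as an argument keeps the helper independent of how the x-ray instance is configured (driver, delay, and so on).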
Hi guys, I can see this issue is still open. Is there any update, or a plan to fix this issue and the related ones?
@rkmax I ended up using CasperJS with PhantomJS. It's a bit more boilerplate code, but gave me more flexibility to do what I needed to do.
+1
+1, this issue really is annoying, forcing you to make callback functions for every call...
Still no fix?
This is the example from the documentation and works fine
Next I tried the Dribbble example, but fetching info from another site, and I'm getting
[ undefined,undefined,undefined]
in the results.json file when executing
DEBUG=x-ray node .