mwpenny / kijiji-scraper

A lightweight node.js module for retrieving and scraping ads from Kijiji
MIT License
96 stars 44 forks

Getting more results than expected #28

Closed mwolbaum closed 4 years ago

mwolbaum commented 4 years ago

Hi there,

I really like the module! I have been using it to collect data on motorcycles. I ran into a situation where one of my searches returned duplicate results. I looked further, and the array returned from search() contains 68 ads, but there are only 34 ads in the true results. Below are the parameters I'm using. Is this normal behavior? Or maybe I'm doing something weird. Thanks!

let options = { minResults: 40 };
let params = { locationId: 9003, categoryId: 30, sortByName: "priceAsc", keywords: "GSXR750" };

mwpenny commented 4 years ago

Hi there, sorry I'm just getting to this now. Unfortunately I'm unable to reproduce your issue - is it still occurring for you?

mwpenny commented 4 years ago

After some more tinkering, I was able to reproduce your issue. Here's a rundown of the problem:

When the scraper searches for results, it needs to know to stop searching if there are fewer actual results than you asked for. This used to be implemented by querying for more results pages until either the number of results you asked for in options.minResults was reached, or an empty results page was returned.

It seems Kijiji changed what happens when a page past the final results page is requested. They used to return an empty results page, but now they return the last valid page! So in your example, the real 34 results were retrieved, but the scraper was still 6 results short of 40. It then queried for another page - which did not exist - and Kijiji returned the same 34 results, causing you to get duplicates.
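
To make the arithmetic concrete, here is a small self-contained simulation of the behavior described above. It is only a sketch, not the module's actual code: `PAGES`, `fetchPage` and `naiveSearch` are hypothetical names, the "site" is an in-memory array assumed to hold all 34 ads on a single page, and the real scraper fetches pages asynchronously over HTTP.

```javascript
// Hypothetical stand-in for Kijiji: 34 real ads, all on one results page.
const PAGES = [Array.from({ length: 34 }, (_, i) => `ad-${i + 1}`)];

// Simulates the new site behavior: a request past the final page
// returns the last valid page instead of an empty one.
function fetchPage(pageNum) {
    const index = Math.min(pageNum, PAGES.length) - 1; // pageNum is 1-based
    return PAGES[index];
}

// Old-style loop: stop once minResults is reached or a page comes back empty.
function naiveSearch(minResults) {
    const ads = [];
    let pageNum = 1;
    while (ads.length < minResults) {
        const page = fetchPage(pageNum++);
        if (page.length === 0) break; // never true with the clamping above
        ads.push(...page);
    }
    return ads;
}

const results = naiveSearch(40);
console.log(results.length);        // 68 (the same 34 ads, twice)
console.log(new Set(results).size); // 34 unique ads
```

The first request yields 34 ads, which is short of 40, so the loop asks for page 2 and receives the same 34 ads again.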

The end of the results pages is now detected properly. I was able to verify the fix on my machine using your example, but please try with the latest version and let me know if you still have problems.
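
For anyone curious what end-of-results detection can look like, below is one possible approach, sketched under assumptions. It is not necessarily how the module does it: this version stops as soon as a page contributes no ads it has not already seen, keyed on a hypothetical `url` field. `PAGES`, `fetchPage` and `searchWithEndDetection` are illustrative names, and the in-memory "site" stands in for real asynchronous HTTP requests.

```javascript
// Hypothetical stand-in for Kijiji: 34 real ads, all on one results page.
const PAGES = [
    Array.from({ length: 34 }, (_, i) => ({ url: `https://example.com/ad/${i + 1}` }))
];

// A request past the final page returns the last valid page again.
function fetchPage(pageNum) {
    const index = Math.min(pageNum, PAGES.length) - 1; // pageNum is 1-based
    return PAGES[index];
}

// Stop when minResults is reached OR a page adds nothing new.
function searchWithEndDetection(minResults) {
    const seen = new Set(); // URLs of ads collected so far
    const ads = [];
    let pageNum = 1;
    while (ads.length < minResults) {
        const page = fetchPage(pageNum++);
        const fresh = page.filter(ad => !seen.has(ad.url));
        if (fresh.length === 0) break; // empty or repeated page: end of results
        for (const ad of fresh) {
            seen.add(ad.url);
            ads.push(ad);
        }
    }
    return ads;
}

const found = searchWithEndDetection(40);
console.log(found.length); // 34
```

Filtering by a stable key also covers the older behavior, since an empty page yields no new ads either.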