scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Duplicate filter and js-pagination via POST-method #701

Open dmc-ua opened 6 years ago

dmc-ua commented 6 years ago

I tried to parse a site with JavaScript pagination:

http://www.russellcollins.co.uk/search.aspx?ListingType=5&areainformation=&areainformationname=Location&statusids=1&igid=&imgid=&egid=&emgid=&category=1&defaultlistingtype=5&markettype=0&cur=GBP

After clicking a pagination link via splash:runjs(), the page sends a POST request and returns a page with the same URL but with the data for another page.

The default filter, SplashAwareDupeFilter, drops such pages as duplicates because the URL is the same. How can I parse such sites?

Gallaecio commented 4 years ago

Did you manage to solve your issue? Did you try using dont_filter=True in your request?
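
To illustrate why dont_filter=True may help here, below is a minimal, standard-library-only sketch of a fingerprint-based duplicate filter. It is a simplified model, not the actual SplashAwareDupeFilter code: the names `fingerprint`, `ToyDupeFilter`, `should_drop`, the `page` argument, and the example.com URL are all hypothetical. The assumption is that a request's fingerprint is derived from its URL plus any extra arguments sent with it, so two requests to the same URL with identical arguments look like duplicates.

```python
import hashlib
import json


def fingerprint(url, splash_args=None):
    """Hash the URL together with any extra arguments (simplified model)."""
    payload = json.dumps({"url": url, "args": splash_args or {}}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class ToyDupeFilter:
    """Hypothetical stand-in for a dupe filter: remembers fingerprints it has seen."""

    def __init__(self):
        self.seen = set()

    def should_drop(self, url, splash_args=None, dont_filter=False):
        if dont_filter:
            # The filter is bypassed entirely; nothing is recorded.
            return False
        fp = fingerprint(url, splash_args)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False


# Hypothetical URL standing in for a search page whose URL never changes.
SEARCH_URL = "http://example.com/search.aspx"

dupe_filter = ToyDupeFilter()
first = dupe_filter.should_drop(SEARCH_URL)                       # first visit is kept
second = dupe_filter.should_drop(SEARCH_URL)                      # same URL is dropped
bypassed = dupe_filter.should_drop(SEARCH_URL, dont_filter=True)  # kept despite same URL
with_page = dupe_filter.should_drop(SEARCH_URL, splash_args={"page": 2})  # differs by argument

print(first, second, bypassed, with_page)
```

In a real spider, dont_filter=True is a keyword argument of scrapy.Request, and SplashRequest should accept it the same way. If the duplicate filter in your version also hashes the Splash arguments, passing the target page number as a script argument might be an alternative that keeps filtering enabled, but that is worth verifying against your installed scrapy-splash.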