matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Internationalized pages being scraped with wrong locale. #207

Open kulikalov opened 8 years ago

kulikalov commented 8 years ago

Subject of the issue

For example, I want to scrape a book page from Google Play. Google Play serves the same URL in multiple languages, with no language marker in the URL itself. How can I scrape the version in the language I need instead of a random one?

Your environment

"x-ray": "^2.2.0"
node --version: 5.9.0
npm --version: 3.7.3

Steps to reproduce

Try to scrape any page from Google Play, for example: https://play.google.com/store/books/details/Walter_Isaacson_Steve_Jobs?id=I6R8MXStPXgC

Expected behaviour

I need to be able to specify the locale I want to get.

Actual behaviour

It scrapes a random language (Ukrainian, Spanish, Portuguese), even though I'm always running x-ray from the same VPS.

kulikalov commented 8 years ago

Any guesses how to fix that?

kulikalov commented 8 years ago

The solution is to allow users to specify request headers. If it were possible to pass headers from x-ray through x-ray-crawler to superagent, that would solve the problem.
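
A rough sketch of what passing a locale header could look like, in case it helps the discussion. It is not x-ray's actual implementation. It assumes that x-ray-crawler drivers are plain functions of the form `(ctx, done)` that fill in `ctx.body`. The names `localeHeaders`, `makeLocaleDriver`, and the `request(url, headers, cb)` callback are hypothetical and used only for illustration.

```javascript
// Build headers asking the server for one language.
// `locale` is a language tag such as 'en-US'; the q=0.9 fallback weight is illustrative.
function localeHeaders(locale) {
  const base = locale.split('-')[0];
  const value = base === locale ? locale : `${locale},${base};q=0.9`;
  return { 'Accept-Language': value };
}

// Hypothetical driver factory. `request(url, headers, cb)` is any function that
// performs the HTTP GET (for example a thin wrapper around superagent).
// The `(ctx, done)` driver shape is an assumption about x-ray-crawler.
function makeLocaleDriver(locale, request) {
  const headers = localeHeaders(locale);
  return function driver(ctx, done) {
    request(ctx.url, headers, (err, body) => {
      if (err) return done(err);
      ctx.body = body; // hand the fetched HTML back to the crawler
      done(null, ctx);
    });
  };
}
```

If x-ray's `driver()` method accepts a function of this shape, the result could be wired in with something like `x.driver(makeLocaleDriver('en-US', myRequest))`, where `myRequest` is whatever HTTP helper you already use. Whether Google Play honors `Accept-Language` over other signals such as IP geolocation is not something this sketch can guarantee.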