matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License
5.88k stars 348 forks

How can I change the encoding? #183

Open castrors opened 8 years ago

castrors commented 8 years ago

How can I change the encoding?

I am having trouble with encoding: accented characters come out as � in the scraped output.

Your environment

My code is:


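var Xray = require('x-ray');

// Register a custom "trim" filter that strips surrounding whitespace
// and removes newlines, tabs, and carriage returns from string values.
var x = Xray({
  filters: {
    trim: function (value) {
      return typeof value === 'string'
        ? value.trim().replace(/[\n\t\r]/g, '')
        : value;
    }
  }
});

// Scrape every listing item and write the results to a JSON file.
x('http://ms.olx.com.br/imoveis', 'li.item', [{
  detail: 'p.detail-specific | trim',
  region: 'p.detail-region | trim'
}])
  .write('results.json');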

Expected behaviour

{ "detail": "À venda | 360 m²", "region": "Três Lagoas, Jardim Progresso" }

Actual behaviour

{ "detail": "� venda | 360 m²", "region": "Tr�s Lagoas, Jardim Progresso" }
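A � in place of an accented letter is typically what appears when bytes in a single-byte charset are decoded as UTF-8. The sketch below reproduces that symptom with Node's built-in Buffer only. It assumes the page is served as ISO-8859-1 (Latin-1), which this thread does not confirm, and it does not use any x-ray API; the variable names are illustrative.

```javascript
// Bytes for "À venda" as ISO-8859-1 (Latin-1) would encode them.
// ASSUMPTION: the scraped page uses Latin-1; the thread does not confirm this.
var latin1Bytes = Buffer.from([0xc0, 0x20, 0x76, 0x65, 0x6e, 0x64, 0x61]);

// Decoding Latin-1 bytes as UTF-8: 0xC0 is not a valid UTF-8 byte,
// so the decoder substitutes U+FFFD (the replacement character).
var wrong = latin1Bytes.toString('utf8');

// Decoding with the matching charset restores the original character.
var right = latin1Bytes.toString('latin1');

console.log(JSON.stringify(wrong));
console.log(JSON.stringify(right));
```

If this is the cause, the fix has to happen where the response bytes are turned into a string, before x-ray parses the HTML; a filter such as `trim` runs too late, because the original byte has already been replaced.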

gnujeremie commented 8 years ago

Ran into the same problem. Did you find a solution?

kulikalov commented 8 years ago

Same here. The issue has been present for at least 9 months. Here is an example of a webpage x-ray can't handle properly: http://www.ozon.ru/context/detail/id/32161823/

perbyhring commented 8 years ago

Having the same problem

kulikalov commented 8 years ago

@perbyhring consider using osmosis instead

luishdez commented 7 years ago

Yep, same problem. I'll test osmosis, thanks @anton-aleksandrov

vectart commented 7 years ago

I've plugged in the Nightmare driver

thebarty commented 7 years ago

@vectart: did the Nightmare driver solve the problem for you? Right now it is giving me trouble during installation (dependency issues). Is it worth it?

dyatko commented 7 years ago

@thebarty yes