Hi! Thank you for this useful Scrapy spider!
I should say up front that I'm new to Python.
I have an issue with reviews in non-English languages (Russian, Chinese, Japanese etc.). In my output file (reviews.jl) these reviews are written as Unicode escape sequences such as \u0441\u043b\u0438\u0448\u043a\u043e\u043c (after decoding, this reads "слишком").
Is there any workaround for this issue? Could the script be changed so that the review text is exported as readable text, without Unicode escape sequences?
Right now I'm using a Notepad++ plugin called HTMLPad and its Decode JS function. It works, but it can't decode a large amount of text at once (26000 reviews, for example), so I have to select 100-200 lines at a time and decode them manually, which is a real pain for 26000 reviews...
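
To show what I'm after, here is a minimal sketch of how the manual decoding step could be done in one pass. It assumes each line of reviews.jl is one JSON object. The function names, the output file name reviews_utf8.jl and the "text" field in the usage example are made up for illustration and are not taken from the spider.

```python
import json


def unescape_json_lines(lines):
    """Re-serialize each JSON line so non-ASCII text is written literally.

    Assumption: every non-empty line is a complete JSON object,
    as in a JSON Lines (.jl) file.
    """
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # \uXXXX escapes become real characters here
        # ensure_ascii=False stops json.dumps from escaping them again
        converted.append(json.dumps(record, ensure_ascii=False))
    return converted


def convert_file(src_path, dst_path):
    """Read src_path line by line and write the readable version to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for converted_line in unescape_json_lines(src):
            dst.write(converted_line + "\n")


# Hypothetical usage (file names are assumptions):
# convert_file("reviews.jl", "reviews_utf8.jl")
```

As far as I can tell, Scrapy also has a FEED_EXPORT_ENCODING setting, and setting it to "utf-8" in settings.py might make the export readable in the first place, but I haven't tested that with this spider.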