eashish93 opened 7 years ago
I'd like to know if this issue is going to be addressed. Thank you!
+1. I'm willing to give it a shot and try to patch this. @rchipka, could you point me in the right direction as to where to look and what to change? Thanks!
@oliv23 I believe Osmosis already sets libxmljs to use non-strict error recovery mode. This mode cannot recover from certain errors. If there's another libxml setting that we're missing, that would be the way to fix this.
I tried the following code for scraping IMDb, but it doesn't work because IMDb returns a malformed HTML response. I know this can be handled with `process_response`, which accepts a callback function `fn(data)`, but in this case we would need to handle it with an external dependency, which is not ideal. So, please replace the strict XML mode so that malformed HTML is processed automatically. With another framework like `x-ray`, it does work:

`xray('http://www.imdb.com/title/tt0848228/', 'body')(console.log)`
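Until the parser handles this itself, the `process_response` route mentioned above could look roughly like the sketch below: a `fn(data)` callback that cleans the raw markup before it reaches the parser. `cleanMalformedHtml` is a hypothetical name, and the two fixes (dropping `<script>` blocks, escaping bare `&`) are assumptions about the kind of malformation that breaks parsing, not something confirmed for the IMDb page. How the callback is registered with Osmosis is not shown, since the thread doesn't give that signature.

```javascript
// Hypothetical pre-parse cleanup to use as a `fn(data)` callback.
// Assumption: `data` is the raw response body (a string or a Buffer);
// the fixes below are guesses at common malformations, not IMDb-specific.
function cleanMalformedHtml(data) {
  return String(data) // Buffer -> UTF-8 string; strings pass through unchanged
    // Drop <script>...</script> blocks, whose bodies often contain bare
    // `<` and `&` characters that trip up XML-style parsing.
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
    // Escape ampersands that do not start a named or numeric entity.
    .replace(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/g, '&amp;');
}

// Usage example with a hand-written malformed snippet.
const raw = '<p>Tom & Jerry &amp; friends</p><script>if (a < b && c) {}</script>';
console.log(cleanMalformedHtml(raw));
// prints: <p>Tom &amp; Jerry &amp; friends</p>
```

Regex cleanup like this only patches the specific errors it targets. A parser that recovers from malformed HTML on its own, which is what this issue asks for, is the more robust fix.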