prnawa / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

Cannot parse NYTimes pages #55

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Boilerpipe cannot parse NYTimes pages. I get no output when I try it on NYTimes pages.

Original issue reported on code.google.com by sasa...@gmail.com on 23 Sep 2012 at 5:38

GoogleCodeExporter commented 9 years ago
We're seeing the same problem. It's not all NYT pages, but some.

E.g. these don't work: 
http://theater.nytimes.com/2013/03/01/theater/reviews/sondheim-and-lapines-passion-at-classic-stage-company.html?ref=arts

http://theater.nytimes.com/2013/03/01/theater/reviews/the-revisionist-at-the-rattlestick-theater.html?ref=arts&_r=0

but this one works:
http://www.nytimes.com/2013/03/02/nyregion/us-judges-offer-addicts-a-way-to-avoid-prison.html?hp

Original comment by VidarBre...@gmail.com on 2 Mar 2013 at 12:54

GoogleCodeExporter commented 9 years ago
Any update on this issue? I am seeing the same thing when parsing NYT pages in 
my application. I think it might be because NYT tries to set a cookie when a 
client makes a request. I would love to hear about any workarounds people have 
found.

Original comment by kanari...@gmail.com on 10 Jun 2014 at 6:41
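
One possible workaround, assuming the cookie hypothesis in the last comment is right (nobody in this thread has confirmed it): fetch the HTML yourself with cookie handling turned on, then hand the resulting string to boilerpipe instead of letting it fetch the URL. The sketch below uses only the JDK. The class name CookieAwareFetch, the method names, the User-Agent string, the timeouts and the UTF-8 decoding are all illustrative assumptions, not anything taken from boilerpipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of boilerpipe.
class CookieAwareFetch {

    // Installs a JVM-wide, in-memory cookie store. HttpURLConnection consults
    // the default CookieHandler, so cookies set by one response can be sent
    // back on later requests to the same site.
    static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Downloads a page as a string. Assumes the page is UTF-8 encoded, which
    // may not hold for every site.
    static String fetchHtml(String pageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(true);
        // Assumed header value; some sites respond differently to the default Java agent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; example-fetcher)");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would be roughly: call CookieAwareFetch.installCookieManager() once at startup, then pass the result of CookieAwareFetch.fetchHtml(url) to boilerpipe's ArticleExtractor.INSTANCE.getText(html). If the output is still empty, the cause is probably something other than cookies, for example the markup of those particular pages.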