supersam654 closed 8 years ago
I believe the content is in the span with class "focusParagraph", but the HTML is super messy.
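For illustration, here is a minimal sketch of pulling the text out of such a span using only the Python standard library. It assumes the article body really does sit inside `<span class="focusParagraph">`, which is only a guess from looking at the page. The names `FocusParagraphExtractor` and `extract_focus_text` are made up for this example and are not part of the project's crawler.

```python
from html.parser import HTMLParser


class FocusParagraphExtractor(HTMLParser):
    """Collect text found inside <span class="focusParagraph"> elements.

    Hypothetical helper for illustration; not the project's actual parser.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # span nesting depth while inside a matching span
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            # A nested span inside the matching span: track it so its
            # closing tag does not end the capture early.
            self.depth += 1
        elif "focusParagraph" in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text while inside a matching span.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_focus_text(html):
    """Return the text of all focusParagraph spans, joined by spaces."""
    parser = FocusParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Because `html.parser` is tolerant of malformed markup, this kind of approach may cope with messy HTML better than a regex would, though a real page could still need extra handling.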
This is now fixed. (It was the same problem as issue #20.) To see if it works now, you can run recrawl.py and give it the parameter 560f0fc2a6b867b094aa343d (that's the ID of the article). newContent.txt then contains the article, while oldContent.txt contains the wrong content.
http://reuters.us.feedsportal.com/c/35217/f/654198/s/4a5d7276/sc/7/l/0L0Sreuters0N0Carticle0C20A150C10A0C0A20Cus0Eiran0Enuclear0Ekerry0Ezarif0EidUSKCN0ARW2JA20A1510A0A20DfeedType0FRSS0GfeedName0FworldNews/story01.htm