When going from a soup to the "content" of the article, some elements are omitted.
The issue seems to be with paragraphs that have links in them.
I isolated a case where it happens with VentureBeat (http://venturebeat.com/2015/11/06/activision-blizzards-strategy-for-world-conquest/), but I also noticed that it happens with other sources.
In the case of VentureBeat, the entire first paragraph is missing from the crawled content.
Fixed it. The bug was caused by the removeComments() method in crawlContent, which removed any text containing the substring "if" (for instance, any text with the word "California").
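For reference, here is a minimal sketch of the failure mode and one way to avoid it. The real removeComments() isn't shown in this thread, so `remove_comments_buggy`, `TextExtractor`, and `extract_text` are hypothetical names and the sample HTML is made up. The sketch uses only Python's standard `html.parser` rather than the project's soup code. The idea is to drop comments by node type instead of matching on the substring "if".

```python
from html.parser import HTMLParser


def remove_comments_buggy(text_nodes):
    # Hypothetical reconstruction of the bug: any text containing "if" is
    # assumed to be a conditional comment and dropped. "California" contains
    # "if", so ordinary paragraphs get removed too.
    return [t for t in text_nodes if "if" not in t]


class TextExtractor(HTMLParser):
    """Collects visible text. Comments arrive separately and are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags, including text inside links.
        if data.strip():
            self.chunks.append(data.strip())

    def handle_comment(self, data):
        # Called for <!-- ... --> nodes, including conditional comments.
        # Dropping them here never touches regular text.
        pass


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample: a paragraph with a link, a conditional comment, a paragraph.
html = (
    "<p><a href='#'>California</a> is home to Activision.</p>"
    "<!--[if IE]><p>legacy markup</p><![endif]-->"
    "<p>Second paragraph.</p>"
)

if __name__ == "__main__":
    print(remove_comments_buggy(["California is home to Activision.",
                                 "Second paragraph."]))
    print(extract_text(html))
```

If the crawler uses BeautifulSoup, the same approach should carry over by removing only nodes that are instances of `bs4.Comment`, though I haven't checked that against the crawlContent code.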