sgsinclair / Voyant

GNU General Public License v3.0

RSS Feed Error in Spyral Notebooks #512

Closed by kaylinland 3 years ago

kaylinland commented 3 years ago

Spyral does not recognize an RSS URL (https://www.cbc.ca/cmlink/rss-topstories) passed to the loadCorpus function. The error reads: "An error occurred during multi-threaded document expansion."

ajmacdonald commented 3 years ago

I'm having trouble reproducing this: https://voyant-tools.org/spyral/rss-test/ Could it be that the CBC's RSS feed was down at the time?

kaylinland commented 3 years ago

No, I think the issue was with using loadCorpus without the summary method. It is working now--thanks!

kaylinland commented 3 years ago

I'm still having issues making this function work:

loadCorpus("http://www.cbc.ca/cmlink/rss-topstories", {
    inputFormat: 'xml', // force XML (not RSS)
    xmlContentXpath: "//item/description" // grab item description for content
});

Maybe there is an issue with the XML path, but it looks okay to me.

ajmacdonald commented 3 years ago

I just had a quick look, but it seems to work if you don't specify inputFormat (not currently sure why).

loadCorpus('https://www.cbc.ca/cmlink/rss-topstories', {
    xmlContentXpath: "//item/description"
}).summary()

gives me: This corpus (86f8e166b6650ff7c869f484c76f4a17) has 20 documents with 1,870 total words and 711 unique word forms.

kaylinland commented 3 years ago

That's interesting! I got it to work by adding the .summary() call. You need inputFormat because the idea is to turn the feed from a corpus of 20 separate documents into one single document.


loadCorpus('https://www.cbc.ca/cmlink/rss-topstories', {
    inputFormat: 'xml', // force XML
    xmlContentXpath: "//item/description" // define XPath
}).summary()

This does the job!
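
For anyone reading later, here is a rough standalone sketch of the difference described above: one document per feed item versus a single merged document. It does not call Voyant or Spyral at all. The sample feed, the `itemDescriptions` helper, and the regex standing in for the `//item/description` XPath are all assumptions made for illustration, not how Voyant actually processes the feed.

```javascript
// Illustrative stand-in only: Voyant evaluates real XPath itself;
// the regexes below merely mimic what "//item/description" would select.
const sampleRss = `<rss><channel>
  <title>Sample feed</title>
  <description>Channel-level description (not inside an item)</description>
  <item><title>One</title><description>First story.</description></item>
  <item><title>Two</title><description>Second story.</description></item>
</channel></rss>`;

// Hypothetical helper: collect the <description> text of every <item>.
function itemDescriptions(xml) {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  return items
    .map((item) => item.match(/<description>([\s\S]*?)<\/description>/))
    .filter((m) => m !== null)
    .map((m) => m[1].trim());
}

// One entry per item, analogous to a corpus with one document per story.
const perItemDocuments = itemDescriptions(sampleRss);

// All item descriptions merged, analogous to a single-document corpus.
const singleDocument = perItemDocuments.join("\n");

console.log(perItemDocuments.length);
console.log(singleDocument);
```

Note that the channel-level description is skipped because it sits outside any item, which is also what the `//item/description` path expresses.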

ajmacdonald commented 3 years ago

OK, I misunderstood what you wanted to do (re: single document). By the way, you shouldn't have to call summary() for loadCorpus to work. Please have a look at the newly edited version of https://voyant-tools.org/spyral/rss-test/