mannau / tm.plugin.webmining

Retrieve structured, textual data from various web sources.

Select by Date or Sort by date on GoogleNewsSource #4

Closed: aruizga7 closed this issue 9 years ago

aruizga7 commented 9 years ago

I am running the following code:

searchTerm = "Data Mining"
corpusGoog <- WebCorpus(GoogleNewsSource(params=list(hl = "en", q = searchTerm, ie = "utf-8", num = 10, output = "rss")))

headers<-meta(corpusGoog,tag="datetimestamp")

How can I get only news of a specific date or sort by date?

mannau commented 9 years ago

The Google News RSS API only returns the most recent news items (at most 100 per request), so you cannot request a specific date or a sort order from it. You can, however, filter or sort the retrieved items afterwards, e.g. by date:

library(tm)
library(tm.plugin.webmining)
searchTerm <- "Data Mining"
corpusGoog <- WebCorpus(GoogleNewsSource(params = list(hl = "en", q = searchTerm, ie = "utf-8", num = 10, output = "rss")))
# Keep only news items published on or after Feb 15th, 2015:
keep <- sapply(corpusGoog, function(x) as.POSIXct(meta(x, "datetimestamp")) >= as.POSIXct("2015-02-15"))
corpusGoogFilter <- corpusGoog[keep]
# Sort corpus by datetimestamp (oldest first)
corpusorder <- order(sapply(corpusGoog, function(x) as.POSIXct(meta(x, "datetimestamp"))))
corpusGoogSort <- corpusGoog[corpusorder]
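
The question also asks for news from one specific date. Here is a minimal sketch of that comparison in base R, which runs without a network connection. The `stamps` vector is made-up data standing in for the `datetimestamp` values of the corpus items, and the names `target`, `sameDay` and `newestFirst` are illustrative only.

```r
# Hypothetical timestamps standing in for meta(x, "datetimestamp") of each news item
stamps <- as.POSIXct(c("2015-02-14 23:10:00", "2015-02-15 08:30:00",
                       "2015-02-15 17:45:00", "2015-02-16 01:05:00"), tz = "UTC")

# The single calendar day we want to keep
target <- as.Date("2015-02-15")

# Truncate each timestamp to its calendar date (time zone given explicitly)
# and compare it with the target day
sameDay <- as.Date(stamps, tz = "UTC") == target
which(sameDay)  # positions of the items published on the target day

# Ordering with the newest item first instead of the oldest
newestFirst <- order(stamps, decreasing = TRUE)
```

On the real corpus, the same logical vector could presumably be built with `sapply(corpusGoog, function(x) as.Date(as.POSIXct(meta(x, "datetimestamp")), tz = "UTC") == target)` and then used as `corpusGoog[sameDay]`. This assumes the `datetimestamp` metadata is populated for every item. Which calendar day an item falls on depends on the time zone chosen.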