mannau / tm.plugin.webmining

Retrieve structured, textual data from various web sources.

I am still getting this issue in GoogleNewsSource #20

Open biranchi2018 opened 7 years ago

biranchi2018 commented 7 years ago

googlenews <- WebCorpus(GoogleNewsSource("Microsoft"))
Unknown IO errorfailed to load external entity "http://news.google.com/news?hl=en&q=Microsoft&ie=utf-8&num=100&output=rss"
Error: 1: Unknown IO error2: failed to load external entity "http://news.google.com/news?hl=en&q=Microsoft&ie=utf-8&num=100&output=rss"

mannau commented 7 years ago

Sorry, I can't reproduce the error. Which version are you using? Please include the output of sessionInfo(). Best, m

dcadam commented 6 years ago

Hi Mannau,

I also get this error message when running the same line of code above.

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin17.3.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.2.20_1/lib/libopenblasp-r0.2.20.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tm.plugin.webmining_1.3 tm_0.7-3 NLP_0.1-11

loaded via a namespace (and not attached):
[1] compiler_3.4.3 parallel_3.4.3 tools_3.4.3 RCurl_1.95-4.10 yaml_2.1.16 Rcpp_0.12.15 slam_0.1-42
[8] RJSONIO_1.3-0 xml2_1.2.0 rJava_0.9-9 boilerpipeR_1.3 bitops_1.0-6 XML_3.98-1.9

brycesisu commented 6 years ago

I have the same error

> gsh <- WebCorpus(GoogleNewsSource("Trump"))
Unknown IO errorfailed to load external entity "http://news.google.com/news?hl=en&q=Trump&ie=utf-8&num=100&output=rss"
Error: 1: Unknown IO error2: failed to load external entity "http://news.google.com/news?hl=en&q=Trump&ie=utf-8&num=100&output=rss"

sessionInfo()

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tm.plugin.webmining_1.3 tm_0.7-3 NLP_0.1-11

loaded via a namespace (and not attached):
[1] compiler_3.4.4 parallel_3.4.4 tools_3.4.4 RCurl_1.95-4.10 yaml_2.1.16 Rcpp_0.12.16 slam_0.1-42 RJSONIO_1.3-0 xml2_1.1.1 rJava_0.9-9
[11] boilerpipeR_1.3 bitops_1.0-6 XML_3.98-1.9

roy-j commented 4 years ago

Hi @mannau,

First of all, thank you for your time and this package! This error may be related to the HTTP 301 redirect that Google returns in response to the http request. I changed the feed URL to https in 'R/source.R', rebuilt the package, and was able to get a response from Google.

diff --git a/R/source.R b/R/source.R
index a4fd056..8daf9f3 100644
--- a/R/source.R
+++ b/R/source.R
@@ -173,7 +173,7 @@ GoogleNewsSource <- function(query, params =
                                                ie='utf-8', 
                                                num = 30, 
                                                output='rss'), ...){
-       feed <- "http://news.google.com/news"
+       feed <- "https://news.google.com/news"
        fq <- feedquery(feed, params)
        parser <- function(cr){
                tree <- parse(cr, type = "XML", asText = TRUE)

I'm not much of an R specialist, so I'm sure there is a better approach.

Thanks!
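
For anyone who wants to check the http versus https difference before rebuilding the package, the feed URL from the error message can be assembled by hand and switched to https. This is only a minimal sketch: `build_feed_url` and `force_https` are hypothetical helpers written for illustration, not functions from tm.plugin.webmining, and the query parameters are simply the ones visible in the failing URL above.

```r
# Hypothetical helpers for illustration; not part of tm.plugin.webmining.

# Build "base?key=value&..." from a named list of query parameters.
build_feed_url <- function(base, params) {
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  paste0(base, "?", paste0(names(params), "=", values, collapse = "&"))
}

# Rewrite a leading "http://" to "https://"; other URLs are returned unchanged.
force_https <- function(url) {
  sub("^http://", "https://", url)
}

# Parameters copied from the URL in the error message.
params <- list(hl = "en", q = "Microsoft", ie = "utf-8", num = 100, output = "rss")

feed_url <- force_https(build_feed_url("http://news.google.com/news", params))
print(feed_url)
```

The resulting URL can then be opened in a browser or passed to an XML reader such as `xml2::read_xml(feed_url)` (this needs network access, and Google may answer differently today) to see whether the https endpoint returns a feed where the http one fails.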