robocop-bob closed this 9 years ago
Should be fixed now; see https://github.com/mannau/tm.plugin.webmining/commit/22f00f90fd476dc7c2a9850b3fb0285fd5603c17
Best, m
Hi,
NYTimesSource is working much better now, thanks. You are testing this function with n = 200; try a somewhat larger number.
I am trying:
NY_list <- c("Ibm", "Google", "Oracle", "HP", "Microsoft", "Apple")
res.corp_nyt <- lapply(NY_list, function(x) NYTimesSource(x, n = 900, count = 10, appid = "xyzzzzzzzz"))
res.corp_nyt_list <- lapply(seq_along(res.corp_nyt), function(x) { print(x); WebCorpus(res.corp_nyt[[x]]) })
While WebCorpus is running, RStudio loses its connection to the R session and R has to be restarted. The point of failure is not consistent: sometimes it happens at element 1 or 2 of the list. In this example the error occurs at position 4 of the list res.corp_nyt.
[1] 1
[1] 2
[1] 3
<simpleError in curlMultiPerform(multiHandle): embedded nul in string: '%PDF-1.5\r%\xe2\xe3\xcf\xd3\r\n9974 0 obj\r<</Linearized 1/L 981366/O 9976/E 60354/N 12/T 979997/H [ 485 346]>>\rendobj\r \r\n9985 0 obj\r<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<6E56DCDB3A45A04AB65D8B8B1BE3720C><37DDDE7A909ED04CAB9D0B8CAF53D0F6>]/Index[9974 22]/Info 9973 0 R/Length 69/Prev 979998/Root 9975 0 R/Size 9996/Type/XRef/W[1 2 1]>>stream\r\nh\xdebbd\020``b`\xd6\a\022\x8cw@D\033\x90`\xd9\t"^\0\t\xaeh\x90\x98\n\x90\xe0\xd8\004$d\xba\x80D\x82\030\x90HId`bTk\005\xb2\030\030\030\x89!\xfe\xdf~\xf3\003 \xc0\0$U\v\024\r\nendstream\rendobj\rstartxref\r\n0\r\n%%EOF\r\n \r\n9995 0 obj\r<</C 289/Filter/FlateDecode/I 312/Length 250/O 273/S 208>>stream\r\nh\xdeb```f``:\xc8\xc0\xce\xc0 \xea\xc7 ̀\0\xc2\f,\f\xac@\xcc\xc1\xc3\xc0\xf0\xa13\x83\x91\x89\x85A@\xca[A\xa9\x81\xb1K(`\xe2\017\xb1\xe6\v\xef\xdf\xdc0\x90`vL,;\xc2\xc0\xc0\022\xa5`\x96[[\x96\x96\xc6\xc0 \xb9RY!mIO\xe3\xacI\xcf\034R\xa2o\v\x9c\xf5\x94T\x91\\\xa9\x97\xc2>
Error on retrieval, single retrieval fallback...
<simpleError in curlPerform(curl = curl, .opts = opts, .encoding = .encoding): embedded nul in string: '%PDF-1.5\r%\xe2\xe3\xcf\xd3\r\n9974 0 obj\r<</Linearized 1/L 981366/O 9976/E 60354/N 12/T 979997/H [ 485 346]>>\rendobj\r \r\n9985 0 obj\r<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<6E56DCDB3A45A04AB65D8B8B1BE3720C><37DDDE7A909ED04CAB9D0B8CAF53D0F6>]/Index[9974 22]/Info 9973 0 R/Length 69/Prev 979998/Root 9975 0 R/Size 9996/Type/XRef/W[1 2 1]>>stream\r\nh\xdebbd\020``b`\xd6\a\022\x8cw@D\033\x90`\xd9\t"^\0\t\xaeh\x90\x98\n\x90\xe0\xd8\004$d\xba\x80D\x82\030\x90HId`bTk\005\xb2\030\030\030\x89!\xfe\xdf~\xf3\003 \xc0\0$U\v\024\r\nendstream\rendobj\rstartxref\r\n0\r\n%%EOF\r\n \r\n9995 0 obj\r<</C 289/Filter/FlateDecode/I 312/Length 250/O 273/S 208>>stream\r\nh\xdeb```f``:\xc8\xc0\xce\xc0 \xea\xc7 ̀\0\xc2\f,\f\xac@\xcc\xc1\xc3\xc0\xf0\xa13\x83\x91\x89\x85A@\xca[A\xa9\x81\xb1K(`\xe2\017\xb1\xe6\v\xef\xdf\xdc0\x90`vL,;\xc2\xc0\xc0\022\xa5`\x96[[\x96\x96\xc6\xc0 \xb9RY!mIO\xe3\xacI\xcf\034R\xa2o\v\x9c\xf5\x94T\x91\\\xa9\x97\xc2>
[1] 4
Error: Unable to establish connection with R session
Error: Unable to establish connection with R session
Any help?
Best regards, Robert
Hi Mario,
I found a small issue with NYTimesSource.
The NYT API returns its results in pages of 10 documents each. Its documentation describes the page parameter as follows:
"Integer, 0–last set of ten default: 0
The value of page corresponds to a set of 10 results (it does not indicate the starting number of the result set). For example, page=0 corresponds to records 0-9. To return records 10-19, set page to 1, not 10"
By default, the NYTimesSource function has page = seq(0, n - count, by = count), which gives 0 10 20 30 40 50 60 70 80 90 (for example with n = 100 and count = 10). So it requests 10 documents each from pages 0, 10, 20, etc., but it should request 10 documents each from pages 0, 1, 2, 3, 4, etc. NYT defines a maximum page number of 100, which should give us 1000 documents, but it also limits calls to 10 per second and 10,000 per day.
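To make the difference concrete, here is a minimal sketch of how the page indices could be computed instead. The helper name nyt_pages is hypothetical and not part of tm.plugin.webmining; it only assumes the API behavior quoted above (zero-based pages of `count` results each).

```r
# Hypothetical helper (not part of tm.plugin.webmining): zero-based page
# indices needed to cover n results when each page holds `count` results.
nyt_pages <- function(n, count = 10) {
  seq_len(ceiling(n / count)) - 1
}

# Page numbers as the API expects them for n = 100, count = 10
pages_fixed <- nyt_pages(100, 10)

# Values produced by the current default expression: these are record
# offsets (0, 10, 20, ...), which the API would interpret as page numbers
pages_current <- seq(0, 100 - 10, by = 10)

print(pages_fixed)
print(pages_current)
```

With n = 900 and count = 10, as in my call above, this would request pages 0 to 89, which stays below the maximum page number.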
Because WebSource by default defines curlOptions with maxconnects = 20, and because getURL uses async = TRUE when we pass an fq vector of length > 1, the requests run into these limits and fail: NYT returns a badly formatted response (errors related to the limits).
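One possible workaround, as a rough sketch only: fetch the URLs one at a time and pause between requests so the per-second limit is respected. The helper fetch_throttled and its arguments are hypothetical; the fetch function is passed in, so the sketch itself makes no assumption about how the package builds its requests.

```r
# Hypothetical helper: call `fetch` on each URL sequentially, pausing so
# that at most `max_per_sec` requests are started per second.
# Errors are caught and returned in place of the result for that URL.
fetch_throttled <- function(urls, fetch, max_per_sec = 10) {
  min_gap <- 1 / max_per_sec
  out <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    started <- Sys.time()
    out[[i]] <- tryCatch(fetch(urls[[i]]), error = function(e) e)
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    # Sleep only if the request finished faster than the minimum gap
    if (i < length(urls) && elapsed < min_gap) Sys.sleep(min_gap - elapsed)
  }
  out
}

# Possible use with RCurl (assumes `urls` is a character vector of request URLs):
# res <- fetch_throttled(urls, function(u) RCurl::getURL(u, async = FALSE))
```

This trades speed for stability: 90 pages at 10 calls per second take at least about 9 seconds per search term, and the daily limit still has to be tracked separately.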
Can you help and correct it in the source?
Best regards, Robert