Closed chitemerere closed 6 years ago
I am trying to scrape a table that spans multiple pages from the web with R, using the following code:
library(XML)
library(RCurl)
library(plyr)
curlVersion()$features
curlVersion()$protocol
fetchAllData <- function(page) {
  temp <- paste0("https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-", page, "-hs-code.html")
  data <- readHTMLTable(temp, stringsAsFactors = FALSE)
  data <- readHTMLTable(temp)
  frMW <- data.frame(data)
}
fetchAll <- ldply(1:4, fetchAllData, .progress="text")
View(fetchAll)
I get the following error message:
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
  Results must be all atomic, or all data frames
In addition: Warning messages:
1: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-1-hs-code.html'
2: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-1-hs-code.html'
3: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-2-hs-code.html'
4: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-2-hs-code.html'
5: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-3-hs-code.html'
6: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-3-hs-code.html'
7: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-4-hs-code.html'
8: XML content does not seem to be XML: 'https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-4-hs-code.html'
Please assist
Regards
This is not related to archivist. You may be interested in the harvest package.
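For anyone landing here with the same warnings: a likely cause (an assumption, not confirmed in this thread) is that readHTMLTable() is handed an https:// URL directly, which the XML package's parser may not be able to download itself. A minimal sketch of one workaround is to download the page text with RCurl first and then parse that text. The helper names page_url, parse_first_table and fetch_page are hypothetical, and the sketch assumes the table of interest is the first one on each page and that every page has the same columns.

```r
library(XML)
library(RCurl)

# Build the URL for one results page (pattern copied from the question).
page_url <- function(page) {
  paste0("https://www.zauba.com/export-trimethoprim/fp-zimbabwe/p-",
         page, "-hs-code.html")
}

# Parse HTML text and return the first <table> as a data frame,
# or NULL when the page contains no table.
parse_first_table <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
  if (length(tables) == 0) return(NULL)
  tables[[1]]
}

# Download one page over HTTPS with RCurl, then parse the downloaded text.
fetch_page <- function(page) {
  html_text <- getURL(page_url(page), followlocation = TRUE)
  parse_first_table(html_text)
}

# Usage (needs network access; assumes all pages share the same columns):
# pages <- lapply(1:4, fetch_page)
# fetchAll <- do.call(rbind, Filter(Negate(is.null), pages))
```

Returning one data frame (or NULL) per page also avoids handing a list of tables to ldply(), which may be related to the "Results must be all atomic, or all data frames" error.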