hiratake55 / RForcecom

RForcecom provides the connection to Force.com and Salesforce.com from R
http://cran.r-project.org/web/packages/RForcecom/index.html
Other
49 stars 25 forks source link

Duplicate Subscripts for columns #15

Open 3vivekb opened 9 years ago

3vivekb commented 9 years ago

(After successfully connecting) A query on ActivityHistory fails. soqlQuery <- "SELECT Name, (SELECT ActivityDate, Description FROM ActivityHistories) FROM Account" rforcecom.query(session, soqlQuery)

gives: Error in [<-.data.frame(*tmp*, i, names(xls[[i]]), value = c("NEXT Products", : duplicate subscripts for columns

I'm still new at this. Seems more like an R error.

StevenMMortimer commented 9 years ago

@3vivekb The function rforcecom.query is not capable of handling the XML response returned by this type of parent/child query. The result is too deeply nested. I'm not very good with parsing XML, but below you should find a working solution that calls some of the same code in rforcecom.query before doing custom parsing.

Also, when I started using this package I often wasn't sure whether my query was the problem or the R code, so I've been using the Salesforce Developer Workbench (https://workbench.developerforce.com) to run queries manually to first see if I've written them correctly before plugging them into R code.

library(RCurl)
library(XML)
library(plyr)

# create a session
session <- rforcecom.login(username, password, instanceURL, apiVersion)
# write your soql (start with pulling one account first)
soqlQuery <- "SELECT Id, Name, (SELECT ActivityDate, Description from ActivityHistories) FROM Account WHERE Id '{an account id goes here}'"

# prepare curl request (taken directly from RForcecom code)
h <- basicHeaderGatherer()
t <- basicTextGatherer()
endpointPath <- paste("services/data/v33.0/query/?q=", sep="")
URL <- paste(session["instanceURL"], endpointPath, curlEscape(soqlQuery), 
             sep = "")
OAuthString <- paste("Bearer", session["sessionID"])
httpHeader <- c(Authorization = OAuthString, Accept = "application/xml")
curlPerform(url = URL, httpheader = httpHeader, headerfunction = h$update, 
            writefunction = t$update, ssl.verifypeer = F)
if (exists("rforcecom.debug") && rforcecom.debug) {
  message(URL)
}
if (exists("rforcecom.debug") && rforcecom.debug) {
  message(t$value())
}
x.root <- xmlRoot(xmlTreeParse(t$value(), asText = T))
errorcode <- NA
errormessage <- NA
try(errorcode <- iconv(xmlValue(x.root[["Error"]][["errorCode"]]), 
                       from = "UTF-8", to = ""), TRUE)
try(errormessage <- iconv(xmlValue(x.root[["Error"]][["message"]]), 
                          from = "UTF-8", to = ""), TRUE)
if (!is.na(errorcode) && !is.na(errormessage)) {
  stop(paste(errorcode, errormessage, sep = ": "))
}

#NOW USE THIS SPECIAL PARSING SINCE THE STRUCTURE IS DEEPLY NESTED
#COMPARED TO OTHER QUERY XML RESPONSES

# parge into XML DOC
doc <- xmlTreeParse(t$value(), useInternalNodes = TRUE)
# pull out account nodes (we will loop through each)
account_nodes <- getNodeSet(doc, "//records[@type='Account']")
# pull out actvity history nodes (we will call separately for each Account)
activityhistory_nodes <- getNodeSet(doc, "//ActivityHistories")
all_dataframe <- NULL
for (i in 1:length(account_nodes)){
  one_account <- account_nodes[[i]]
  # take all children of the account node, except for the history, 
  # it will be done separately since it is its own list
  removeNodes(one_account[names(one_account) == "ActivityHistories"])
  # get top-level (account fields)
  this_top_level <- data.frame(matrix(unlist(xmlToList(one_account)), 
                                      ncol=length(unlist(xmlToList(one_account))), 
                                      byrow=T, dimnames=list(NULL, 
                                                             names(unlist(xmlToList(one_account))))))
  # get bottom-level (activity history fields)
  this_one_accounts_activity_history <- activityhistory_nodes[[i]]
  histories2 <- xmlToDataFrame(getNodeSet(this_one_accounts_activity_history, "records[@type='ActivityHistory']"))
  histories2$type.2 <- xpathSApply(this_one_accounts_activity_history, "records", xmlGetAttr, "type")
  histories2$url.2 <- xpathSApply(this_one_accounts_activity_history, "records", xmlGetAttr, "url")
  # add back account level fields
  histories2$type.1 <- this_top_level$.attrs.type
  histories2$url.1 <- this_top_level$.attrs.url
  all_dataframe <- rbind.fill(all_dataframe, histories2)
}
StevenMMortimer commented 9 years ago

Hopefully that's a start to figuring out a more compact and scalable way to parse these types of queries.

3vivekb commented 9 years ago

Thanks for the response @ReportMort ! I was just learning how to use SOQL queries when I wrote that. Now when I run queries I test them in SoqlXplorer - http://www.pocketsoap.com/osx/soqlx/ .

I tried to run the code you posted but I run into an error or two which i'll try to fix.

Maybe somebody in this forum knows how to flatten the original query, SELECT ActivityDate, Description FROM ActivityHistories) FROM Account? So instead of the nested dimension it ain't there?

StevenMMortimer commented 9 years ago

Could you just run 2 simple queries and use R to join the recordsets? SELECT Id, Name FROM Account use R to join SELECT Id, WhoId, WhatId, ActivityDate, Description FROM Task

3vivekb commented 9 years ago

OOO mygod. Task. Where have you been all my life.

StevenMMortimer commented 9 years ago

All Activities are in the Event object or the Task object