hiratake55 / RForcecom

RForcecom provides the connection to Force.com and Salesforce.com from R
http://cran.r-project.org/web/packages/RForcecom/index.html
Other
49 stars 25 forks source link

rforcecom.query SOQL rbind.fill issue #23

Open ghost opened 8 years ago

ghost commented 8 years ago

A query that was working last week is now producing an error of:

Error: All inputs to rbind.fill must be data.frames

It seems to throw this error when queriing over 2000 values, calling the rforcecom.queryMore, failing on the rbind.fill piece.

if (!is.na(nextRecordsUrl)) {
    nextRecords <- rforcecom.queryMore(session, nextRecordsUrl)
    xdf.iconv <- rbind.fill(xdf.iconv, nextRecords)
}

It looks like xdf.iconv is not a data frame inside of rforcecom.queryMore. Any reason why this would have just started to happen?

hiratake55 commented 8 years ago

@mtwright88 If dataset contains too many missing values, rbind.fill will fail. But today, I released RForcecom 0.8 and already available on CRAN which has bulk operation feature. Please try this if you possible.

abeburnett commented 8 years ago

you could also try:

xdf.iconv <- rbindlist(xdf.iconv, nextRecords, fill = TRUE)

The fill = TRUE argument in combination with rbindlist should be much more efficient than rbind.fill as it is written entirely in C.

ghost commented 8 years ago

@hiratake55 I don't think it will work for the bulk query option due to some of the restrictions that bulk queries have. I am using Relationship fields.

It works when I use the dpyr function bind_rows or when I roll back the package to 0.7 though.

@abeburnett I believe there is also rbind.all which is built in C as well. I think these are all depreciated versions of the new dplyr function bind_rows, not sure which one is the fastest though.

hiratake55 commented 8 years ago

@abeburnett @mtwright88 Thank you for your advice. I'll check rbindlist and rbind.all.

StevenMMortimer commented 8 years ago

@mtwright88 Would you mind sharing your SOQL and possibly a snippet of the returned XML from the curl command (de-identified if needed)? Salesforce returns different XML response formats depending on how the SOQL is written, so hard to identify root cause of your issue without more detail.

Also, I'm trying to gather more examples to test a different parsing strategy to handle things like nested queries and foreign object relationships. That function is available here: https://github.com/ReportMort/RForcecom/blob/query-refactor/R/rforcecom.utils.R and you're free to install from my branch to test yourself.

ghost commented 8 years ago

@ReportMort Sure thing.

Here is the SOQL query: "SELECT Contactr.Unitr.Name, LNamec, Contactr.Email, Contactr.Typec, Contactr.Empc FROM ContLc WHERE Conr.Statusc = 'xxxx' AND (NOT LangNc IN('xxxx')) AND Contactr.Typec IN ('xxx', 'xxx', 'xxx', 'xxx', 'xxx', 'xxx') AND (NOT Contactr.ITypec='xxxx') AND (NOT Lang__c='xxx')"

And here is one of the 5000+ records that are returned. I had to put a space after < because it wasn't showing up in the message otherwise.

< records type="ContLc" url="xxxxxxxxxxxxxxxxxxxxxxxxx"> < Contactr type="Contact" url="xxxxxxxxxxx"> < Unitr type="Unitc" url="xxxxxxxxxxx"> < Name>xxxxxx < /Unitr> < Email>xxxxxxx < Typec>xxxxxxxx < Empc>xxxxxxx</Empc> < /Contactr> < LangNc>xxxxxxxx < /records>

Like I said the query.more will bind using other methods or the .7 version with rbind.fill of Rforcecom.

Thanks a lot for looking at this :).

jmeb commented 8 years ago

I've got the same failure going on. The query returns successfully when I impose a row return limit, but fails when pulling the full query (40K lines).

Using bulkQuery isn't a work around because of the limitations on those types of queries (traversing relationships etc.)

For now the only workaround I've found is regressing to the 0.7 package which doesn't appear to have the same failure.

kng229 commented 8 years ago

Also having this issue with version 0.8, had to revert to 0.7 to get it to work. The fill=TRUE did not help; when I copied the rforcecom.query code into a user-defined function I got an error: function forcecom.api.getSoqlEndpoint could not be found.

jnawras-zz commented 8 years ago

I had same issue, which I didn't two months ago. I tried all suggested solutions here including reverting to v0.7, not working for me. PS: I'm running windows

FALSE, It worked, I just needed to restart R, install v0.7. I'll keep the comment above as is, so if anyone else like me didn't figure it out.

StevenMMortimer commented 8 years ago

@kng229 When you copied the function code and received the error rforcecom.api.getSoqlEndpoint could not be found this is because all of the functions in the rforcecom.api.R file are not exported via the NAMESPACE, so they won't be found after you load the package in R. You just need to run "Source" on that rforcecom.api.R file, then all of the functions will be found as you expect and you can continue testing the function yourself.

StevenMMortimer commented 8 years ago

@jnawras I wrote my own function that works for some more complex query structures (example queries listed below that work). I opened a pull request (#31), so that hopefully the author starts to consider using a new approach.

Here is how to install my version of RForcecom with different query parser is you are interested:

library(devtools)
install_github('ReportMort/RForcecom', ref='query-refactor')

Working Query Structures:

options(stringsAsFactors=FALSE)
# Pulls Accounts and 2 fields from the ActivityHistory object
columnTest2 <- rforcecom.query(session, soqlQuery = "SELECT Id, Name, CreatedDate, (SELECT ActivityDate, Description FROM ActivityHistories) FROM Account where CreatedDate > 2015-09-01T00:00:00.000Z")

# Pulls Opportunities with one field from the Owner and Account objects
columnTest3 <- rforcecom.query(session, soqlQuery = "SELECT Id, Amount, Name, Account.Name, Owner.Name FROM Opportunity WHERE Opportunity.CreatedDate >= 2015-09-01T00:00:00.000Z")

# Pulls Opportunities with multiple fields from the Owner object and one field from the  Account object
columnTest4 <- rforcecom.query(session, soqlQuery = "SELECT Id, Amount, Name, Account.Name, Owner.Name, Owner.Alias FROM Opportunity WHERE Opportunity.CreatedDate >= 2015-09-01T00:00:00.000Z")
jnawras-zz commented 8 years ago

@ReportMort This is very helpful, thanks Steve!