hiratake55 / RForcecom

RForcecom provides the connection to Force.com and Salesforce.com from R
http://cran.r-project.org/web/packages/RForcecom/index.html
Other
49 stars 25 forks source link

Feature Request: Bulk insert / update of records #32

Open 3vivekb opened 8 years ago

3vivekb commented 8 years ago

So right now within the R Code we can update one record at a time. But if we run a for loop and attempt to update many records, we can run into the 1000-15000 request limit. Also it's kinda slow.

Can somebody implement the bulk insert / update?

StevenMMortimer commented 8 years ago

The Bulk API should have been included in the package starting at v0.8.0. If you type ?RForcecom and review the package index samples, you should see the following examples of how to use the Bulk API:

## BULK INSERT
 # create a sample data.frame of 1000 records
 n <- 1000
 data <- data.frame(Name=paste('New Record:', 1:n),
                    stringsAsFactors=FALSE)

 # run an insert job into the Account object
 job_info <- rforcecom.createBulkJob(session,
                                     operation='insert',
                                     object='Account')

 # split into batch sizes of 500 (2 batches for our 1000 row sample dataset)
 batches_info <- rforcecom.createBulkBatch(session,
                                           jobId=job_info$id,
                                           data,
                                           multiBatch = TRUE,
                                           batchSize=500)

 # check on status of each batch
 batches_status <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                                                      })
 # get details on each batch
 batches_detail <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                                                     })
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

 ## BULK DELETE THE PRIOR INSERT

 # format the data
 batch_details_together <- plyr::ldply(batches_detail)
 delete_ids <- data.frame(id=batch_details_together[,"Id"],
                          stringsAsFactors=FALSE)

 job_info <- rforcecom.createBulkJob(session, operation='delete', object='Account')
 batches_info <- rforcecom.createBulkBatch(session,
                                           jobId=job_info$id,
                                           data=delete_ids)
 # check on status of each batch
 batches_status <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                                                      })
 # get details on each batch
 batches_detail <- lapply(batches_info,
                          FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                                                     })
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

 ## BULK QUERY

 query <- "SELECT Id, Name FROM Account LIMIT 10"
 job_info <- rforcecom.createBulkJob(session, operation='query', object='Account')
 batch_query_info <- rforcecom.submitBulkQuery(session,
                                               jobId=job_info$id,
                                               query=query)

 batch_query_status <- rforcecom.checkBatchStatus(session,
                                                  jobId=batch_query_info$jobId,
                                                  batchId=batch_query_info$id)

 batch_query_details <- rforcecom.getBatchDetails(session,
                                                  jobId=batch_query_info$jobId,
                                                  batchId=batch_query_info$id)

 batch_query_recordset <- rforcecom.getBulkQueryResult(session,
                                                       jobId=batch_query_info$jobId,
                                                       batchId=batch_query_info$id,
                                                       resultId=batch_query_details$result)
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

 ## BULK INSERT ATTACHMENTS

 # prepare your .zip file and request.txt manifest before calling these functions
 file <- 'request.zip'
 job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
 batch_attachment_info <- rforcecom.insertBulkAttachments(session,
                                                          jobId=job_info$id,
                                                          file=file)
 batch_attachment_status <- rforcecom.checkBatchStatus(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
 batch_attachment_details <- rforcecom.getBatchDetails(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

## End(Not run)
ilcca commented 8 years ago

How external ID should be set when creating bulk update/upsert? I.e. upset requires externalID to be set, but doesn't allow to set it:

job_info <- rforcecom.createBulkJob(session, operation="upsert", object="Contact") Error in rforcecom.createBulkJob(session, operation = "upsert", object = "Contact") : InvalidJob: External ID was blank for Contact. An External ID must be specified for upsert. job_info <- rforcecom.createBulkJob(session, operation="upsert", externalIdFieldName = "Member_IDc", object="Contact") Error in rforcecom.createBulkJob(session, operation = "upsert", externalIdFieldName = "Member_ID__c", : unused argument (externalIdFieldName = "Member_IDc")

StevenMMortimer commented 8 years ago

@ilcca Thanks for bringing up the issue. I've fixed and submitted a PR here: https://github.com/hiratake55/RForcecom/pull/35

The feature hasn't been merged yet, so if you really want to use it then you'll need to re-install RForcecom with the following command in R: devtools::install_github('ReportMort/RForcecom', ref='metadata-api')

Here is an example of how to use:

job_info <- rforcecom.createBulkJob(session,  operation='upsert', 
                                    externalIdFieldName='My_External_Id__c', 
                                    object='Account')

my_data <- data.frame(My_External_Id__c=c('11111','22222', '99999'),  
                      Name=c('Updated_Name1', 'Updated_Name2', 'Upserted_Record'), 
                      stringsAsFactors=FALSE)

batches_info <- rforcecom.createBulkBatch(session, 
                                          jobId=job_info$id, 
                                          data=my_data)

# gather the batches result as you would with other operations (insert/update/delete)
ilcca commented 8 years ago

Steve, thanks for fixing this so quick! I got your package and the upsert works well now. However, to use update should there be external id applied as well? And what is the key id now when updating?

StevenMMortimer commented 8 years ago

Updating does not require an external id. Simply include a column entitled "Id" in your data, set the bulk job to "update" and Salesforce will update the corresponding records based on that Id field

ilcca commented 8 years ago

So if I want to update rows at Contact, is the "Id" here the same one retrieved with SQL: "Select Id from Contact"? Hence to update rows I must first get the list of Ids to update. Am I right?

My idea was to use an External id to update rows at Contact.

StevenMMortimer commented 8 years ago

Yes. "Id" here means the unique ID generated by Salesforce that corresponds to the record.

If you want to update based on External Id, then this might be a good strategy:

  1. Run SOQL: "Select Id, My_External_Id__c from Contact"
  2. In R, join your update data with this SOQL recordset on "My_External_Id__c", so that the "Id" column gets transferred into your update data.
  3. Ensure no duplicates got created during your join.
  4. Select the columns you want to send over to Salesforce (making sure you include the "Id" column)
  5. Create update bulk job and send over the data.frame.
ilcca commented 8 years ago

Thanks for the strategy, it makes sense.

However, when operating with bigger data then fetching all Ids from a Salesforce table can require lots of memory within R and traffic between Salesforce and R as well. That's why I'm wondering if there is a chance to make the sql query to include only needed Ids i.e. "Select Id, My_External_Idc from Contact WHERE My_External_Idc in LIST" Here LIST means a list of External Id to be updated.

Btibert3 commented 7 years ago

@ReportMort Looking over this thread, is it possible to do two things: A) update fields of information on a record and simultaneously B)attach a text file to that record?

StevenMMortimer commented 7 years ago

@Btibert3 You'll have to do it in 2 passes, you cannot do it simultaneously because bulk jobs can only target one object at a time. Attachments must go to the Attachment object with a specially formatted zip file to upload and I think you're hinting that the records you want to update are in a different object (e.g. Account), so you'll have to create a second bulk job to do that.

Btibert3 commented 7 years ago

@ReportMort Interesting, thanks for the quick reply. I am not as well versed in the API, but yes, we have a custom object that I want to add both facts to as well as attach the file. Because the record on the custom object would already exist prior to updating the fields and adding the file, does that matter. The part I am somewhat lost on is how I associate the attachment with the record of interest.

StevenMMortimer commented 7 years ago

You'll need to create a zip file containing your attachments and a manifest file formatted as CSV that associates each document inside your zip file to a specific record id in Salesforce. You can see the example R code at the bottom of the package help, just type ?R`Forcecom-package` in your R console.

Here is the example code copy/pasted from there:

 ## BULK INSERT ATTACHMENTS

 # prepare your .zip file and request.txt manifest before calling these functions
 file <- 'request.zip'
 job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
 batch_attachment_info <- rforcecom.insertBulkAttachments(session, 
                                                          jobId=job_info$id, 
                                                          file=file)
 batch_attachment_status <- rforcecom.checkBatchStatus(session, 
                                                       jobId=batch_attachment_info$jobId, 
                                                       batchId=batch_attachment_info$id)
 batch_attachment_details <- rforcecom.getBatchDetails(session, 
                                                       jobId=batch_attachment_info$jobId, 
                                                       batchId=batch_attachment_info$id)
 # close the job
 close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)

Here is the reference page on how to create your zip file of attachments with the appropriate manifest: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/binary_intro.htm. Your zip file is kept locally and you should create a CSV request.txt file since that is the default format used by the RForcecom package when inserting into the Attachment object.

Btibert3 commented 7 years ago

You are the man, thanks for your help! Ok, seeing that, how am I am able to assign that attachment to a specific record on my custom object?

StevenMMortimer commented 7 years ago

Your manifest.txt file will contain the Ids. The Ids are always unique, even across objects, so don't worry, the attachments will end up going to the right records on your custom object as long as you've specified their individual Ids properly.

I suggest, going through and updating all the records first, that way you know the Ids and can put those in your manifest.txt file.

Btibert3 commented 7 years ago

Ahhhh! Let me take a look. Much appreciated.