3vivekb opened this issue 8 years ago
The Bulk API should be included in the package as of v0.8.0. If you type ?RForcecom
and review the examples in the package index, you should see the following examples of how to use the Bulk API:
## BULK INSERT
# create a sample data.frame of 1000 records
n <- 1000
data <- data.frame(Name=paste('New Record:', 1:n),
                   stringsAsFactors=FALSE)
# run an insert job into the Account object
job_info <- rforcecom.createBulkJob(session,
                                    operation='insert',
                                    object='Account')
# split into batch sizes of 500 (2 batches for our 1000 row sample dataset)
batches_info <- rforcecom.createBulkBatch(session,
                                          jobId=job_info$id,
                                          data,
                                          multiBatch=TRUE,
                                          batchSize=500)
# check on status of each batch
batches_status <- lapply(batches_info,
                         FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                         })
# get details on each batch
batches_detail <- lapply(batches_info,
                         FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                         })
# close the job
close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)
## BULK DELETE THE PRIOR INSERT
# format the data
batch_details_together <- plyr::ldply(batches_detail)
delete_ids <- data.frame(id=batch_details_together[,"Id"],
                         stringsAsFactors=FALSE)
job_info <- rforcecom.createBulkJob(session, operation='delete', object='Account')
batches_info <- rforcecom.createBulkBatch(session,
                                          jobId=job_info$id,
                                          data=delete_ids)
# check on status of each batch
batches_status <- lapply(batches_info,
                         FUN=function(x){
                           rforcecom.checkBatchStatus(session,
                                                      jobId=x$jobId,
                                                      batchId=x$id)
                         })
# get details on each batch
batches_detail <- lapply(batches_info,
                         FUN=function(x){
                           rforcecom.getBatchDetails(session,
                                                     jobId=x$jobId,
                                                     batchId=x$id)
                         })
# close the job
close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)
## BULK QUERY
query <- "SELECT Id, Name FROM Account LIMIT 10"
job_info <- rforcecom.createBulkJob(session, operation='query', object='Account')
batch_query_info <- rforcecom.submitBulkQuery(session,
                                              jobId=job_info$id,
                                              query=query)
batch_query_status <- rforcecom.checkBatchStatus(session,
                                                 jobId=batch_query_info$jobId,
                                                 batchId=batch_query_info$id)
batch_query_details <- rforcecom.getBatchDetails(session,
                                                 jobId=batch_query_info$jobId,
                                                 batchId=batch_query_info$id)
batch_query_recordset <- rforcecom.getBulkQueryResult(session,
                                                      jobId=batch_query_info$jobId,
                                                      batchId=batch_query_info$id,
                                                      resultId=batch_query_details$result)
close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)
## BULK INSERT ATTACHMENTS
# prepare your .zip file and request.txt manifest before calling these functions
file <- 'request.zip'
job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
batch_attachment_info <- rforcecom.insertBulkAttachments(session,
                                                          jobId=job_info$id,
                                                          file=file)
batch_attachment_status <- rforcecom.checkBatchStatus(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
batch_attachment_details <- rforcecom.getBatchDetails(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
# close the job
close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)
## End(Not run)
How should the external ID be set when creating a bulk update/upsert job? I.e. upsert requires an external ID to be set, but the function doesn't allow setting it:
job_info <- rforcecom.createBulkJob(session, operation="upsert", object="Contact")
Error in rforcecom.createBulkJob(session, operation = "upsert", object = "Contact") :
  InvalidJob: External ID was blank for Contact. An External ID must be specified for upsert.
job_info <- rforcecom.createBulkJob(session, operation="upsert", externalIdFieldName = "Member_ID__c", object="Contact")
Error in rforcecom.createBulkJob(session, operation = "upsert", externalIdFieldName = "Member_ID__c",  :
  unused argument (externalIdFieldName = "Member_ID__c")
@ilcca Thanks for bringing up the issue. I've fixed and submitted a PR here: https://github.com/hiratake55/RForcecom/pull/35
The feature hasn't been merged yet, so if you really want to use it then you'll need to re-install RForcecom with the following command in R:
devtools::install_github('ReportMort/RForcecom', ref='metadata-api')
Here is an example of how to use it:
job_info <- rforcecom.createBulkJob(session, operation='upsert',
                                    externalIdFieldName='My_External_Id__c',
                                    object='Account')
my_data <- data.frame(My_External_Id__c=c('11111', '22222', '99999'),
                      Name=c('Updated_Name1', 'Updated_Name2', 'Upserted_Record'),
                      stringsAsFactors=FALSE)
batches_info <- rforcecom.createBulkBatch(session,
                                          jobId=job_info$id,
                                          data=my_data)
# gather the batches result as you would with other operations (insert/update/delete)
Steve, thanks for fixing this so quickly! I got your package and the upsert works well now. However, to use update, should an external id be applied as well? And what is the key id now when updating?
Updating does not require an external id. Simply include a column entitled "Id" in your data, set the bulk job to "update", and Salesforce will update the corresponding records based on that Id field.
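For example, a minimal sketch of an update job (the object name and values here are just placeholders):
# data must contain the Salesforce record Id plus the fields to change
update_data <- data.frame(Id=c('0013000000AbCdE', '0013000000FgHiJ'),
                          Name=c('New Name 1', 'New Name 2'),
                          stringsAsFactors=FALSE)
job_info <- rforcecom.createBulkJob(session, operation='update', object='Account')
batches_info <- rforcecom.createBulkBatch(session,
                                          jobId=job_info$id,
                                          data=update_data)
# then check batch status / details and close the job as in the examples above
close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)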
So if I want to update rows at Contact, is the "Id" here the same one retrieved with SQL: "Select Id from Contact"? Hence to update rows I must first get the list of Ids to update. Am I right?
My idea was to use an External id to update rows at Contact.
Yes. "Id" here means the unique ID generated by Salesforce that corresponds to the record.
If you want to update based on External Id, then this might be a good strategy:
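Roughly, as a sketch (the object and field names are placeholders, and the lookup is shown with rforcecom.query() but could also be run through the bulk query functions above):
# 1. pull down the Salesforce Ids along with your external id field
lookup <- rforcecom.query(session, "SELECT Id, My_External_Id__c FROM Contact")
# 2. merge onto your data by the external id so each row picks up its Salesforce Id
my_data <- merge(my_data, lookup, by='My_External_Id__c')
# 3. keep only Id plus the fields you want to change, then run a normal bulk update job
job_info <- rforcecom.createBulkJob(session, operation='update', object='Contact')
batches_info <- rforcecom.createBulkBatch(session,
                                          jobId=job_info$id,
                                          data=my_data)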
Thanks for the strategy, it makes sense.
However, when operating with bigger data, fetching all Ids from a Salesforce table can require lots of memory within R and a lot of traffic between Salesforce and R as well. That's why I'm wondering if there is a way to make the SOQL query include only the needed Ids, i.e. "SELECT Id, My_External_Id__c FROM Contact WHERE My_External_Id__c IN LIST". Here LIST means a list of the External Ids to be updated.
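For example, the IN clause could be built in R with something like this (an illustrative sketch only; the field name is a placeholder):
ext_ids <- c('11111', '22222', '99999')
in_clause <- paste0("('", paste(ext_ids, collapse="','"), "')")
query <- paste0("SELECT Id, My_External_Id__c FROM Contact WHERE My_External_Id__c IN ", in_clause)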
@ReportMort Looking over this thread, is it possible to do two things: A) update fields of information on a record and simultaneously B) attach a text file to that record?
@Btibert3 You'll have to do it in 2 passes, you cannot do it simultaneously because bulk jobs can only target one object at a time. Attachments must go to the Attachment object with a specially formatted zip file to upload and I think you're hinting that the records you want to update are in a different object (e.g. Account), so you'll have to create a second bulk job to do that.
@ReportMort Interesting, thanks for the quick reply. I am not as well versed in the API, but yes, we have a custom object that I want to add facts to as well as attach the file to. Because the record on the custom object would already exist prior to updating the fields and adding the file, does that matter? The part I am somewhat lost on is how I associate the attachment with the record of interest.
You'll need to create a zip file containing your attachments and a manifest file formatted as CSV that associates each document inside your zip file with a specific record Id in Salesforce. You can see the example R code at the bottom of the package help; just type ?`RForcecom-package`
in your R console.
Here is the example code copy/pasted from there:
## BULK INSERT ATTACHMENTS
# prepare your .zip file and request.txt manifest before calling these functions
file <- 'request.zip'
job_info <- rforcecom.createBulkJob(session, operation='insert', object='Attachment')
batch_attachment_info <- rforcecom.insertBulkAttachments(session,
                                                          jobId=job_info$id,
                                                          file=file)
batch_attachment_status <- rforcecom.checkBatchStatus(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
batch_attachment_details <- rforcecom.getBatchDetails(session,
                                                       jobId=batch_attachment_info$jobId,
                                                       batchId=batch_attachment_info$id)
# close the job
close_job_info <- rforcecom.closeBulkJob(session, jobId=job_info$id)
Here is the reference page on how to create your zip file of attachments with the appropriate manifest: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/binary_intro.htm. Your zip file is kept locally and you should create a CSV request.txt file since that is the default format used by the RForcecom package when inserting into the Attachment object.
You are the man, thanks for your help! Ok, seeing that, how am I am able to assign that attachment to a specific record on my custom object?
Your request.txt manifest will contain the Ids. The Ids are always unique, even across objects, so don't worry: the attachments will end up going to the right records on your custom object as long as you've specified their individual Ids properly.
I suggest going through and updating all the records first; that way you know the Ids and can put them in your request.txt manifest.
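For example, a minimal sketch of building that manifest in R (the column layout follows the binary-attachment page linked above; the record Ids and file names here are placeholders, so double-check the exact format against that page):
manifest <- data.frame(Name=c('report1.pdf', 'report2.pdf'),
                       ParentId=c('a0130000001AbCd', 'a0130000001EfGh'),  # Ids of your custom object records
                       Body=c('#report1.pdf', '#report2.pdf'),            # paths of the files inside the zip, prefixed with #
                       stringsAsFactors=FALSE)
write.csv(manifest, 'request.txt', row.names=FALSE)
# zip request.txt together with the attachment files, then pass that zip to rforcecom.insertBulkAttachments()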
Ahhhh! Let me take a look. Much appreciated.
So right now within the R code we can update only one record at a time. But if we run a for loop and attempt to update many records, we can run into the 1000-15000 request limit. Also, it's kind of slow.
Can somebody implement bulk insert / update?