nightroman / Mdbc

MongoDB Cmdlets for PowerShell
Apache License 2.0

Add-MdbcData only adds 1 document at a time #51

Closed by awsles 3 years ago

awsles commented 3 years ago

The Add-MdbcData cmdlet adds only one document when the -InputObject parameter is given an array. For example, in Add-MdbcData -InputObject $MyArray -Collection $coll, where $MyArray contains an array of documents, the call adds a single document containing an array of the documents, which is not what was intended. However, piping the array, $MyArray | Add-MdbcData -Collection $coll, does work.

This behaviour differs from the PowerShell practice of also allowing a set of objects to be passed as a parameter and iterated by the implementation (possibly using db.collection.bulkWrite()?). At minimum, the documentation page should be updated to reflect this behaviour. Ideally, though, the underlying code would detect that an array was passed in and iterate it itself (which is faster than using the PowerShell pipeline).
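
To make the suggestion concrete, here is a minimal sketch of how a function could detect a collection passed to -InputObject and iterate it itself. Add-SampleData is a hypothetical stand-in written only for illustration; it is not Mdbc's code, it does not talk to MongoDB, and it only counts the documents it would have added.

```powershell
# Hypothetical stand-in for a cmdlet that accepts documents; NOT Mdbc's actual code.
function Add-SampleData {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$InputObject
    )
    begin {
        # Documents collected from either the parameter or the pipeline.
        $batch = [System.Collections.Generic.List[object]]::new()
    }
    process {
        # Strings and dictionaries are enumerable, but each one is a single
        # value or document, so they are not treated as a set of documents.
        $isSet = $InputObject -is [System.Collections.IEnumerable] -and
            $InputObject -isnot [string] -and
            $InputObject -isnot [System.Collections.IDictionary]

        if ($isSet) {
            foreach ($item in $InputObject) { $batch.Add($item) }
        }
        else {
            $batch.Add($InputObject)
        }
    }
    end {
        # A real cmdlet would insert the batch here; this sketch returns the count.
        $batch.Count
    }
}

$docs = @(@{x = 1}, @{x = 2})
$viaParameter = Add-SampleData -InputObject $docs   # array passed as one argument
$viaPipeline = $docs | Add-SampleData               # array unrolled by the pipeline
```

With this check in place, both calls see two documents. Without it, the -InputObject call would treat the whole array as one document, which is the behaviour described above.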

nightroman commented 3 years ago

FWIW, many PowerShell official and community cmdlets do not follow this "practice". Try

Out-String -InputObject @(@{x=1}, @{x=2})
@(@{x=1}, @{x=2}) | Out-String

PowerShell common practice is using the pipeline for many input objects. Having said that, I agree that the current behavior is not useful. Let me think.

awsles commented 3 years ago

Writing one at a time is quite slow so a bulk insert option would be quite handy. I also explored trying to call db.collection.bulkWrite() using Invoke-MdbcCommand but it doesn't appear that shell methods can be called via that cmdlet (an Invoke-MdbcShellCommand cmdlet would be awesome to have).

nightroman commented 3 years ago

@lesterw1 Does all of this exist in the C# driver? Mdbc does not use the shell; it uses the C# driver. Examples and suggestions in shell format are not that useful...

nightroman commented 3 years ago

Done, v6.5.8

awsles commented 3 years ago

Just tested v6.5.8. Using a collection for -InputObject is about 8% faster than pipelining. Great result! Thank you.

awsles commented 3 years ago

What about using the C# driver collection.BulkWriteAsync() as a further optimization?

nightroman commented 3 years ago

> What about using the C# driver collection.BulkWriteAsync() as a further optimization?

Unlike the suggestion in this topic, BulkWriteAsync is not that straightforward to adopt. I am sure it is doable, but I currently have no such plans. (1) I have no free time. (2) It needs some thinking about the design and concept. Bulk write combines all kinds of write operations, not just adding documents. This operation probably needs a new cmdlet, and it is not obvious how to design its input.