mogui / pyorient

OrientDB driver for Python that uses the binary protocol.
Apache License 2.0

OGM batch performance problems due to string concatenation #261

Open k-sok opened 6 years ago

k-sok commented 6 years ago

The OGM builds the batch script with repeated string concatenation:

commands = 'BEGIN\n'
commands += cmd1  # each += builds a new string containing everything so far
commands += cmd2
...  # one += per remaining command
commands += cmdN
g.client.batch(commands)

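The larger the script (i.e. the commands string) grows, the longer each concatenation takes, because every += reallocates memory and copies the whole string built so far. Over a large batch, the total cost can grow roughly quadratically with the number of commands.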
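A rough cost model makes that growth concrete. This sketch assumes N commands of equal length L and that every += copies the entire string built so far (CPython can sometimes resize a string in place, so real costs may be lower); both function names are hypothetical and only count bytes under that model.

```python
def concat_bytes_copied(n_commands: int, cmd_len: int) -> int:
    """Bytes copied if the i-th += builds a fresh string of i*cmd_len characters."""
    # Sum of i*cmd_len for i = 1..n, i.e. cmd_len * n(n+1)/2: quadratic in n.
    return cmd_len * n_commands * (n_commands + 1) // 2


def join_bytes_copied(n_commands: int, cmd_len: int) -> int:
    """Bytes copied by one '\\n'.join: each command once, plus the separators."""
    # n commands plus n-1 newline separators: linear in n.
    return n_commands * cmd_len + max(n_commands - 1, 0)


for n in (1_000, 10_000, 100_000):
    ratio = concat_bytes_copied(n, 80) / join_bytes_copied(n, 80)
    print(f"{n:>7} commands: concatenation copies ~{ratio:,.0f}x more bytes")
```

Under this model the gap widens linearly with N, which is consistent with the slowdown being most visible on large batches.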

A better approach is to use a list of strings and join it when committing.

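commands = ['BEGIN']
commands.append(cmd1)  # appending to a list copies no string data
commands.append(cmd2)
...  # one append per remaining command
commands.append(cmdN)
g.client.batch('\n'.join(commands))  # a single join copies each command once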
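The same strategy can be wrapped in a small helper that accumulates commands and joins them only once, at commit time. This is a minimal sketch: BatchScript and RecordingClient are hypothetical names, not pyorient's API, and the client is assumed to be any object with a batch(script) method, as g.client is in the snippet above.

```python
from typing import List, Protocol


class BatchClient(Protocol):
    """Anything with a batch(script) method, such as the client used as g.client."""

    def batch(self, script: str): ...


class BatchScript:
    """Hypothetical helper: collects batch commands and joins them once on commit."""

    def __init__(self, first_line: str = "BEGIN") -> None:
        self._lines: List[str] = [first_line]

    def append(self, command: str) -> None:
        # List append is amortized O(1); no string copying happens here.
        self._lines.append(command)

    def script(self) -> str:
        # A single join copies each command exactly once.
        return "\n".join(self._lines)

    def commit(self, client: BatchClient):
        # Hand the whole script to the client in one call.
        return client.batch(self.script())


class RecordingClient:
    """Stand-in for a real client; it only records the scripts it receives."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def batch(self, script: str) -> None:
        self.sent.append(script)


client = RecordingClient()
batch = BatchScript()
for cmd in ("cmd1", "cmd2", "cmd3"):
    batch.append(cmd)
batch.commit(client)
print(client.sent[0])
```

Keeping the join inside commit means callers never see a partially built string, and the cost of building the script stays linear in its length.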

By implementing the above solution, I was able to improve performance on large batches by orders of magnitude.
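The size of the speedup depends on the interpreter and on where the script is held. CPython can sometimes extend a string in place when it is a local variable with no other references, but that optimization does not apply when the string lives on an object attribute. The benchmark sketch below assumes a synthetic workload of 5,000 commands of 80 characters each and a hypothetical Holder object that keeps its script on an attribute. It checks that both strategies produce the same script and prints timings without claiming any particular numbers.

```python
import timeit

# Synthetic workload (an assumption): 5,000 commands, each 80 characters incl. newline.
COMMANDS = [f"cmd{i:05d}".ljust(79, "x") + "\n" for i in range(5_000)]


class Holder:
    """Hypothetical stand-in for a batch object that stores its script on an attribute."""

    def __init__(self) -> None:
        self.commands = "BEGIN\n"


def build_by_concat() -> str:
    holder = Holder()
    for cmd in COMMANDS:
        holder.commands += cmd  # attribute target: each += copies the whole string
    return holder.commands


def build_by_join() -> str:
    # Same list-and-join strategy; the separator is empty because
    # each command already ends with a newline.
    parts = ["BEGIN\n"]
    for cmd in COMMANDS:
        parts.append(cmd)
    return "".join(parts)


if __name__ == "__main__":
    assert build_by_concat() == build_by_join()
    for fn in (build_by_concat, build_by_join):
        seconds = timeit.timeit(fn, number=3)
        print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```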

TropicalPenguin commented 6 years ago

Yo,

The develop branch already uses the list strategy :)

In reply to k-sok's post of Tue, Jan 9, 2018 at 6:09 AM.

k-sok commented 6 years ago

Yes, I just saw it after forking. I wonder what the chances are that this stack design will survive into the next version. I have to subclass Batch and need to decide what to rely on...