jur9526 / couchdb-python

Automatically exported from code.google.com/p/couchdb-python

Provide ability to do bulk dump and load #226

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently the load.py and dump.py utilities load/dump documents one by one, 
which is tremendously slow.

Introducing bulk loading/dumping would really speed things up.

Maybe we can add an option like "--bulk-size" with a default value of 1 
(load/dump documents one by one, as happens now) to let users tune the 
utilities further.
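A minimal sketch of how the proposed option could be declared, assuming argparse (the actual tools may use a different option parser). The --bulk-size flag is only the proposal above, not an existing option, and positive_int and build_parser are hypothetical helper names:

```python
import argparse


def positive_int(value):
    """argparse type: accept only integers >= 1 as a batch size."""
    size = int(value)  # a non-integer raises ValueError, reported by argparse
    if size < 1:
        raise argparse.ArgumentTypeError(
            'bulk size must be >= 1, got %r' % value)
    return size


def build_parser():
    # Hypothetical parser for a loader; --bulk-size is the proposed option.
    parser = argparse.ArgumentParser(description='Load documents into CouchDB')
    parser.add_argument('dburl', help='URL of the target database')
    parser.add_argument(
        '--bulk-size', type=positive_int, default=1,
        help='documents per request (default: 1, i.e. one by one)')
    return parser
```

Keeping the default at 1 means existing invocations behave exactly as before; only users who pass a larger value get batched requests.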

Original issue reported on code.google.com by AntonBak...@gmail.com on 14 Jun 2013 at 3:08

GoogleCodeExporter commented 9 years ago
I'm working on an initial implementation and will provide some patches later.

Original comment by Pavel.Ts...@gmail.com on 14 Jun 2013 at 3:10

GoogleCodeExporter commented 9 years ago
I have finished bulk dumping of documents. You can see it here:
https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=e0f1bda24cc0bc487bf782ebdabc9d817bf7d4f6&name=bulk_dumping
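For context, one common way to dump in bulk (not necessarily what the linked revision does) is to page through _all_docs with include_docs=true, requesting limit = bulk_size + 1 rows and using the extra row's key as the next startkey. In this sketch fetch_rows is a hypothetical callable standing in for that HTTP request, and fake_fetch_factory is an in-memory stand-in, so the paging logic runs without a server:

```python
import bisect


def iter_all_docs(fetch_rows, bulk_size):
    """Yield lists of at most bulk_size documents, in key order.

    fetch_rows(startkey, limit) must return up to `limit` rows sorted by
    key, each a dict with 'key' and 'doc', starting at startkey (inclusive)
    or at the first key when startkey is None.
    """
    if bulk_size < 1:
        raise ValueError('bulk_size must be >= 1')
    startkey = None
    more = True
    while more:
        # Ask for one extra row: if it comes back, its key starts the next page.
        rows = fetch_rows(startkey, bulk_size + 1)
        more = len(rows) > bulk_size
        if more:
            startkey = rows[bulk_size]['key']
        page = rows[:bulk_size]
        if page:
            yield [row['doc'] for row in page]


def fake_fetch_factory(docs):
    """In-memory stand-in for an _all_docs?include_docs=true query."""
    rows = sorted(({'key': d['_id'], 'doc': d} for d in docs),
                  key=lambda r: r['key'])
    keys = [r['key'] for r in rows]

    def fetch(startkey, limit):
        start = 0 if startkey is None else bisect.bisect_left(keys, startkey)
        return rows[start:start + limit]

    return fetch
```

Paging by startkey rather than skip keeps each request cheap on the server, since CouchDB does not have to walk over the skipped rows.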

Original comment by Pavel.Ts...@gmail.com on 17 Jun 2013 at 12:10

GoogleCodeExporter commented 9 years ago
Good stuff! For inclusion in CouchDB-Python, I have a number of requests:

- Please remove the change in .hgignore, as it isn't needed anymore
- Please see if you can add a test for the new behavior
- It would be great if you can split this into two patches: one that abstracts 
writing into a separate function, and another one that actually does the bulk 
requests/writes -- this makes it easier to review the changes now and in the 
future

Original comment by djc.ochtman on 17 Jun 2013 at 12:44

GoogleCodeExporter commented 9 years ago
I addressed your requests and added a bulk load method:
https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=de81adea330909f13d9bf37f98e25d4b7c657a92&name=bulk_dumping

Original comment by Pavel.Ts...@gmail.com on 17 Jun 2013 at 5:23

GoogleCodeExporter commented 9 years ago
I've pushed modified versions; for r6f91fa675423, I:

- Renamed function from write_dump() to dump_doc()
- Moved dump_doc() outside dump_db(), added envelope argument
- Rewrote commit message to clarify
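Roughly, the refactoring described above separates writing one document from iterating over the database, so a later bulk change only touches the iteration. A minimal sketch, assuming an envelope object with an add(mimetype, content, headers) method; ListEnvelope is a hypothetical in-memory stand-in for the real multipart writer, the header names are illustrative, dump_db is simplified to take documents rather than a URL, and attachments are not handled:

```python
import json


class ListEnvelope:
    """Hypothetical in-memory stand-in for a multipart writer."""

    def __init__(self):
        self.parts = []

    def add(self, mimetype, content, headers=None):
        self.parts.append((mimetype, content, headers or {}))


def dump_doc(doc, envelope):
    """Write one attachment-free document to envelope as a JSON part."""
    headers = {'Content-ID': doc['_id'], 'ETag': '"%s"' % doc['_rev']}
    envelope.add('application/json', json.dumps(doc), headers)


def dump_db(docs, envelope):
    # The iteration strategy (one by one or in bulk) lives here only;
    # dump_doc() does not care how the documents were fetched.
    for doc in docs:
        dump_doc(doc, envelope)
```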

In re8cafe210d91, I:

- Made sure lines didn't get longer than 80 chars
- Tightened up the loop code (a while True loop with "if condition: break" 
inside is a little silly)
- Rewrote commit message to clarify
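To illustrate the kind of loop tightening meant (this is not the actual diff in re8cafe210d91): a batching loop written as while True with a break on an empty batch can be replaced by the two-argument form of iter(), which stops as soon as the callable returns the sentinel. batches and batches_verbose are hypothetical helper names:

```python
from itertools import islice


def batches_verbose(iterable, size):
    """The while True / break shape, shown for comparison."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            break
        yield batch


def batches(iterable, size):
    """Same output, using the two-argument form of iter()."""
    it = iter(iterable)
    # iter(callable, sentinel) calls the lambda until it returns [] (the sentinel).
    return iter(lambda: list(islice(it, size)), [])


# Usage: each batch would go into one bulk request instead of one per document.
for batch in batches(range(7), 3):
    print(batch)
```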

Could you redo your bulk loading along these lines? You also introduced a bug 
with regard to error handling: db.update() doesn't raise exceptions the way 
db.__setitem__() does. Also, your test case references a test data file that 
isn't included in the patch.
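To make the error-handling point concrete: couchdb-python's Database.update() reports per-document outcomes as (success, docid, rev_or_exc) tuples instead of raising, so a bulk loader has to inspect the results itself. A minimal sketch; load_batch and its ignore_errors flag are hypothetical, and raising RuntimeError on the first failure is an arbitrary choice:

```python
import sys


def load_batch(db, docs, ignore_errors=False):
    """Save docs in one bulk request and surface per-document failures.

    Assumes db.update(docs) returns (success, docid, rev_or_exc) tuples,
    as couchdb-python's Database.update() does, rather than raising.
    Returns the list of (docid, error) pairs for failed documents.
    """
    failures = []
    for success, docid, rev_or_exc in db.update(docs):
        if not success:
            failures.append((docid, rev_or_exc))
    if failures and not ignore_errors:
        docid, exc = failures[0]
        raise RuntimeError('failed to load %r: %s' % (docid, exc))
    for docid, exc in failures:
        print('Error loading %r: %s' % (docid, exc), file=sys.stderr)
    return failures
```

Wrapping db.update() in try/except, as one would around db[docid] = doc, would silently miss these failures, because a rejected document only shows up as success=False in the returned list.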

Original comment by djc.ochtman on 18 Jun 2013 at 8:00

GoogleCodeExporter commented 9 years ago
I hope I understood your code design recommendations correctly. I pushed the 
changes to
https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=46b5043fe465274850c4a821e468ca9ca90b70e0&name=bulk_dumping

I don't understand what you mean about the test data file; I don't have any 
test data files.

Original comment by Pavel.Ts...@gmail.com on 18 Jun 2013 at 3:32

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub. Please continue discussion here:

https://github.com/djc/couchdb-python/issues/226

Original comment by djc.ochtman on 15 Jul 2014 at 7:22