petl-developers / petl

Python Extract Transform and Load Tables of Data
MIT License

Speed for big table slower than Kettle #617

Open wonb168 opened 2 years ago

wonb168 commented 2 years ago

Exporting a big table to CSV (for example 100,000,000+ rows) is slower than Kettle. How can I raise the speed in petl? Thank you.

juarezr commented 2 years ago

Hi, @wonb168,

Do you have any further details?

wonb168 commented 2 years ago

A table of 0.17 billion (170 million) rows, from SQL Server to CSV (the CSV is needed to batch load into gpdb):

- Kettle: 37 min
- petl: 67 min

How can I raise the speed in petl? Thank you.
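For context, the flow being timed is a petl export from SQL Server to CSV, followed by a bulk load of that CSV into Greenplum. A minimal sketch of that flow, assuming pyodbc on the SQL Server side and psycopg2's COPY on the Greenplum side (connection strings and the Greenplum target table name are placeholders):

```python
import petl
import pyodbc
import psycopg2

# Extract from SQL Server and stream to CSV (credentials are placeholders).
src = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                     'SERVER=...;DATABASE=ReplenishLZ;UID=...;PWD=...')
table = petl.fromdb(src, 'SELECT * FROM ReplenishLZ.dbo.ps_inv_materialsize')
table.tocsv('ps_inv_materialsize.csv')

# Batch load the CSV into Greenplum via COPY (DSN and table are placeholders).
dst = psycopg2.connect('host=... dbname=... user=... password=...')
with dst, dst.cursor() as cur, open('ps_inv_materialsize.csv') as f:
    cur.copy_expert('COPY ps_inv_materialsize FROM STDIN WITH CSV HEADER', f)
```

petl evaluates this pipeline lazily, streaming rows one at a time, so memory stays flat; the per-row overhead of that streaming is likely where the gap against Kettle shows up.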

wonb168 commented 2 years ago

Run on my notebook (Windows 10, 16 GB memory). Only ~8000 rows/s:

```python
table = petl.fromdb(conn, 'SELECT * FROM ReplenishLZ.dbo.ps_inv_materialsize')
```

177833017 rows

```python
table.progress(1000000).tocsv('ps_inv_materialsize.csv')
```

```
1000000 rows in 120.57s (8294 row/s); batch in 120.57s (8294 row/s)
2000000 rows in 249.68s (8010 row/s); batch in 129.11s (7745 row/s)
3000000 rows in 368.70s (8136 row/s); batch in 119.02s (8401 row/s)
4000000 rows in 492.33s (8124 row/s); batch in 123.63s (8088 row/s)
5000000 rows in 620.53s (8057 row/s); batch in 128.19s (7800 row/s)
6000000 rows in 741.91s (8087 row/s); batch in 121.38s (8238 row/s)
7000000 rows in 857.63s (8161 row/s); batch in 115.72s (8641 row/s)
8000000 rows in 983.10s (8137 row/s); batch in 125.46s (7970 row/s)
```
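The sustained rate here is likely dominated by how fast rows come off the DB-API cursor. petl.fromdb also accepts a callable that returns a cursor, so the cursor can be tuned before petl starts iterating. A sketch that raises the DB-API arraysize (the connection string is a placeholder, and whether a larger fetch batch actually helps depends on the driver):

```python
import petl
import pyodbc

# Placeholder connection string.
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=...;DATABASE=ReplenishLZ;UID=...;PWD=...')

def make_cursor():
    # Ask the driver to fetch rows in larger batches than its default.
    cur = conn.cursor()
    cur.arraysize = 10000
    return cur

# petl.fromdb accepts a connection, a cursor, or a callable returning either.
table = petl.fromdb(make_cursor, 'SELECT * FROM ReplenishLZ.dbo.ps_inv_materialsize')
table.progress(1000000).tocsv('ps_inv_materialsize.csv')
```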

wonb168 commented 2 years ago

My 7,000,000 row table takes 40 minutes to export to CSV with petl. With connectorx, parallelized by 10, it takes only 15 minutes. Could connectorx be patched into petl for speed in the next version?
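connectorx is not integrated into petl, but the two can already be combined by hand: read in parallel partitions with connectorx, then write the CSV from pandas or hand the DataFrame to petl for further transforms. A sketch, assuming a numeric column named id to partition on (the connection string is a placeholder, and partition_num=10 mirrors the tenfold parallelism mentioned above):

```python
import connectorx as cx
import petl

# Parallel read: split the query into 10 partitions on the assumed 'id' column.
df = cx.read_sql(
    'mssql://user:password@host:1433/ReplenishLZ',
    'SELECT * FROM ReplenishLZ.dbo.ps_inv_materialsize',
    partition_on='id',
    partition_num=10,
)

# Write the CSV straight from pandas...
df.to_csv('ps_inv_materialsize.csv', index=False)

# ...or keep working on the rows in petl.
table = petl.fromdataframe(df)
```

The tradeoff is memory: connectorx materializes the whole result set as a DataFrame before anything is written, whereas petl streams it row by row.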