Closed max-yan closed 3 months ago
Hi, I'm not sure I understand the problem.
> Memory consumption does not allow loading large tables
Are you facing OOM errors? is the process being killed?
The way sling works is by inserting everything into a temp table inside a transaction, then inserting into the final table from the temp table. So only when the transaction closes would you be able to see the data.
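The load pattern flarco describes can be sketched as the SQL sequence below. This is a minimal illustration of the temp-table-then-transfer idea, not sling's actual generated SQL; the table names and statement shapes are assumptions.

```go
package main

import "fmt"

// buildLoadSQL sketches the pattern described above: rows go into a
// _tmp table inside a transaction, then a single INSERT ... SELECT
// moves them into the final table, so the data only becomes visible
// once the transaction closes. Statements are illustrative only.
func buildLoadSQL(finalTable string) []string {
	tmp := finalTable + "_tmp"
	return []string{
		fmt.Sprintf("CREATE TABLE %s AS %s", tmp, finalTable),
		"BEGIN",
		fmt.Sprintf("INSERT INTO %s VALUES (...)", tmp), // repeated per batch of rows
		fmt.Sprintf("INSERT INTO %s SELECT * FROM %s", finalTable, tmp),
		"COMMIT",
		fmt.Sprintf("DROP TABLE %s", tmp),
	}
}

func main() {
	for _, stmt := range buildLoadSQL("events") {
		fmt.Println(stmt)
	}
}
```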
> Are you facing OOM errors? Is the process being killed?
I killed the sling process before OOM. Memory usage was more than 100GB. The _tmp table was empty.
Ah yes, I opened an issue here for this: https://github.com/ClickHouse/clickhouse-go/issues/1293
This is a problem in the 3rd party clickhouse driver that Sling uses.
What you could do is use source_options.limit, or the CLI flag --limit with --incremental.
I'll see if there is a good way to flush the records so that memory is released.
For clickhouse, sling does not try to use multiple inserts, but writes the whole result at the end. I'm talking about the _tmp table, not the final one.
@max-yan can you compile the binary from branch v1.2.11 and try?
I added this line: https://github.com/slingdata-io/sling-cli/pull/303/commits/cc6d22b6bdeed89d7c4e7076af992619a10ad940
@flarco I can see inserts now, but a SIGSEGV too:
[signal SIGSEGV: segmentation violation code=0x2 addr=0xc05d800000 pc=0x561872]
goroutine 162 gp=0xc001c816c0 m=32 mp=0xc000d2a808 [running]:
runtime.throw({0x346e1ba?, 0x4c4?})
/usr/local/go/src/runtime/panic.go:1023 +0x5c fp=0xc0000f0d60 sp=0xc0000f0d30 pc=0x4478bc
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:895 +0x285 fp=0xc0000f0dc0 sp=0xc0000f0d60 pc=0x460445
strings.IndexAny({0xc05151d304?, 0xc05151d303?}, {0x346c985?, 0x0?})
/usr/local/go/src/strings/strings.go:161 +0x172 fp=0xc0000f0e38 sp=0xc0000f0dc0 pc=0x561872
github.com/flarco/g/csv.(*Writer).Write(0xc00220fa70, {0xc0048be360?, 0x9, 0xc0048be360?})
/home/maxim/go/pkg/mod/github.com/flarco/g@v0.1.97/csv/writer.go:84 +0x334 fp=0xc0000f0ee0 sp=0xc0000f0e38 pc=0xa640b4
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).writeBwCsv(0xc00194a680, {0xc0048be360?, 0x9?, 0x10?})
/home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:282 +0x27 fp=0xc0000f0f10 sp=0xc0000f0ee0 pc=0x1187407
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows.func1()
/home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:210 +0xc5 fp=0xc0000f0f80 sp=0xc0000f0f10 pc=0x1186be5
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows(0xc00194a680)
/home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:219 +0x91 fp=0xc0000f0fc8 sp=0xc0000f0f80 pc=0x1186af1
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).Start.gowrap1()
/home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:774 +0x25 fp=0xc0000f0fe0 sp=0xc0000f0fc8 pc=0x118cb05
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000f0fe8 sp=0xc0000f0fe0 pc=0x482a41
created by github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).Start in goroutine 151
/home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:774 +0x121c
Try again with env var SLING_PROCESS_BW=false? I have researched but don't know why the segmentation violation occurs... It happens randomly; it's non-deterministic.
@max-yan how is the memory usage? did it improve?
@flarco The SIGSEGV is not repeatable. I was able to load without "SLING_PROCESS_BW=false". I don't see memory leaks with a 20GB CSV file (238576508 rows).
Great. @alisman FYI, looks like batching the inserts works. This will be fixed in the next release, v1.2.11.
I used batch.Limit = 2000000. An option to configure this is needed.
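The batching fix being discussed can be illustrated with the small sketch below: instead of buffering the entire result set, rows are flushed every `limit` rows so memory can be released. The 2,000,000 figure comes from this thread; `flush` here is a hypothetical stand-in for the driver's batch send, not the actual clickhouse-go API.

```go
package main

import "fmt"

// flushEvery simulates writing totalRows rows through a batch that is
// flushed whenever `limit` rows are pending, plus one final flush for
// any remaining partial batch. It returns the number of flushes.
func flushEvery(totalRows, limit int, flush func(n int)) int {
	flushes := 0
	pending := 0
	for i := 0; i < totalRows; i++ {
		pending++
		if pending == limit {
			flush(pending) // release the buffered rows to the driver
			flushes++
			pending = 0
		}
	}
	if pending > 0 { // final partial batch
		flush(pending)
		flushes++
	}
	return flushes
}

func main() {
	// 5M rows with a 2M batch limit: two full batches plus one partial.
	n := flushEvery(5_000_000, 2_000_000, func(n int) {})
	fmt.Println(n) // 3
}
```

With no limit (one flush at the end), peak memory grows with the table size, which matches the >100GB usage reported above.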
@max-yan yes agreed, will add. This was to test.
Excellent! Thanks all!
--Aaron
Added target_options.batch_limit. Closing.
Issue Description
With --tgt-conn clickhouse, all rows are inserted at the last moment. Memory consumption does not allow loading large tables. While the load is in progress the _tmp table is empty, so it's not a memory leak.
Same result with --src-conn mysql and --src-conn postgres. Works as expected: --src-conn postgres --tgt-conn mysql (with any use_bulk option).
I'm not a Go developer, but I tried to find the problem. In ClickhouseConn.BulkImportStream only one element of ds.BatchChan is obtained, and I don't understand how --tgt-conn mysql works if BatchChan is filled in a place independent of the target connection.
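The suspicion above, that only one element of the batch channel gets consumed, comes down to the difference between a single channel receive and ranging over the channel until the producer closes it. The toy model below illustrates this; it is not sling's actual Datastream/BatchChan code.

```go
package main

import "fmt"

// consumeOne receives a single batch from the channel; any later
// batches the producer sends are never read. This models the
// suspected bug.
func consumeOne(ch chan []int) int {
	batch := <-ch
	return len(batch)
}

// consumeAll ranges over the channel, draining every batch until the
// producer closes it. This models the behavior a bulk import needs.
func consumeAll(ch chan []int) int {
	rows := 0
	for batch := range ch {
		rows += len(batch)
	}
	return rows
}

func main() {
	feed := func() chan []int {
		ch := make(chan []int, 3)
		ch <- []int{1, 2}
		ch <- []int{3, 4}
		ch <- []int{5}
		close(ch)
		return ch
	}
	fmt.Println(consumeOne(feed())) // 2: only the first batch's rows
	fmt.Println(consumeAll(feed())) // 5: all rows across all batches
}
```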
Sling version (sling --version): 1.2.10
Operating System (linux, mac, windows): linux
Log Output (please run command with -d):