slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Sling crashes when migrating table from postgres to s3 #268

Closed giqua closed 5 months ago

giqua commented 5 months ago

Issue Description

# replication.yaml

source: READ_ONLY_PG
target: LOCALSTACK_S3

defaults:
  mode: incremental
  primary_key: [id]
  object: "s3://bucket/{stream_table}/*.parquet"
  target_options:
    format: parquet
    file_max_rows: 100000
  # source_options:
  #     limit: 1000

streams:
  # all tables in schema, except "forbidden_table"
  # public.*:
  #   object: "{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.parquet"
  #   primary_key: [id]
  #   target_options:
  #     file_max_rows: 400000 # will split files into folder
  # public.forbidden_table:
  #   disabled: true  

  public.*:
    object: "s3://bucket/{stream_table}/*.parquet"
    primary_key: [id]
    target_options:
      file_max_rows: 100000 # will split files into folder
      format: parquet
    mode: incremental

  public.alembic_version:
    disabled: true

  public.article:
    disabled: true

  public.failedarticle:
    disabled: false
  ...
The log is attached as a file because it is too long for the issue body.

log.txt

flarco commented 5 months ago

Thanks for reporting. I've seen this a few times, but the faults are difficult to reproduce. It seems to happen when timestamp values are converted into strings.

I made some changes to attempt a safer conversion. Can you try this dev build and let me know?

giqua commented 5 months ago

It still gives me a segmentation violation; the logs of the latest run are attached. Could you give me some insight into the error so I can try to debug it? log_new_run.txt

flarco commented 5 months ago

If you look at the stack trace, it's crashing when Go tries to convert a time.Time into a string with the .Format() method. I honestly think it might be a Go bug. It's a commonly used function.

From what I can gather, it happens when the time zone is set.

I think that change works, but there was one other part I hadn't changed. Can you try with this one: https://f.slingdata.io/sling-linux-20240418b.zip

fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x49f2ca]

goroutine 157 [running]:
runtime.throw({0x311bf19?, 0xc000f3a700?})
    /github/sling-cli/_work/_tool/go/1.21.8/x64/src/runtime/panic.go:1077 +0x5c fp=0xc001709a08 sp=0xc0017099d8 pc=0x445f1c
runtime.sigpanic()
    /github/sling-cli/_work/_tool/go/1.21.8/x64/src/runtime/signal_unix.go:875 +0x285 fp=0xc001709a68 sp=0xc001709a08 pc=0x45cee5
time.Time.locabs({0x17bc0e2b8b4fa4c8, 0x17b9f2153a01e2b8, 0x17b9f2153a01e2b8})
    /github/sling-cli/_work/_tool/go/1.21.8/x64/src/time/time.go:489 +0x8a fp=0xc001709a90 sp=0xc001709a68 pc=0x49f2ca
time.Time.appendFormat({0x7f0750d48d28?, 0x20?, 0x17b9f2153a01e2b8?}, {0xc001709dd8, 0x0, 0x40}, {0x3165ee7, 0x1e})
    /github/sling-cli/_work/_tool/go/1.21.8/x64/src/time/format.go:650 +0x6a fp=0xc001709d18 sp=0xc001709a90 pc=0x49732a
time.Time.AppendFormat({0x3b7d0b0?, 0xc001709dd8?, 0x17b9f2153a01e2b8?}, {0xc001709dd8, 0x0, 0x40}, {0x3165ee7, 0x1e})
    /github/sling-cli/_work/_tool/go/1.21.8/x64/src/time/format.go:644 +0x151 fp=0xc001709d80 sp=0xc001709d18 pc=0x497211
time.Time.Format({0xc001aa5e30?, 0x30e5ec0?, 0x17b9f2153a01e2b8?}, {0x3165ee7?, 0x7f0750d48d28?})
    /github/sling-cli/_work/_tool/go/1.21.8/x64/src/time/format.go:630 +0xd0 fp=0xc001709e30 sp=0xc001709d80 pc=0x497050
github.com/slingdata-io/sling-cli/core/dbio/iop.(*StreamProcessor).CastToString(0xc0005728c0, 0x0?, {0x30e5ec0?, 0xc0009c7a98?}, {0xc001709f08?, 0x691b75?, 0xc001709e00?})
    /github/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/stream_processor.go:846 +0x393 fp=0xc001709eb0 sp=0xc001709e30 pc=0x116f2d3

giqua commented 5 months ago

Yes, it works now. Could you tell me what changes you made to fix the tool? Also, if I want to use Sling with other tools like Dagster, how can I integrate them?

flarco commented 5 months ago

Great. Commits are here: https://github.com/slingdata-io/sling-cli/pull/269

For Dagster, you can find the guide here: https://docs.dagster.io/integrations/embedded-elt/sling

I have to release the next version with the changes (probably this weekend) for you to use it in Dagster, so it should be available soon.

Closing this.

giqua commented 5 months ago

Thank you very much for your support. I'll wait for the next release to integrate it with Dagster.