slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0
398 stars 27 forks source link

Runtime Panic after migrating from 1.1.14 to 1.2.x #248

Closed ShahBinoy closed 5 months ago

ShahBinoy commented 5 months ago

Issue Description

Started receiving Runtime Panic on a working replication flow for sling version 1.2.2, the system was working until 1.1.14

Below is the error message

1s 54,474 ? p/s 28 MB                   2024-04-04 14:32:23 ERR  error="panic occurred! \"invalid memory address or nil pointer dereference\"\ngoroutine 134 [running]:\nruntime/debug.Stack()\n\t/Users/github/gh-runner/sling-cli/_work/_tool/go/1.21.7/arm64/src/runtime/debug/stack.go:24 +0x64\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows.func1()\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:166 +0x34\npanic({0x10803b780?, 0x109c2abe0?})\n\t/Users/github/gh-runner/sling-cli/_work/_tool/go/1.21.7/arm64/src/runtime/panic.go:914 +0x218\ngithub.com/spf13/cast.ToStringE({0x107ee4040?, 0x0?})\n\t/Users/fritz/go/pkg/mod/github.com/spf13/cast@v1.5.0/caste.go:918 +0x3d4\ngithub.com/spf13/cast.ToString(...)\n\t/Users/fritz/go/pkg/mod/github.com/spf13/cast@v1.5.0/cast.go:108\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*StreamProcessor).CastToString(0x14001c4b040, 0x0?, {0x107ee4040?, 0x0?}, {0x14001140f38?, 0x1051e61dc?, 0x14001140f48?})\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/stream_processor.go:780 +0x1e0\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).CastRowToString(0x14001883860, {0x1400379adc0, 0x17, 0x2c?})\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:217 +0xd0\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows(0x14001883860)\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:172 +0x68\ncreated by github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).Start in goroutine 47\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:666 +0xf68\n"
{
  "source": "ACADIA",
  "target": "S3_LAKE_ZONE_A",
  "defaults": {
    "mode": "full-refresh",
    "object": "my_folder/elt/data_landing/{source_name}/{stream_table}/programId=106/date={YYYY}-{MM}-{DD}/*",
    "update_key": "updated_on",
    "target_options": {
      "file_max_rows": 750000,
      "format": "parquet"
    }
  },
  "streams": {
    "acadia.cati_session_details": {
      "update_key": "id",
      "disabled": true
    },
    "acadia.communicator": {
      "update_key": "id",
      "disabled": true
    },
    "acadia.configuration_table_names": {
      "update_key": "id",
      "disabled": true
    },
    "acadia.data_type_category_data_type_root_category": {
      "update_key": "id",
      "disabled": true
    },
    "acadia.data_type_data_type_category": {
      "update_key": "id",
      "disabled": true
    },
    "acadia.data_type_property_mapping": {
      "update_key": "id",
      "disabled": true
    },
"acadia.survey_responses": {
      "mode": "full-refresh",
      "sql": "SELECT ups.program_id                                                         AS PROGRAM_ID,\n       fe.user_id                                                             AS PARTICIPANT_ID,\n       (u.activated + 0)                                                      AS IS_USER_ACTIVE,\n       fv.form_id                                                             AS FORM_ID,\n       (fv.is_active + 0)                                                     AS IS_FORM_VERSION_ACTIVE,\n       fe.form_version_id                                                     AS FORM_VERSION_SERVER_ID,\n       f.name                                                                 AS FORM_NAME,\n       ffe.form_field_id                                                      AS QUESTION_FIELD_ID,\n       ffev.value_as_string                                                   AS stringValue,\n       ffev.value_as_number                                                   AS numberValue,\n       IF(ffev.value_as_boolean is not null, ffev.value_as_boolean + 0, null) AS booleanValue,\n       fe.id                                                                  AS SUBMISSION_ID,\n       u.email                                                                AS loginEmail,\n       u.verified_primary_phone_number                                        AS loginPhoneNumber,\n       (fe.is_modified + 0)                                                   AS IS_MODIFIED,\n       FLOOR(fe.original_submission_time / 1000)                              AS ORIGINAL_SUBMISSION,\n       FLOOR(fe.created_on / 1000)                              AS CREATED_ON,\n       FLOOR(fe.last_submission_time / 1000)                                  AS LAST_SUBMISSION_TIME,\n       FLOOR(fe.updated_on / 1000)                                            AS UPDATED_ON,\n       fe.created_by_id                                                       AS CREATED_BY,\n       fe.updated_by_id                                                       AS UPDATED_BY,\n       fe.draft + 0                                                           AS COMPLETED\nFROM user_program_subscription ups\n         INNER JOIN users u ON\n    ups.user_id = u.id AND u.test_user = 0 AND ((u.email NOT LIKE 'vhtest%%'\n        AND ((u.email LIKE '%%@example.com%%' AND u.activated = 0) OR (u.email NOT LIKE '%%@example.com%%'))\n        AND u.email NOT LIKE 'qa%%') OR u.email is NULL)\n         LEFT JOIN form_entry fe ON\n    u.id = fe.user_id\n         LEFT JOIN form_version fv ON\n    fe.form_version_id = fv.id\n         LEFT JOIN form_field_entry ffe ON\n    fe.id = ffe.form_entry_id\n         LEFT JOIN form_field_entry_value ffev ON\n    ffe.id = ffev.form_field_entry_id\n         LEFT JOIN form f ON\n    fv.form_id = f.id\nWHERE fe.updated_on >= UNIX_TIMESTAMP('{START_DATE} 00:00:00') * 1000\n  AND fe.updated_on < UNIX_TIMESTAMP('{END_DATE} 23:59:59') * 1000\n  AND ups.program_id = {PROGRAM_ID}",
      "disabled": false
    }
  },
  "env": {
    "SLING_LOADED_AT_COLUMN": true,
    "SLING_STREAM_URL_COLUMN": true,
    "path_prefix": "${DATALAKE_ELT_LANDING_PREFIX}",
    "START_DATE": "${SLING_SYNC_START_DATE}",
    "END_DATE": "${SLING_SYNC_END_DATE}",
    "PROGRAM_ID": "${SLING_SYNC_PROGRAM_ID}",
    "SLING_SHOW_PROGRESS": "false",
    "SLING_DISABLE_TELEMETRY": "false",
    "SLING_HOME_DIR": "/sling_common/sling/../../sling",
    "SLING_LOGGING": "NO_COLOR",
    "SLING_ALLOW_EMPTY_TABLES": "true"
  }
}
2024-04-04 14:47:24 INF Sling Replication [78 streams] | ACADIA -> S3_LAKE_ZONE_A

2024-04-04 14:47:24 DBG [1 / 78] skipping stream acadia.cati_session_details since it is disabled
...
...
2024-04-04 14:47:24 DBG [77 / 78] skipping stream acadia.user_token_mapping since it is disabled

2024-04-04 14:47:24 INF [78 / 78] running stream acadia.survey_responses
2024-04-04 14:47:24 DBG conn credentials of S3_LAKE_ZONE_A from sling env yaml overwritten by env var S3_LAKE_ZONE_A
2024-04-04 14:47:24 DBG Sling version: 1.2.2 (darwin arm64)
2024-04-04 14:47:24 DBG type is db-file
2024-04-04 14:47:24 DBG using source options: {"empty_as_null":true,"null_if":"NULL","datetime_format":"AUTO","max_decimals":-1}
2024-04-04 14:47:24 DBG using target options: {"header":true,"compression":"AUTO","concurrency":7,"datetime_format":"auto","delimiter":",","file_max_rows":750000,"format":"parquet","max_decimals":-1,"use_bulk":true,"add_new_columns":true,"column_casing":"source"}
2024-04-04 14:47:24 INF connecting to source database (mysql)
2024-04-04 14:47:24 DBG opened "mysql" connection (conn-mysql-8NT)
2024-04-04 14:47:24 INF reading from source database
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'IS_USER_ACTIVE'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'IS_FORM_VERSION_ACTIVE'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'booleanValue'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'IS_MODIFIED'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'COMPLETED'
2024-04-04 14:47:25 DBG SELECT ups.program_id                                                         AS PROGRAM_ID,
       fe.user_id                                                             AS PARTICIPANT_ID,
       (u.activated + 0)                                                      AS IS_USER_ACTIVE,
       fv.form_id                                                             AS FORM_ID,
       (fv.is_active + 0)                                                     AS IS_FORM_VERSION_ACTIVE,
       fe.form_version_id                                                     AS FORM_VERSION_SERVER_ID,
       f.name                                                                 AS FORM_NAME,
       ffe.form_field_id                                                      AS QUESTION_FIELD_ID,
       ffev.value_as_string                                                   AS stringValue,
       ffev.value_as_number                                                   AS numberValue,
       IF(ffev.value_as_boolean is not null, ffev.value_as_boolean + 0, null) AS booleanValue,
       fe.id                                                                  AS SUBMISSION_ID,
       u.email                                                                AS loginEmail,
       u.verified_primary_phone_number                                        AS loginPhoneNumber,
       (fe.is_modified + 0)                                                   AS IS_MODIFIED,
       FLOOR(fe.original_submission_time / 1000)                              AS ORIGINAL_SUBMISSION,
       FLOOR(fe.created_on / 1000)                              AS CREATED_ON,
       FLOOR(fe.last_submission_time / 1000)                                  AS LAST_SUBMISSION_TIME,
       FLOOR(fe.updated_on / 1000)                                            AS UPDATED_ON,
       fe.created_by_id                                                       AS CREATED_BY,
       fe.updated_by_id                                                       AS UPDATED_BY,
       fe.draft + 0                                                           AS COMPLETED
FROM user_program_subscription ups
         INNER JOIN users u ON
    ups.user_id = u.id AND u.test_user = 0 AND ((u.email NOT LIKE 'vhtest%'
        AND ((u.email LIKE '%@example.com%' AND u.activated = 0) OR (u.email NOT LIKE '%@example.com%'))
        AND u.email NOT LIKE 'qa%') OR u.email is NULL)
         LEFT JOIN form_entry fe ON
    u.id = fe.user_id
         LEFT JOIN form_version fv ON
    fe.form_version_id = fv.id
         LEFT JOIN form_field_entry ffe ON
    fe.id = ffe.form_entry_id
         LEFT JOIN form_field_entry_value ffev ON
    ffe.id = ffev.form_field_entry_id
         LEFT JOIN form f ON
    fv.form_id = f.id
WHERE fe.updated_on >= UNIX_TIMESTAMP('2018-01-01 00:00:00') * 1000
  AND fe.updated_on < UNIX_TIMESTAMP('2024-04-04 23:59:59') * 1000
  AND ups.program_id = 106
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'IS_USER_ACTIVE'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'IS_FORM_VERSION_ACTIVE'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'booleanValue'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'IS_MODIFIED'
2024-04-04 14:47:25 DBG using text since type 'unsigned bigint' not mapped for col 'COMPLETED'
2024-04-04 14:47:25 INF writing to target file system (s3)
2024-04-04 14:47:25 DBG writing to s3://some-s3-bucket/home_folder/elt/data_landing/acadia/survey_responses/programId=106/date=2024-04-04/part.01 [fileRowLimit=750000 fileBytesLimit=50000000 compression=AUTO concurrency=7 useBufferedStream=false fileFormat=parquet]
1s 54,211 ? p/s 10 MB | 62% MEM | 9% CPU 2024-04-04 14:47:27 ERR  error="--- datastream.go:172 processBwRows ---\n--- datastream.go:217 CastRowToString ---\n--- stream_processor.go:780 CastToString ---\n--- cast.go:108 ToString ---\n--- caste.go:918 ToStringE ---\n--- signal_unix.go:861 sigpanic ---\n--- panic.go:261 panicmem ---\n--- panic.go:914 gopanic ---\n--- datastream.go:166 func1 ---\npanic occurred! \"invalid memory address or nil pointer dereference\"\ngoroutine 117 [running]:\nruntime/debug.Stack()\n\t/Users/github/gh-runner/sling-cli/_work/_tool/go/1.21.7/arm64/src/runtime/debug/stack.go:24 +0x64\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows.func1()\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:166 +0x34\npanic({0x1046c7780?, 0x1062b6be0?})\n\t/Users/github/gh-runner/sling-cli/_work/_tool/go/1.21.7/arm64/src/runtime/panic.go:914 +0x218\ngithub.com/spf13/cast.ToStringE({0x104570040?, 0x0?})\n\t/Users/fritz/go/pkg/mod/github.com/spf13/cast@v1.5.0/caste.go:918 +0x3d4\ngithub.com/spf13/cast.ToString(...)\n\t/Users/fritz/go/pkg/mod/github.com/spf13/cast@v1.5.0/cast.go:108\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*StreamProcessor).CastToString(0x14001292280, 0x0?, {0x104570040?, 0x0?}, {0x14000f8ff38?, 0x1018721dc?, 0x14000f8ff48?})\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/stream_processor.go:780 +0x1e0\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).CastRowToString(0x14001882d00, {0x140037ed340, 0x17, 0x102d154cb?})\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:217 +0xd0\ngithub.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows(0x14001882d00)\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:172 +0x68\ncreated by github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).Start in goroutine 131\n\t/Users/github/gh-runner/sling-cli/_work/sling-cli/sling-cli/core/dbio/iop/datastream.go:666 +0xf68\n"
51s 158,438 3718 r/s 29 MB | 62% MEM | 6% CPU ^C
interrupting...
52s 158,438 3115 r/s 29 MB | 62% MEM | 6% CPU
2024-04-04 14:48:22 INF execution failed
2024-04-04 14:48:22 INF ~ execution failed

...

2024-04-04 14:48:22 INF Sling Replication Completed in 57s | ACADIA -> S3_LAKE_ZONE_A | 0 Successes | 1 Failures
flarco commented 5 months ago

Thank you.

flarco commented 5 months ago

Should be better in v1.2.3. Closing.