thoughtspot / cs_tools

Scale your ThoughtSpot adoption with tools created by the ThoughtSpot Solutions Consulting organization.
https://thoughtspot.github.io/cs_tools/

BigQuery: searchable metadata --include-column-access has returned "line contains NUL" #144

Closed DBoudart23 closed 2 months ago

DBoudart23 commented 3 months ago

First Stop

Platform Configuration

Info snapshot taken on 2024-05-29

CS Tools: 1.5.7
Python Version: Python 3.9.10 (main, Apr 5 2022, 14:16:03) [Clang 13.1.6 (clang-1316.0.21.2)]
System Info: Darwin (detail: macOS-14.4.1-x86_64-i386-64bit)
Configs Directory: /Users/{anonymous}/Library/Application Support/cs_tools
Activate VirtualEnv: source "/Users/{anonymous}/Library/Application Support/cs_tools/.cs_tools/bin/activate"
Platform Tags: macosx-12.3-x86_64

Description

Hello team,

When I run the command:

cs_tools tools searchable metadata --include-column-access --config david_cnfg --syncer "bigquery://definition.toml"

it returns the error message "Error: line contains NUL". Here are further details from the traceback:

/Library/Application Support/cs_tools/.cs_tools/lib/python3.9/site-packages/cs_tools/cli/tools/searchable/app.py:462 in metadata

    459
    460         # WRITE ALL THE COMBINED DATA TO THE TARGET SYNCER
    461         for model in models.METADATA_MODELS:
  ❱ 462             for rows in temp_sync.read_stream(filename=model.__tablename__, batch=1_000_
    463                 syncer.dump(model.__tablename__, data=[model.validated_init(**row).model

/Library/Application Support/cs_tools/.cs_tools/lib/python3.9/site-packages/cs_tools/sync/csv/syncer.py:115 in read_stream

    112         with path.open(mode="r", newline="", encoding="utf-8") as f:
    113             reader = csv.DictReader(f, **self.dialect_and_format_parameters())
    114
  ❱ 115             for rows in utils.batched(reader, n=batch):
    116                 yield self.maybe_replace_empty_with_null(rows)
    117
    118     # MANDATORY PROTOCOL MEMBERS

/Library/Application Support/cs_tools/.cs_tools/lib/python3.9/site-packages/cs_tools/utils.py:37 in batched

     34
     35     iterable = iter(iterable)
     36
  ❱  37     while batch := tuple(it.islice(iterable, n)):
     38         yield batch

boonhapus commented 3 months ago

We synced via email and I was able to determine that a few NUL byte characters were getting written to the intermediate data. I've added some cleaning prior to saving the data.
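
For readers hitting the same error: the actual fix in cs_tools is not shown in this thread, so the following is only a minimal sketch of the general idea of cleaning NUL characters before rows are saved to CSV. The helper names `strip_nul` and `clean_rows`, and the sample row, are hypothetical and are not part of cs_tools. The traceback above shows the CSV reader on Python 3.9 rejecting a line containing NUL, so the cleaning happens on the write side.

```python
import csv
import io


def strip_nul(value):
    """Remove NUL characters from strings; return other types unchanged."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


def clean_rows(rows):
    """Apply strip_nul to every value of every row (rows are dicts)."""
    return [{key: strip_nul(val) for key, val in row.items()} for row in rows]


# Hypothetical sample data with an embedded NUL character.
rows = [{"column_guid": "abc", "description": "sales\x00 total"}]

# Write the cleaned rows to an in-memory CSV, standing in for a temp file.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["column_guid", "description"])
writer.writeheader()
writer.writerows(clean_rows(rows))

# Reading the data back no longer encounters a NUL character.
buffer.seek(0)
round_trip = list(csv.DictReader(buffer))
```

Cleaning at write time keeps the reader untouched and means the intermediate CSV stays readable by any later step that streams it back in batches.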