I was using sqlite-utils to create a DB from a CSV and it turns out the CSV contains a NUL byte.
When the processing reaches the line that contains the NUL an exception is raised.
I'm wondering if there is something that can be done in sqlite-utils to say "skip lines with encoding errors" or some such. I think it isn't super straightforward though as the exception comes from inside the csv module that does all the parsing.
$ sqlite-utils insert --csv kaggle.db kaggle KernelVersions.csv
[------------------------------------] 0%
[#####################---------------] 60% 00:04:24Traceback (most recent call last):
File "/home/foobar/miniconda/envs/meta-kaggle/bin/sqlite-utils", line 10, in <module>
sys.exit(cli())
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1223, in insert
insert_upsert_implementation(
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1085, in insert_upsert_implementation
db[table].insert_all(
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3198, in insert_all
chunk = list(chunk)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3742, in fix_square_braces
for record in records:
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1071, in <genexpr>
docs = (decode_base64_values(doc) for doc in docs)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1068, in <genexpr>
docs = (verify_is_dict(doc) for doc in docs)
File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1003, in <genexpr>
docs = (dict(zip(headers, row)) for row in reader)
_csv.Error: line contains NUL
I was using sqlite-utils to create a DB from a CSV and it turns out the CSV contains a NUL byte.
When the processing reaches the line that contains the NUL an exception is raised.
I'm wondering if there is something that can be done in
sqlite-utils
to say "skip lines with encoding errors" or some such. I think it isn't super straightforward though as the exception comes from inside thecsv
module that does all the parsing.Concretely the file is the
KernelVersions.csv
from https://www.kaggle.com/datasets/kaggle/meta-kaggleThis is the command and output: