wireservice / agate-sql

agate-sql adds SQL read/write support to agate.
https://agate-sql.readthedocs.io
MIT License
18 stars 15 forks source link

Crashes in make_sql_table() and to_sql() when min_col_len and/or col_len_multiplier are set to None due to lack of validation checks #46

Closed clach04 closed 2 months ago

clach04 commented 2 months ago

Due to bug https://github.com/wireservice/csvkit/issues/1258 None will be set for both min_col_len and col_len_multiplier input parameters into make_sql_table() and to_sql().

This causes issues at:

https://github.com/wireservice/agate-sql/blob/0803b77fdd0be0792de99eaa21effa9786cd22f5/agatesql/table.py#L200

Sample traceback:

(py3venv) C:\code\py\csvkit_agatesql>python bug.py
Traceback (most recent call last):
  File "C:\code\py\csvkit_agatesql\bug.py", line 137, in <module>
    sql_table = agatesql.table.make_sql_table(table, table_name, dialect=dialect, db_schema=db_schema, constraints=not no_constraints,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\py\csvkit_agatesql\agate-sql\agatesql\table.py", line 200, in make_sql_table
    length = table.aggregate(agate.MaxLength(column_name)) * decimal.Decimal(col_len_multiplier)
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: conversion from NoneType to Decimal is not supported

and

https://github.com/wireservice/agate-sql/blob/0803b77fdd0be0792de99eaa21effa9786cd22f5/agatesql/table.py#L210

sample traceback

(py3venv) C:\code\py\csvkit_agatesql>python bug.py
Traceback (most recent call last):
  File "C:\code\py\csvkit_agatesql\bug.py", line 139, in <module>
    sql_table = agatesql.table.make_sql_table(table, table_name, dialect=dialect, db_schema=db_schema, constraints=not no_constraints,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\py\csvkit_agatesql\agate-sql\agatesql\table.py", line 210, in make_sql_table
    sql_type_kwargs['length'] = length if length >= min_col_len else min_col_len
                                          ^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'decimal.Decimal' and 'NoneType'

This would not be an issue if the caller did not pass in those parameters as the default if omitted is 1.

There are also errors if zero (0) is passed in.

I do not know if this is considered a bug of this library and make_sql_table() or the caller.

I would recommend adding some sanity checks as a safety net, something along the lines of:

if not col_len_multiplier:
    col_len_multiplier = 1
if not min_col_len:
    min_col_len  = 1

which would at least catch None and Zero (but still some potential for shenanigans if caller passes in negative values 😝 )

clach04 commented 2 months ago

I put together a simple test case in the off chance it's helpful. It also fully documents the dependent libraries and versions in case that's relevant.

bug.py_rename_me.txt

Smaller, standalone test case without database driver needed:

import agate
import agatesql

# theses are command line arguments in csvsql (part of csvkit) that default to None
col_len_multiplier = min_col_len = None  # possible bug in csvkit / csvsql
# EDIT ME HERE! Uncomment to see different aspects of the same bug
#col_len_multiplier = min_col_len = 0  # another bug
#col_len_multiplier = min_col_len = 1  # Workaround
#col_len_multiplier , min_col_len = 1, None  # show second part of bug

table_name, overwrite, no_create, create_if_not_exists, insert, prefix, db_schema, no_constraints, unique_constraint, chunk_size, min_col_len, col_len_multiplier = ('test_table', False, False, False, True, [], None, False, [], None, min_col_len, col_len_multiplier)

table = agate.table.Table([('data',), ], column_names=['col1', ])  # original use case was csvkit; agate.Table.from_csv()

dialect = 'ingres'
sql_table = agatesql.table.make_sql_table(table, table_name, dialect=dialect, db_schema=db_schema, constraints=not no_constraints,
                       unique_constraint=unique_constraint, connection=None,
                       min_col_len=min_col_len, col_len_multiplier=col_len_multiplier)
jpmckinney commented 2 months ago

It's a problem in the caller.