hellonarrativ / spectrify

Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses.
https://aws.amazon.com/blogs/big-data/narrativ-is-helping-producers-monetize-their-digital-content-with-amazon-redshift/
MIT License
116 stars 25 forks source link

Error when running convert: TypeError: unhashable type: 'list' #60

Open unlessdotcom opened 4 years ago

unlessdotcom commented 4 years ago

The export command ran fine, it created the zipped files in S3. Next I'm running the convert command, but it exists with this stacktrace:

Traceback (most recent call last):
  File "/usr/local/bin/spectrify", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/spectrify/main.py", line 61, in convert
    sa_table = SqlAlchemySchemaReader(engine).get_table_schema(table)
  File "/usr/local/lib/python3.7/site-packages/spectrify/utils/schema.py", line 36, in get_table_schema
    schema=schema_name
  File "<string>", line 2, in __new__
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/deprecations.py", line 128, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 506, in __new__
    metadata._remove_table(name, schema)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 501, in __new__
    table._init(name, metadata, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 595, in _init
    resolve_fks=resolve_fks,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 636, in _autoload
    _extend_on=_extend_on,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2163, in run_callable
    return conn.run_callable(callable_, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1615, in run_callable
    return callable_(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 460, in reflecttable
    table, include_columns, exclude_columns, resolve_fks, **opts
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 642, in reflecttable
    table_name, schema, **table.dialect_kwargs
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 369, in get_columns
    self.bind, table_name, schema, info_cache=self.info_cache, **kw
  File "<string>", line 2, in get_columns
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 50, in cache
    ret = info_cache.get(key)
TypeError: unhashable type: 'list'

What does this mean?

marcelpanse commented 4 years ago

This is the table:

create table factstable
(
    visitid   varchar(36),
    timestamp timestamp,
    facts     varchar(65535)
);
marcelpanse commented 4 years ago

Nevermind, just read that Amazon announced support exactly for this build into Redshift: https://aws.amazon.com/blogs/aws/new-for-amazon-redshift-data-lake-export-and-federated-queries/ I just tried it and I can export queries directly into S3 formatted as Parquet files.

c-nichols commented 4 years ago

Thanks for the bug report! So they finally did it :D that's clearly now the best way to perform export from Redshift to Parquet. Probably time for a rethink of if/how Spectrify provides value in light of this functionality!

patilarpith commented 4 years ago

Is there a recommended tool to do periodic archiving of redshift data to Parquet in S3 and purge the archived data in redshift?