roskakori / pimdb

build a database from IMDb datasets
https://pypi.org/project/pimdb/
BSD 3-Clause "New" or "Revised" License
7 stars 1 forks source link

psycopg2.DataError: value too long #13

Closed chassing closed 4 years ago

chassing commented 4 years ago

I'm using postgres 10.2 as database and I'm facing following issue during "transfer":

Traceback (most recent call last):                                                                                                                                                         [14/542]
  File ".../lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1228, in _execute_context
    cursor, statement, parameters, context
  File ".../lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 857, in do_executemany
    cursor.executemany(statement, parameters)
psycopg2.DataError: value too long for type character varying(51)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../bin/pimdb", line 10, in <module>
    sys.exit(main())
  File ".../lib/python3.7/site-packages/pimdb/command.py", line 237, in main
    sys.exit(exit_code_for())
  File ".../lib/python3.7/site-packages/pimdb/command.py", line 223, in exit_code_for
    command_class(parser, args).run()
  File ".../lib/python3.7/site-packages/pimdb/command.py", line 164, in run
    self._database.build_dataset_table(connection, imdb_dataset_name, self._dataset_folder, log_progress)
  File ".../lib/python3.7/site-packages/pimdb/database.py", line 396, in build_dataset_table
    self._checked_insert(connection, table_to_modify, False)
  File ".../lib/python3.7/site-packages/pimdb/database.py", line 403, in _checked_insert
    connection.execute(table.insert(), self._data_to_insert)
  File ".../lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
    return meth(self, multiparams, params)
  File ".../lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File ".../lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
    distilled_params,
  File ".../lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
    e, statement, parameters, cursor, context
  File ".../lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File ".../lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File ".../lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1228, in _execute_context
    cursor, statement, parameters, context
  File ".../lib/python3.7/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 857, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.DataError) value too long for type character varying(51)

[SQL: INSERT INTO "NameBasics" (nconst, "primaryName", "birthYear", "deathYear", "primaryProfession", "knownForTitles") VALUES (%(nconst)s, %(primaryName)s, %(birthYear)s, %(deathYear)s, %(primar
yProfession)s, %(knownForTitles)s)]
[parameters: ({'nconst': 'nm0425618', 'primaryName': 'Lawrence Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfession': 'actor', 'knownForTitles': 'tt0010490,tt0012644,tt0007927,tt001
1296'}, {'nconst': 'nm0425619', 'primaryName': 'Lawrence Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfession': 'actor', 'knownForTitles': 'tt0082277'}, {'nconst': 'nm0425620', 'pri
maryName': 'Lawrence Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfession': 'actor', 'knownForTitles': 'tt0131436'}, {'nconst': 'nm0425621', 'primaryName': 'Leanne Johnson', 'birthY
ear': None, 'deathYear': None, 'primaryProfession': 'actress', 'knownForTitles': 'tt0072942'}, {'nconst': 'nm0425622', 'primaryName': 'Lee Johnson', 'birthYear': None, 'deathYear': None, 'primary
Profession': '', 'knownForTitles': 'tt0058896,tt0110633'}, {'nconst': 'nm0425623', 'primaryName': 'Lee Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfession': 'composer,director,edit
or', 'knownForTitles': 'tt0157946,tt11380066,tt11387284,tt11390606'}, {'nconst': 'nm0425624', 'primaryName': 'Lee Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfession': 'composer',
'knownForTitles': 'tt0181447'}, {'nconst': 'nm0425625', 'primaryName': 'Lee Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfession': 'miscellaneous', 'knownForTitles': 'tt0815168,tt03
38221,tt0255202,tt0314123'}  ... displaying 10 of 128 total bound parameter sets ...  {'nconst': 'nm0425750', 'primaryName': 'Mark Johnson', 'birthYear': None, 'deathYear': None, 'primaryProfessi
on': 'camera_department,cinematographer', 'knownForTitles': 'tt0414545,tt0212720,tt6346162,tt1281313'}, {'nconst': 'nm0425753', 'primaryName': 'Mark Benton Johnson', 'birthYear': None, 'deathYear
': None, 'primaryProfession': 'producer,location_management,production_manager', 'knownForTitles': 'tt4824302,tt0257076,tt1282140,tt0311289'})]
(Background on this error at: http://sqlalche.me/e/9h9h)
roskakori commented 4 years ago

@chassing Thanks for pointing this out. For whatever reason Wolfgang Hofer (nm1512084) has 15 "know for" titles. 😑

Version 0.1.1 has this fixed and is up on PyPI.