gronlund / cvrdata

Extract data from danish CVR registry from Danish Business Authority
MIT License
7 stars 2 forks source link

Specified key was too long; max key length is 767 bytes' #1

Closed naarkhoo closed 5 years ago

naarkhoo commented 5 years ago

As I am trying to "python -m cvrparser dbsetup -t" I get the following error,

---- CREATE TABLES ----
Create Tables
Creating Table Aarsbeskaeftigelse
Creating Table AarsbeskaeftigelseInterval
Creating Table AdresseDawa
Creating Table Adresseupdates
Creating Table Attributter
Creating Table Branche
Traceback (most recent call last):
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 250, in execute
    self.errorhandler(self, exc, value)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 247, in execute
    res = self._query(query)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 412, in _query
    rowcount = self._do_query(q)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 375, in _do_query
    db.query(q)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/connections.py", line 276, in query
    _mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (1071, 'Specified key was too long; max key length is 767 bytes')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/anaconda3/envs/Virk/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda3/envs/Virk/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/cvrparser/__main__.py", line 125, in <module>
    getattr(Commands, args.pop('command'))(**args)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/cvrparser/__main__.py", line 18, in dbsetup
    crdb.create_tables()
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/cvrparser/cvr_makedb.py", line 22, in create_tables
    self.alchemy_model.create_tables()
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/cvrparser/alchemy_tables.py", line 406, in create_tables
    x.__table__.create(engine)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/sql/schema.py", line 779, in create
    checkfirst=checkfirst)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1940, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1549, in _run_visitor
    **kwargs).traverse_single(element)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
    return meth(obj, **kw)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py", line 796, in visit_table
    self.traverse_single(index)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
    return meth(obj, **kw)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py", line 823, in visit_index
    self.connection.execute(CreateIndex(index))
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1009, in _execute_ddl
    compiled
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 250, in execute
    self.errorhandler(self, exc, value)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 247, in execute
    res = self._query(query)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 412, in _query
    rowcount = self._do_query(q)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/cursors.py", line 375, in _do_query
    db.query(q)
  File "/anaconda3/envs/Virk/lib/python3.6/site-packages/MySQLdb/connections.py", line 276, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1071, 'Specified key was too long; max key length is 767 bytes') [SQL: 'CREATE UNIQUE INDEX branche_index ON `Branche` (branchekode, branchetekst)'] (Background on this error at: http://sqlalche.me/e/e3q8)

I thought the problem could the charset - so I changed the config.ini but also within the mysql environment using ALTER DATABASE virkdb CHARACTER SET utf8 COLLATE utf8_general_ci;.

As a background information, I could setup the regnskab package (download the two large tables).

naarkhoo commented 5 years ago

I ended up changing Integer to SmallInteger and in some other cases, using Column(VARCHAR(255), nullable=False, unique=True) instead of Column(String(255, 'utf8mb4_bin'), nullable=False, unique=True)

Branche

gronlund commented 5 years ago

I do not have this error. I am guessing it is some sort of strict check in mysql since i can only get an error if i actually insert a key that is to long. I really need to make better unique string key solution i am not to happy with but have not had the time.

gronlund commented 5 years ago

The easiest thing is to reduce the string length of branchetekst to less than 188. It seems the strings i currently have in my local copy of the data which is somewhat outdated is 115. I guess the best solution is to actually change it to text so no surprises come when new industri long industri descriptions suddenly appear and then handle the unique constraint in my program. I am not going to change that anytime soon. It is cause by the fact that industri code keys are not unique and have changed meaning over time, so the really best thing is to annotate the industri code with a reference to the specification it came from see https://www.dst.dk/da/Statistik/dokumentation/nomenklaturer/dansk-branchekode-db07#