jarondl / pygtfs

A python (2/3) library for GTFS
MIT License
63 stars 44 forks source link

Unique constraint failed when trying to import the latest data from BART #61

Closed aetaric closed 3 years ago

aetaric commented 3 years ago

Hello! While trying to import data from BART into Home Assistant using https://www.home-assistant.io/integrations/gtfs/ I ran into a SQLAlchemy error. I have confirmed that this error occurs with the latest pygtfs even when home assistant's code isn't in use:

gtfs2db append bart.zip bart.sqlite
Loading GTFS data for <class 'pygtfs.gtfs_entities.Agency'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Stop'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Transfer'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Route'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Fare'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.FareRule'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.ShapePoint'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Service'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.ServiceException'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Trip'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Frequency'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.StopTime'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.FeedInfo'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Translation'>:
1 record read for <class 'pygtfs.gtfs_entities.Agency'>.
182 records read for <class 'pygtfs.gtfs_entities.Stop'>.
99 records read for <class 'pygtfs.gtfs_entities.Transfer'>.
14 records read for <class 'pygtfs.gtfs_entities.Route'>.
2500 records read for <class 'pygtfs.gtfs_entities.Fare'>.
2500 records read for <class 'pygtfs.gtfs_entities.FareRule'>.
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1750, in _execute_context
    self.dialect.do_executemany(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 714, in do_executemany
    cursor.executemany(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: transfers.feed_id, transfers.from_stop_id, transfers.to_stop_id

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/bin/gtfs2db", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.9/site-packages/pygtfs/gtfs2db.py", line 54, in main
    append_feed(schedule, args['<feed_file>'],
  File "/opt/homebrew/lib/python3.9/site-packages/pygtfs/loader.py", line 93, in append_feed
    schedule.session.flush()
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3298, in flush
    self._flush(objects)
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3438, in _flush
    transaction.rollback(_capture_exception=True)
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3398, in _flush
    flush_context.execute()
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 242, in save_obj
    _emit_insert_statements(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1094, in _emit_insert_statements
    c = connection._execute_20(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1582, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1451, in _execute_clauseelement
    ret = self._execute_context(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1813, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1994, in _handle_dbapi_exception
    util.raise_(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1750, in _execute_context
    self.dialect.do_executemany(
  File "/opt/homebrew/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 714, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: transfers.feed_id, transfers.from_stop_id, transfers.to_stop_id
[SQL: INSERT INTO transfers (feed_id, from_stop_id, to_stop_id, transfer_type, min_transfer_time) VALUES (?, ?, ?, ?, ?)]
[parameters: ((1, 'MCAR', 'MCAR', 2, '20'), (1, 'MCAR', 'MCAR', 2, '240'), (1, 'MCAR', 'MCAR', 2, '240'), (1, 'MCAR', 'MCAR', 2, '20'), (1, 'MCAR', 'MCAR', 2, '20'), (1, 'MCAR', 'MCAR', 2, '20'), (1, 'MCAR', 'MCAR', 2, '20'), (1, 'MCAR', 'MCAR', 2, '20')  ... displaying 10 of 99 total bound parameter sets ...  (1, 'MONT', 'MONT', 2, '20'), (1, 'MONT', 'MONT', 2, '20'))]
(Background on this error at: http://sqlalche.me/e/14/gkpj)

relevant package version info:

pip3 list | egrep -i "pygtfs|sqlalchemy"
pygtfs     0.1.6
SQLAlchemy 1.4.20
mazzy89 commented 3 years ago

I have the exact same issue but for a different dataset

❯ gtfs2db append PID_GTFS.zip pygtfs.sqlite                                                                                                                                                                                                            13:46:20
Loading GTFS data for <class 'pygtfs.gtfs_entities.Agency'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Stop'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Transfer'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Route'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Fare'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.FareRule'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.ShapePoint'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Service'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.ServiceException'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Trip'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Frequency'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.StopTime'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.FeedInfo'>:
Loading GTFS data for <class 'pygtfs.gtfs_entities.Translation'>:
1 record read for <class 'pygtfs.gtfs_entities.Agency'>.
.13318 records read for <class 'pygtfs.gtfs_entities.Stop'>.
4164 records read for <class 'pygtfs.gtfs_entities.Transfer'>.
660 records read for <class 'pygtfs.gtfs_entities.Route'>.
22 records read for <class 'pygtfs.gtfs_entities.Fare'>.
37 records read for <class 'pygtfs.gtfs_entities.FareRule'>.
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/base.py", line 1751, in _execute_context
    self.dialect.do_executemany(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/default.py", line 714, in do_executemany
    cursor.executemany(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: transfers.feed_id, transfers.from_stop_id, transfers.to_stop_id

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/bin/gtfs2db", line 33, in <module>
    sys.exit(load_entry_point('pygtfs==0.1.6.dev31+gb427d2f', 'console_scripts', 'gtfs2db')())
  File "/opt/homebrew/lib/python3.9/site-packages/pygtfs-0.1.6.dev31+gb427d2f-py3.9.egg/pygtfs/gtfs2db.py", line 54, in main
    append_feed(schedule, args['<feed_file>'],
  File "/opt/homebrew/lib/python3.9/site-packages/pygtfs-0.1.6.dev31+gb427d2f-py3.9.egg/pygtfs/loader.py", line 93, in append_feed
    schedule.session.flush()
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/session.py", line 3298, in flush
    self._flush(objects)
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/session.py", line 3438, in _flush
    transaction.rollback(_capture_exception=True)
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/session.py", line 3398, in _flush
    flush_context.execute()
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/persistence.py", line 242, in save_obj
    _emit_insert_statements(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/orm/persistence.py", line 1094, in _emit_insert_statements
    c = connection._execute_20(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/base.py", line 1583, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/base.py", line 1452, in _execute_clauseelement
    ret = self._execute_context(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/base.py", line 1814, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/base.py", line 1995, in _handle_dbapi_exception
    util.raise_(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/base.py", line 1751, in _execute_context
    self.dialect.do_executemany(
  File "/opt/homebrew/lib/python3.9/site-packages/SQLAlchemy-1.4.22-py3.9-macosx-11-arm64.egg/sqlalchemy/engine/default.py", line 714, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: transfers.feed_id, transfers.from_stop_id, transfers.to_stop_id
[SQL: INSERT INTO transfers (feed_id, from_stop_id, to_stop_id, transfer_type, min_transfer_time) VALUES (?, ?, ?, ?, ?)]
[parameters: ((1, 'U1072Z101P', 'U1072Z121P', 2, '240'), (1, 'U1072Z121P', 'U1072Z101P', 2, '240'), (1, 'U1072Z101P', 'U1072Z122P', 2, '240'), (1, 'U1072Z122P', 'U1072Z101P', 2, '240'), (1, 'U1072Z102P', 'U1072Z121P', 2, '240'), (1, 'U1072Z121P', 'U1072Z102P', 2, '240'), (1, 'U1072Z102P', 'U1072Z122P', 2, '240'), (1, 'U1072Z122P', 'U1072Z102P', 2, '240')  ... displaying 10 of 4164 total bound parameter sets ...  (1, 'U2020Z2', 'U2020Z2', 1, None), (1, 'U2020Z2', 'U2020Z2', 1, None))]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
mazzy89 commented 3 years ago

@aetaric considering the Python package in here is not really under heavy development, we would need to find ourselves a solution.

mazzy89 commented 3 years ago

so from a quick troubleshooting the issue seems to be how the Transfer object is initialized

class Transfer(Base):
    __tablename__ = 'transfers'
    _plural_name_ = 'transfers'
    feed_id = Column(Integer, ForeignKey('_feed.feed_id'), primary_key=True)
    from_stop_id = Column(Unicode, primary_key=True)
    to_stop_id = Column(Unicode, primary_key=True)
    transfer_type = Column(Integer, nullable=True)  # required; allowed empty
    min_transfer_time = Column(Integer, nullable=True)

    __table_args__ = (
        ForeignKeyConstraint([feed_id, to_stop_id], [Stop.feed_id, Stop.stop_id]),
        ForeignKeyConstraint([feed_id, from_stop_id], [Stop.feed_id, Stop.stop_id]),
    )

    stop_to = relationship(Stop, backref="transfers_to",
            primaryjoin=and_(Stop.stop_id==foreign(to_stop_id),
                             Stop.feed_id==feed_id))
    stop_from = relationship(Stop, backref="transfers_from",
            primaryjoin=and_(Stop.stop_id==foreign(from_stop_id),
                             Stop.feed_id==feed_id))

    _validate_transfer_type = _validate_int_choice([None, 0, 1, 2, 3],
                                                   'transfer_type')

    def __repr__(self):
        return "<Transfer %s-%s>" % (self.from_stop_id, self.to_stop_id)

If you take a look at here feed_id, from_stop_id, to_stop_id are all primary key but this should not be true.

mazzy89 commented 3 years ago

@jarondl could you please shade some lights on it? according to the history, those are set as primary keys since day one.

mazzy89 commented 3 years ago

For my use case I do not need transfers so I've removed the txt file and after a while home assistant loaded correctly the gtfs zip and thr time is displayed correctly.

Bertware commented 3 years ago

Hi @mazzy89,

This issue is caused by pygtfs using (from_stop_id, to_stop_id) as a primary key, while there are extensions to include specific trip_to_trip transfers where the same from_stop_idand to_stop_id are used. I came actually here looking for an issue like this, as we had the same problem with our new GTFS feeds in Sweden.

A couple of days these trip-to-trip and route-to-route transfers became part of the official spec (see https://github.com/google/transit/pull/284), so pygtfs has to be updated in order to support these new columns. I'll see if I can create a PR.

Bertware commented 3 years ago

63 would fix two issues when importing BART data, but #62 will have to be solved as well

jarondl commented 3 years ago

As @mazzy89 and @Bertware explained above, the problem seems to be the standard changing assumptions once again, with the stop_id pair not being unique anymore.

Bertware commented 3 years ago

Note that this stop_id pair was not specified to be unique to begin with, but one could assume this due to the nature of the data.

https://github.com/google/transit/pull/278 addresses this issue by defining the primary and foreign fields, meaning that issues like these shouldn't occur again. It was in this fact this issue that sparked the adoption of trip to trip and route to route transfers, as adding these would have caused backwards compatibility issues as soons as the unique IDs would have been defined.

Anyway, I believe this issue can be closed, as the transfer problem is fixed. These types of transfers are now supported by the PR merge.

aetaric commented 3 years ago

bash-5.1# gtfs2db overwrite bart.zip bart.sqlite Loading GTFS data for <class 'pygtfs.gtfs_entities.Agency'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Stop'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Transfer'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Route'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Fare'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.FareRule'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.ShapePoint'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Service'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.ServiceException'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Trip'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Frequency'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.StopTime'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.FeedInfo'>: Loading GTFS data for <class 'pygtfs.gtfs_entities.Translation'>: 1 record read for <class 'pygtfs.gtfs_entities.Agency'>. 99 records read for <class 'pygtfs.gtfs_entities.Stop'>. 40 records read for <class 'pygtfs.gtfs_entities.Transfer'>. 12 records read for <class 'pygtfs.gtfs_entities.Route'>. 11 records read for <class 'pygtfs.gtfs_entities.Service'>. 24 records read for <class 'pygtfs.gtfs_entities.ServiceException'>. 3020 records read for <class 'pygtfs.gtfs_entities.Trip'>. ....47178 records read for <class 'pygtfs.gtfs_entities.StopTime'>. Complete.

Good stuff @Bertware! Gonna close this myself as it's very clearly fixed.