tulsawebdevs / django-multi-gtfs

Django app to import and export General Transit Feed Specification (GTFS)
http://tulsawebdevs.org/
Apache License 2.0

Error importing transfers.txt with invalid stop IDs #53

Open araichev opened 8 years ago

araichev commented 8 years ago

Greetings. Using multigtfs v0.4.3, I tried to import this GTFS feed for Fort Lauderdale: http://transitfeeds.com/p/broward-county-transit/49/latest/download and got the following error:

Traceback (most recent call last):
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
psycopg2.IntegrityError: null value in column "point" violates not-null constraint
DETAIL:  Failing row contains (249219, 54, 169, , , , null, , , null, , null, , {}).

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/multigtfs/management/commands/importgtfs.py", line 73, in handle
    feed.import_gtfs(gtfs_feed)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/multigtfs/models/feed.py", line 109, in import_gtfs
    count = klass.import_txt(table, self) or 0
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/multigtfs/models/base.py", line 249, in import_txt
    fields[name_map[column_name]] = val_map[column_name](value)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/multigtfs/models/base.py", line 155, in get_instance
    **kwargs).id
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/manager.py", line 157, in create
    return self.get_queryset().create(**kwargs)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/query.py", line 322, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/base.py", line 654, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/base.py", line 687, in _do_insert
    using=using, raw=raw)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/query.py", line 1514, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql
    cursor.execute(sql, params)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/utils/six.py", line 658, in reraise
    raise value.with_traceback(tb)
  File "/home/araichev/.virtualenvs/bingo/lib/python3.4/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: null value in column "point" violates not-null constraint
DETAIL:  Failing row contains (249219, 54, 169, , , , null, , , null, , null, , {}).

After some debugging, I discovered that the feed is faulty: transfers.txt contains stop IDs that are not present in stops.txt. Fair enough, but it appears that multigtfs tries to create the stops referenced in transfers.txt that it can't find in stops.txt, and that causes the error; see https://github.com/tulsawebdevs/django-multi-gtfs/blob/v1.0.0/multigtfs/models/base.py#L153.
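A quick way to confirm this kind of fault before importing is to compare the stop IDs used in the two files. The sketch below assumes both files sit at the root of the feed zip and treats stop IDs as plain strings, so `0169` and `169` count as different IDs (GTFS ID fields are text, not numbers). The function names are hypothetical helpers, not part of multigtfs.

```python
import csv
import io
import zipfile
from typing import Set, TextIO


def read_stop_ids(stops_txt: TextIO) -> Set[str]:
    """Collect every stop_id defined in stops.txt, kept as strings (not ints)."""
    return {row["stop_id"] for row in csv.DictReader(stops_txt)}


def find_unknown_transfer_stops(stops_txt: TextIO, transfers_txt: TextIO) -> Set[str]:
    """Return stop IDs used in transfers.txt that stops.txt never defines."""
    known = read_stop_ids(stops_txt)
    unknown: Set[str] = set()
    for row in csv.DictReader(transfers_txt):
        for column in ("from_stop_id", "to_stop_id"):
            if row[column] not in known:
                unknown.add(row[column])
    return unknown


def check_feed_zip(path) -> Set[str]:
    """Run the check on a GTFS zip, assuming both files are at the archive root."""
    with zipfile.ZipFile(path) as feed:
        # utf-8-sig tolerates the byte-order mark that some feeds include
        with io.TextIOWrapper(feed.open("stops.txt"), encoding="utf-8-sig", newline="") as stops, \
             io.TextIOWrapper(feed.open("transfers.txt"), encoding="utf-8-sig", newline="") as transfers:
            return find_unknown_transfer_stops(stops, transfers)
```

For example, with a stops.txt that defines only `0169` and a transfers.txt row from `169` to `0169`, `find_unknown_transfer_stops` returns `{'169'}`. Running `check_feed_zip` on the downloaded zip before `importgtfs` would flag such IDs without touching the database.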

That seems like a bug to me, because transfers.txt will never contain enough data, e.g. stop geography, to properly create stop objects. I'd favor a warning, such as, "Encountered stop ID in transfers.txt, which is not present in stops.txt. Skipping this stop." What do you think?
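A minimal sketch of that warn-and-skip behaviour, assuming the importer has already built a dict mapping stop_id strings from stops.txt to database primary keys. `resolve_stop` and `import_transfers` are hypothetical names; this is not how multigtfs's lookup at base.py#L153 is written, only an illustration of warning instead of creating a placeholder stop. The warning text is adapted from the wording proposed above, but it skips the whole transfer row, since a transfer without both stops can't be stored.

```python
import logging
from typing import Dict, Iterable, List, Mapping, Optional

logger = logging.getLogger("gtfs_import_sketch")


def resolve_stop(stop_id: str, stops_by_id: Mapping[str, int]) -> Optional[int]:
    """Return the primary key of an already-imported stop, or None if unknown.

    Unlike a get-or-create lookup, this never creates a placeholder stop
    that would be missing required data such as its point geometry.
    """
    pk = stops_by_id.get(stop_id)
    if pk is None:
        logger.warning(
            "Encountered stop ID %r in transfers.txt, which is not present "
            "in stops.txt. Skipping this transfer.",
            stop_id,
        )
    return pk


def import_transfers(
    rows: Iterable[Mapping[str, str]], stops_by_id: Mapping[str, int]
) -> List[Dict[str, object]]:
    """Build transfer records, skipping any row that names an unknown stop."""
    kept: List[Dict[str, object]] = []
    for row in rows:
        from_pk = resolve_stop(row["from_stop_id"], stops_by_id)
        to_pk = resolve_stop(row["to_stop_id"], stops_by_id)
        if from_pk is None or to_pk is None:
            continue  # a transfer needs both endpoints, so drop the whole row
        kept.append(
            {
                "from_stop": from_pk,
                "to_stop": to_pk,
                "transfer_type": row.get("transfer_type", ""),
            }
        )
    return kept
```

Both endpoints are resolved before the check, so a row with two unknown stops logs two warnings, which makes the cleanup list for the feed publisher complete.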

Now, I didn't test the feed import using multigtfs v1.0.0, but I do see that the offending code block linked above is the same as in version 0.4.3.

Thanks for your attention.

jwhitlock commented 8 years ago

I was unable to duplicate this bug. I downloaded the file at the link, and imported it under two different configurations, and it imported correctly both times.

My feed download is a zip file of 4,511,933 bytes, with an md5sum of 2dec5d2e3e58f90344b221a6d1ba0f3c.
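To compare another copy of the feed against this one, a small Python sketch that computes the same two figures (size in bytes and MD5 hex digest) is enough. The function name `describe_download` and the path `feed.zip` are illustrative assumptions; `md5sum feed.zip` on the command line gives the same digest.

```python
import hashlib
import os
from typing import Tuple

# Size and md5sum reported for the working download in the comment above.
EXPECTED = (4511933, "2dec5d2e3e58f90344b221a6d1ba0f3c")


def describe_download(path: str, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Return (size in bytes, MD5 hex digest) for a downloaded feed file."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so a large feed does not have to fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Usage: `describe_download("feed.zip") == EXPECTED` is true only if the local copy is byte-for-byte identical to the one tested here.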

I tried it with my development configuration of Python 2.7.12, multigtfs master branch, and these packages:

Django==1.9.8
django-extensions==1.6.7
django-nose==1.4.4
jsonfield==1.0.3
multigtfs==master
nose==1.3.7
psycopg2==2.6.2
six==1.10.0

I was also unable to reproduce it with Python 3.4.3 and these packages:

Django==1.6.11
django-extensions==1.6.7
django-nose==1.3
jsonfield==1.0.3
multigtfs==0.4.3
nose==1.3.7
psycopg2==2.6.2
six==1.10.0
South==1.0.2

jwhitlock commented 8 years ago

@araichev can you post your copy of the feed somewhere for download? A 5 MB file might be allowed as a GitHub attachment.

araichev commented 8 years ago

Sorry, my bad! It turns out the error occurs with a corrupted version of the feed I linked to, one in which the stop IDs in transfers.txt have had their leading zeros removed.
GitHub won't let me attach that file for some reason, but you can easily create it by editing transfers.txt.
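Since the corrupted feed can't be attached, here is a minimal sketch of producing it from the original download. It assumes transfers.txt sits at the archive root and is UTF-8 encoded; `strip_leading_zeros`, `corrupt_feed` and any file names are illustrative. Zip entries can't be edited in place, so the sketch copies every entry into a new archive and rewrites only transfers.txt.

```python
import csv
import io
import zipfile


def strip_leading_zeros(transfers_csv: str) -> str:
    """Rewrite transfers.txt content with leading zeros removed from stop IDs."""
    reader = csv.DictReader(io.StringIO(transfers_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in ("from_stop_id", "to_stop_id"):
            # lstrip("0") turns "000" into "", so fall back to a single "0"
            row[column] = row[column].lstrip("0") or "0"
        writer.writerow(row)
    return out.getvalue()


def corrupt_feed(src_zip, dst_zip) -> None:
    """Copy a GTFS zip, replacing transfers.txt with the zero-stripped version."""
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(dst_zip, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "transfers.txt":
                # utf-8-sig drops a byte-order mark if the feed has one
                data = strip_leading_zeros(data.decode("utf-8-sig")).encode("utf-8")
            # Reusing the ZipInfo keeps each entry's name, timestamp and compression
            dst.writestr(info, data)
```

For example, a transfers.txt row `0169,0200,0` becomes `169,200,0`, which matches the failing case described above when stops.txt still defines `0169` and `0200`.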

The corrupted feed actually fails Google's feed validator, so I don't expect multigtfs to handle it well, and you can ignore my bug report. Sorry again.

jwhitlock commented 8 years ago

No problem. If it had been a bug, the fix might not have been backported to 0.4.3. You may want to start the work of updating to Django 1.8 LTS, so that upgrading multigtfs is an option when you need a bug fix.

I'm going to leave the bug open with the new title. This does feel like something that could have a friendlier error message, and a simple test to reproduce it.
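One shape the friendlier error and reproducing test could take, sketched with plain `unittest` rather than multigtfs's own Django test setup: validate transfers.txt up front and raise a descriptive exception instead of letting the database's IntegrityError surface. `UnknownStopError` and `validate_transfer_stops` are hypothetical names, and the test data mimics the stripped-leading-zero feed described above.

```python
import unittest
from typing import Iterable, Mapping


class UnknownStopError(ValueError):
    """Raised when transfers.txt references a stop_id missing from stops.txt."""


def validate_transfer_stops(
    known_stop_ids: Iterable[str], transfer_rows: Iterable[Mapping[str, str]]
) -> None:
    """Raise UnknownStopError listing every unknown stop ID, sorted for stable output."""
    used = {
        row[column]
        for row in transfer_rows
        for column in ("from_stop_id", "to_stop_id")
    }
    missing = sorted(used - set(known_stop_ids))
    if missing:
        raise UnknownStopError(
            "transfers.txt references stop IDs not present in stops.txt: "
            + ", ".join(missing)
        )


class TransferStopTests(unittest.TestCase):
    def test_stripped_leading_zero_is_reported(self):
        # "169" is "0169" with its leading zero removed, as in the corrupted feed
        rows = [{"from_stop_id": "169", "to_stop_id": "0169"}]
        with self.assertRaisesRegex(UnknownStopError, r"stops\.txt: 169$"):
            validate_transfer_stops({"0169"}, rows)

    def test_valid_feed_passes(self):
        rows = [{"from_stop_id": "0169", "to_stop_id": "0169"}]
        validate_transfer_stops({"0169"}, rows)  # should not raise
```

Saved as a module, the tests run with `python -m unittest`. Raising before any rows are written means the user sees which IDs are wrong rather than a not-null constraint on an unrelated column.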