jarondl / pygtfs

A python (2/3) library for GTFS
MIT License

Memory optimization? #85

Open FabienD74 opened 5 months ago

FabienD74 commented 5 months ago

Hi, in the code of loader.py we have:

    gtfs_tables = {}
    for gtfs_class in gtfs_all:
...
            gtfs_tables[gtfs_class] = fd.read_table(gtfs_filename,
                                                    set(c.name for c in gtfs_class.__table__.columns) - {'feed_id'})

then a few lines later:

    for gtfs_class in gtfs_all:
        if gtfs_class not in gtfs_tables:
            continue
        gtfs_table = gtfs_tables[gtfs_class]

To me (correct me if I'm wrong):

1) In the first piece of code, the whole unzipped content is loaded into memory. That's huge and should be avoided.

2) In the second piece of code, the last statement duplicates the table content of the current "gtfs_class" (stops, stop_times, shapes, ...). Why? Can't we use gtfs_tables[gtfs_class] directly, without duplicating the content into "gtfs_table"?
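To make both points concrete, here is a minimal sketch, not pygtfs code. The snippets above don't show whether fd.read_table returns a fully materialized table or a lazy reader, so the sketch assumes nothing about it. `iter_rows` is a hypothetical helper showing how rows could be streamed out of a zip member one at a time instead of being held in memory all at once. The sample `stops.txt` content is made up for illustration. The last lines check point 2 with a stand-in dict: as far as I can tell, a plain assignment in Python only binds a second name to the same object and does not copy the data.

```python
import csv
import io
import zipfile


def iter_rows(zip_source, member, wanted_columns):
    """Yield one dict per CSV row, keeping only wanted_columns.

    Hypothetical helper (not part of pygtfs): rows are produced lazily,
    so the whole file is never held in memory as a list.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # utf-8-sig drops a leading BOM if the file has one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                yield {k: v for k, v in row.items() if k in wanted_columns}


# Build a tiny in-memory zip with made-up data so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("stops.txt", "stop_id,stop_name,extra\nS1,Main,x\nS2,Park,y\n")

rows = iter_rows(buffer, "stops.txt", {"stop_id", "stop_name"})
first = next(rows)  # only the first row has been parsed at this point

# Point 2: assignment binds a new name to the same object; nothing is copied.
gtfs_tables = {"stops": [{"stop_id": "S1"}, {"stop_id": "S2"}]}  # stand-in data
gtfs_table = gtfs_tables["stops"]
same_object = gtfs_table is gtfs_tables["stops"]
```

If `same_object` is true, the second loop would not be duplicating anything, and the memory question comes down to what the first loop stores in `gtfs_tables`.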

Thx

Regards Fabien