osPlanning / omx

Open Matrix (OMX)
https://github.com/osPlanning/omx/wiki
Apache License 2.0

Need for list format #35

Open harisbal opened 5 years ago

harisbal commented 5 years ago

Hello everyone, and congratulations on this great project. Standardising formats will definitely help everyone in the transport industry.

I just wanted to raise a concern regarding the selected approach. Although the transport industry has long used and stored OD matrices in the "matrix format" (i.e. rows and columns), I believe that this is not the most efficient approach. From my perspective, as well as that of quite a few other data analysts and programmers (e.g. https://vita.had.co.nz/papers/tidy-data.pdf), storing data in a "list" or "database" format is more efficient. Following this format, all ODs could be stored in a single file, and users would be able to make selections based on simple, standardised queries. For instance:

Origin_Zone, Destination_Zone, Trip_Purpose, Time_Period, Trips
A, B, HBW, AM, 10
Z, X, HBO, IP, 12
...
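As a rough illustration of the kind of selection the list format allows, here is a sketch that parses the two sample rows above with Python's standard `csv` module and picks out the HBW trips in the AM period. The helper `select_trips` is hypothetical and written only for this example; it is not part of OMX.

```python
import csv
import io

# The sample rows from the post, in the proposed list format.
# skipinitialspace=True strips the space that follows each comma.
TRIPS_CSV = """\
Origin_Zone, Destination_Zone, Trip_Purpose, Time_Period, Trips
A, B, HBW, AM, 10
Z, X, HBO, IP, 12
"""

def select_trips(rows, purpose=None, period=None):
    """Hypothetical helper: keep rows matching the purpose and/or period."""
    return [
        r for r in rows
        if (purpose is None or r["Trip_Purpose"] == purpose)
        and (period is None or r["Time_Period"] == period)
    ]

rows = list(csv.DictReader(io.StringIO(TRIPS_CSV), skipinitialspace=True))
am_hbw = select_trips(rows, purpose="HBW", period="AM")
total = sum(int(r["Trips"]) for r in am_hbw)
print(am_hbw, total)
```

Every selection is a filter on named columns, so adding a new purpose or period adds rows rather than new matrices.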

I would really like to know the views of the development team regarding this comment.

Kind Regards

-Haris

pedrocamargo commented 5 years ago

Hey Haris,
I'd beg to differ. It is true that in a world of agent-based modelling, our demand matrices are incredibly sparse and what we actually have are trip tables (as you have suggested). However, many of the inputs and outputs in our models are still dense matrices (e.g. skim matrices), and storing them as anything other than a proper matrix would not be efficient. I would also point to the computational efficiency of storing arrays of known size in contiguous space in memory (or on disk), which I guess is part of the attractiveness of this format. Maybe what we need are examples of smart ways of converting OMX matrices to data tables and vice versa (I could help with the Python version)?
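A minimal sketch of the matrix-to-table direction, assuming the matrix has already been read into a NumPy array (for example from an OMX file) and that origins and destinations share one list of zone labels. `matrix_to_list` is a hypothetical helper, not an OMX API. By default it keeps only non-zero cells, which is what makes the list form compact for sparse demand; for a dense skim it would produce one row per cell.

```python
import numpy as np

def matrix_to_list(matrix, zones, drop_zeros=True):
    """Hypothetical helper: flatten a square OD matrix into
    (origin, destination, value) rows, in row-major order.

    `zones[i]` is assumed to label both row i and column i.
    """
    matrix = np.asarray(matrix)
    if drop_zeros:
        # Indices of non-zero cells only (sparse demand case).
        o_idx, d_idx = np.nonzero(matrix)
    else:
        # Every cell, e.g. for a dense skim matrix.
        o_idx, d_idx = np.indices(matrix.shape).reshape(2, -1)
    return [(zones[o], zones[d], matrix[o, d].item()) for o, d in zip(o_idx, d_idx)]

zones = ["A", "B", "C"]
m = np.array([[0, 10, 0],
              [0, 0, 5],
              [3, 0, 0]])
od_list = matrix_to_list(m, zones)
print(od_list)
```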

harisbal commented 5 years ago

Hi Pedro! Allowing for both worlds would probably be the best approach since, as you already pointed out, different formats are more appropriate for different scenarios. It's true, though, that converting matrices to lists is a rather simple task. The dev team can count me in to contribute to this approach. Cheers!
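For the list-to-matrix direction, a minimal NumPy sketch. `list_to_matrix` is a hypothetical helper, not an OMX API; it assumes every origin and destination label appears in `zones`, sums duplicate OD pairs with `np.add.at`, and leaves pairs missing from the list at zero.

```python
import numpy as np

def list_to_matrix(rows, zones):
    """Hypothetical helper: rebuild a dense square OD matrix from
    (origin, destination, value) rows."""
    index = {zone: i for i, zone in enumerate(zones)}
    matrix = np.zeros((len(zones), len(zones)))
    if not rows:
        return matrix
    origins, destinations, values = zip(*rows)
    o = np.array([index[z] for z in origins])
    d = np.array([index[z] for z in destinations])
    # Unbuffered add, so repeated (o, d) pairs accumulate instead of overwriting.
    np.add.at(matrix, (o, d), np.array(values, dtype=float))
    return matrix

rows = [("A", "B", 10), ("Z", "X", 12), ("A", "B", 2)]
m = list_to_matrix(rows, zones=["A", "B", "X", "Z"])
print(m)
```

Plain fancy-index assignment (`matrix[o, d] += values`) would keep only one of the repeated A-B rows, which is why `np.add.at` is used here.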