EwoutH commented 2 months ago

Let's say I have a model that simulates n hours and I want to collect data every hour. For example, I would like to get something like area_to_pandas every hour (in simulation time), covering only the past hour.

Currently, if I call area_to_pandas(), it calculates everything from the start of the simulation towards the current state.

So I'm curious what the best way to do this is. I already have mechanisms to call it every hour (or every n seconds), but I need a way to collect data from only the last hour and not the full duration.

EwoutH commented 2 months ago

Since detailed network animations can be created, this information is already saved somewhere, right?

toruseo commented 2 months ago

@EwoutH thanks for the issues and PRs. I will give you detailed feedback after I get back to work.

Quick comments: The data is mostly saved in Vehicle.log_x and the other Vehicle.log_* lists. Please trace through the code for the details. The data access you propose (during simulation and for limited areas) should be possible. But it is too specific, and at the moment I don't plan to implement it myself. Can you try it yourself? It can be implemented by modifying the current Analyzer.*_to_pandas(). If you implement it, please do so by creating new functions in Analyzer to ensure backward compatibility.

EwoutH commented 2 months ago

Thanks for getting back. Since most data is in the vehicles, it first needs to be translated to the links. Are there existing functions/data I can hook into? Then from the links we need to aggregate per area.

So I think:

1. Translate vehicle data to link data. Use existing functions if possible.
2. Create a function to "reset" link data.
3. Request link data every hour, reset afterwards.
4. Aggregate link data to area data. Use existing functions if possible.
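
The last three steps could look roughly like the sketch below. It assumes that per-link cumulative counts are available as plain lists with one entry per time step, and that an area is simply a set of link names. The class name IntervalReader and both dictionaries are made up for this illustration and are not part of UXsim. Instead of literally resetting the link data, the "reset" just moves a baseline forward, so the underlying lists stay untouched.

```python
class IntervalReader:
    """Report per-area counts since the previous readout.

    `links` maps a link name to its cumulative-count list (a stand-in for
    per-link simulation data); `areas` maps an area name to the link names
    it contains. Both structures are assumptions made for this sketch.
    """

    def __init__(self, links, areas):
        self.links = links
        self.areas = areas
        self.last_step = 0  # time step index of the previous readout

    def read(self, step):
        """Return {area: count between the previous readout and `step`}."""
        out = {}
        for area, names in self.areas.items():
            out[area] = sum(
                self.links[n][step] - self.links[n][self.last_step] for n in names
            )
        self.last_step = step  # the "reset": move the baseline forward
        return out


# Made-up cumulative counts for three links over five time steps.
links = {"a": [0, 1, 3, 6, 10], "b": [0, 0, 2, 2, 5], "c": [0, 4, 4, 5, 5]}
areas = {"north": ["a", "b"], "south": ["c"]}

reader = IntervalReader(links, areas)
first = reader.read(2)   # counts between step 0 and step 2
second = reader.read(4)  # counts between step 2 and step 4
```

Because nothing is overwritten, a full-duration readout remains possible at any time by comparing against step 0.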

Another approach could be to use the existing area_to_pandas, but apply it to a copy of the link data, which is reset every hour.

How would you approach it? And are there specific functions I could use?

EwoutH commented 2 months ago

I'm going to do a deep dive into this tomorrow. Goal is to create a convenient area_to_pandas_in_timespawn function that takes a start and (optionally) end time and reports statistics over that period of time.

Having https://github.com/toruseo/UXsim/pull/119 merged might help a lot with keeping that fast.

toruseo commented 2 months ago

@EwoutH thanks for your work!

As you might have noticed (and I forgot to mention), some link-level data is stored in the following variables.

        s.cum_arrival = []
        s.cum_departure = []
        s.traveltime_actual = []

In https://github.com/toruseo/UXsim/pull/123, it looks like you modified how these variables are computed. Unfortunately, that broke some important logic of the simulator, which must be why these tests fail. Is it possible to implement your functions without altering the internal logic? It is preferable (and hopefully easy) if you can implement your functions by only adding new methods to Analyzer class.

EwoutH commented 2 months ago

https://github.com/toruseo/UXsim/pull/123 was a rough draft that's nowhere close to ideal; it just (looks like it) works well enough for my research.

Ideally, all that data generated by UXsim should be able to be indexed over time. I see two approaches:

1. Attach timestamps to each data point. This gives the most resolution and customizable aggregation after a model run, but also is likely to add the most overhead and memory costs.
2. Collect data in bins. Define some bin width (number of seconds) beforehand, and store all data in a dictionary (or similar) structured with the bin start time as index.
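
A minimal sketch of the binned approach, assuming events arrive with a simulation time in seconds. The class name BinnedCounter and the event times are made up for illustration; nothing here is UXsim code.

```python
from collections import defaultdict


class BinnedCounter:
    """Accumulate values into fixed-width time bins keyed by bin start time."""

    def __init__(self, bin_width):
        self.bin_width = bin_width      # bin width in seconds
        self.bins = defaultdict(float)  # bin start time -> accumulated value

    def add(self, t, value=1.0):
        # Floor t to the start of its bin, e.g. t=3700 with width 3600 -> 3600.
        bin_start = int(t // self.bin_width) * self.bin_width
        self.bins[bin_start] += value


counter = BinnedCounter(bin_width=3600)
for t in (10, 1800, 3599, 3600, 7300):  # made-up event times in seconds
    counter.add(t)

result = dict(counter.bins)
```

Memory use grows with the number of bins rather than the number of events, which is the main trade-off against storing a timestamp per data point.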

If not all data needs to be collected, you could get away with duplicating some data at time intervals: just read out the variables the model uses and save them in a separate variable used for data analysis.

#123 was kind of a hybrid solution, in which I stored some timestamps and then aggregated them over some bin width.

> It is preferable (and hopefully easy) if you can implement your functions by only adding new methods to Analyzer class.

Unfortunately, it's impossible to do this without making some (small) changes to uxsim.py, since currently the time data just isn't there.

What might be interesting: in the Mesa library we're currently working on a similar challenge, namely how to collect data from agent-based models. We're working on a more complicated solution in which certain variables are tracked for state changes. It might not be the best fit for UXsim, but there might be some useful ideas in it as well.

https://github.com/projectmesa/mesa/discussions/2281 and slightly broader:
https://github.com/projectmesa/mesa/discussions/1944

CC @quaquel

toruseo commented 2 months ago

cum_arrival and the other lists are indexed by time step number. So if you want the value at t seconds, you can get it with something like link.cum_arrival[int(t/W.DELTAT)], where W.DELTAT is the time step width in seconds.
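
As a rough illustration of that indexing, the sketch below uses a plain Python list as a stand-in for link.cum_arrival and a constant DELTAT in place of W.DELTAT. The helper names value_at and count_between are made up for this example and are not part of UXsim. Assuming the list holds cumulative counts per time step, the count within a window is the difference between two cumulative values.

```python
# Stand-in data, NOT produced by UXsim. In a real model this would be
# link.cum_arrival, with DELTAT taken from W.DELTAT.
DELTAT = 5  # time step width in seconds (assumed value)
cum_arrival = [0, 2, 4, 4, 7, 9, 12, 12, 15]  # one entry per time step


def value_at(cum, t, deltat):
    """Cumulative count at simulation time t (seconds)."""
    return cum[int(t / deltat)]


def count_between(cum, t_start, t_end, deltat):
    """Vehicles counted between t_start and t_end (seconds)."""
    return value_at(cum, t_end, deltat) - value_at(cum, t_start, deltat)


at_20s = value_at(cum_arrival, 20, DELTAT)              # index 4
in_window = count_between(cum_arrival, 10, 30, DELTAT)  # index 6 minus index 2
```

For an hourly readout, t_start and t_end would be one hour apart. Note that a t_end beyond the last recorded step raises an IndexError in this sketch, so a real helper would need to clamp the index.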

Unfortunately, I'm sorry to say that changing the internal logic for this specific purpose (as done in https://github.com/toruseo/UXsim/pull/123) is not acceptable, as it critically breaks backward compatibility, and even putting that aside, we would still need to thoroughly review all the code.

toruseo commented 2 months ago

Maybe I can add more comprehensive and user-friendly getter functions

toruseo commented 2 months ago

In fact, there are getter-like functions already: Link.arrival_count(t), Link.departure_count(t)

But I believe directly accessing lists such as cum_arrival by slicing, e.g. cum_arrival[int(t_start/W.DELTAT):int(t_end/W.DELTAT)], would be more convenient and "Pythonic" if you need a sequence of data.
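
A small sketch of what the slicing gives you, again with plain lists standing in for link.cum_arrival and link.traveltime_actual and an assumed DELTAT. The helper name window is hypothetical.

```python
DELTAT = 5  # seconds per time step (assumed; W.DELTAT in a real model)

# Stand-ins for link.cum_arrival and link.traveltime_actual.
cum_arrival = [0, 1, 3, 6, 6, 8, 11, 12]
traveltime_actual = [10.0, 10.0, 12.0, 14.0, 14.0, 13.0, 11.0, 10.0]


def window(series, t_start, t_end, deltat):
    """Slice a per-time-step list to the steps in [t_start, t_end)."""
    return series[int(t_start / deltat):int(t_end / deltat)]


arrivals = window(cum_arrival, 10, 30, DELTAT)  # time steps 2 to 5
travel_times = window(traveltime_actual, 10, 30, DELTAT)

# Arrivals per step inside the window: differences of consecutive cumulative values.
per_step = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_travel_time = sum(travel_times) / len(travel_times)
```

Unlike direct indexing, a slice that runs past the end of the list is silently truncated rather than raising an error, which is convenient mid-simulation but can hide an incomplete window.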

EwoutH commented 2 months ago

Ah thanks, I indeed didn't know all these things. That helps a lot (and I wish I knew them before starting the implementation in #123).

I will try to come up with an implementation using slicing, if I can find the time.

But having #119 and #121 merged helps already, that makes the PR diffs smaller, thanks!

EwoutH commented 2 months ago

Thanks for all your work yesterday and today!

Would you like to implement this functionality as an example of how users can use the user_function?

toruseo commented 2 months ago

If you can implement this time-interval-based method and the zone-based methods from https://github.com/toruseo/UXsim/issues/122 as new Analyzer methods without modifying uxsim.py, that would be great, as other people could then use the functions easily.

Work built on user_function will be highly customized and may not be easy to reuse, but it could be useful to showcase the abilities of UXsim.

EwoutH commented 2 months ago

What's the idea behind this check?

https://github.com/toruseo/UXsim/blob/112b425a38cfd244320fce4e9781c2f1f80b0d82/uxsim/analyzer.py#L1221

Finding this took over an hour of debugging why my dataframes kept being empty...

EwoutH commented 2 months ago

Right, it's probably so that you don't calculate it too many times? But you want to do exactly that if you want to compute it multiple times.

EwoutH commented 2 months ago

I thought splitting by bin_width wouldn't be that complicated. Two hours later, I still don't have a working implementation.

toruseo commented 2 months ago

> Right, it's probably so that you don't calculate it too many times? But you want to do exactly that if you want to compute it multiple times.

That's true. The original design intention was that these Analyzer functions only run after the simulation has finished, so a single computation was sufficient.
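
To illustrate the general pattern (not the actual UXsim code, which is not reproduced here), the toy class below guards a conversion with an "already computed" flag. The class name, the flag name and the force parameter are all hypothetical. It shows why a mid-run call can leave later calls with outdated results, and one possible way to allow explicit recomputation.

```python
class CachedTable:
    """Toy stand-in for a convert-once analyzer method (not UXsim code)."""

    def __init__(self, source):
        self.source = source     # list that keeps growing during the run
        self._converted = False  # hypothetical "already computed" flag
        self.table = []
        self.n_computations = 0

    def to_table(self, force=False):
        # Recompute only on the first call, or when the caller insists.
        if not self._converted or force:
            self.table = list(self.source)
            self._converted = True
            self.n_computations += 1
        return self.table


log = [1, 2]
cached = CachedTable(log)
early = list(cached.to_table())            # computed mid-run
log.append(3)                              # simulation continues
stale = list(cached.to_table())            # guard returns the old result
fresh = list(cached.to_table(force=True))  # explicit recomputation
```

An opt-in flag like force keeps the single-computation behavior as the default, so existing callers would see no change.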

I think vehicles_to_pandas is very heavy because the Vehicle.log_* lists are very large. If you want to compute it multiple times during the simulation, you need to improve this function.

toruseo commented 2 weeks ago

Closing. See https://github.com/toruseo/UXsim/issues/143#issuecomment-2473026855 for the reason. Feel free to re-open.

toruseo commented 2 weeks ago

On second thought, I am re-opening, as this is related to https://github.com/toruseo/UXsim/pull/130

toruseo commented 2 weeks ago

Closing, as https://github.com/toruseo/UXsim/pull/130 is closed.

EwoutH commented 2 weeks ago

That the PR is closed doesn’t mean the issue has to be closed, this can still be a desirable feature.

toruseo commented 2 weeks ago

That's true.

But in this case, I guess we both understand that it is difficult to implement this feature in a way that satisfies everyone's requirements. So I think it is better defined on the user side.

The main UXsim module should have functions for general users. Specialized functions for some users should be defined by the users.

Reading out data in time interval #120