Closed EwoutH closed 2 weeks ago
Okay, it finally starts to look like something.
time_bin
argument to link_to_pandas
Added the optional argument time_bin
argument to link_to_pandas
. Default behavior is now restored, and it's a bit faster since it only need to iterate over a lower amount of slices instead of over all individual timesteps.
Previous strategy, with the timestep
as in #123 and #128, it looked like this:
With this PR it looks like this:
Note that:
traffic_volume
looks like it's higher. This could be a behavior change, since total_travel_time
is now per bin instead of cumulative, which seems more logical when going for a binned approach.The test fails because we now set "area"
as the index name (from the inputted area_names
). We can either change that or update the tests, since I find "area"
quite an logical index.
@toruseo This is now ready for an initial review, before I optimize further.
Thanks for your hard work.
This time_bin
approach is very cool! Let's go this way.
The test fails because we now set "area" as the index name (from the inputted area_names). We can either change that or update the tests, since I find "area" quite an logical index.
For this issue, I want to ensure the backward compatibility. Maybe the index name for non time_bin mode can be just an integer (i.e., drop line 1519).
Another related issue is that the index for area_to_pandas
is different from the others. This might be confusing. Is it possible to make these (time_bin, area)
indexing a optional mode to be enabled by optional argument? Edit: At least, the internal variable Analyzer.df_area
should have a consistent data structure. I think something like below would be suitable
s.df_area = pd.DataFrame(result)
if index_by_area:
if time_bin is not None:
return s.df_area.set_index(["time_bin", "area"])
else:
return s.df_area.set_index(["area"])
Similar indexing mode could be added to link_to_pandas
as well.
As a minor improvement, it is better to accept any numbers (including float) as time_bin argument as it is in seconds. Just doing time_bin=int(time_bin)
in the beginning will be sufficient.
Awesome, I will implement your suggestions later today.
Implemented and tested!
I don't really get the test error. It seems a value diff this time?
This seems to be a real issue.
This means that your method says the traffic volume was 9310, whereas the actual number of all vehicles was 6880. Maybe double-counting or something.
Pls verify your method very carefully. Traffic volume and other stats in time_bin mode need to be verified as well. And can you include your test code (based on a small, tractable scenario like one in test_area_stats()
) as well?
A huge advantage of calculating traffic_volume
based on Link data, is that you can turn off vehicle data logging completely and still get those 6 metrics. That can save an order of magnitude in memory usage in both running the simulation and saving data from it afterwards.
I will try to get the count deviation bug fixed this week.
@toruseo I fixed the remaining issues (see 2dca14f for some details), all tests now pass!
can you add test for time_bin mode as I instructed?
Validity of Analyzer must be very carefully checked, because if its output is corrupted, entire results of the research will be seriously damaged.
Of course, I understand the importance of tests. I just wanted to check first if you were happy about the implementation before moving forward.
I fixed a nasty bug in which the last time bin was excluded sometimes in f543c34 and added tests in ced235e.
sorry, I missed your last message. Let me check
strange. There may be some bugs. I used the following code for testing in a tractable small scale scenario.
from uxsim import *
W = World(
name="",
deltan=5,
tmax=2000,
print_mode=1, save_mode=1, show_mode=1,
random_seed=0
)
W.addNode("orig", 0, 0)
W.addNode("mid1", 1, 1)
W.addNode("mid2", 2, 2)
W.addNode("dest", 3, 3)
link1 = W.addLink("link1", "orig", "mid1", length=1000, free_flow_speed=20, jam_density=0.2)
link2 = W.addLink("link2", "mid1", "mid2", length=1000, free_flow_speed=20, jam_density=0.2, capacity_out=0.4)
link3 = W.addLink("link3", "mid2", "dest", length=1000, free_flow_speed=20, jam_density=0.2)
t1 = 400
t2 = 800
W.adddemand("orig", "dest", 100, t1, 0.8)
W.adddemand("orig", "dest", t1, t2, 0.4)
W.adddemand("orig", "dest", t2, 1000, 0.1)
W.exec_simulation()
W.analyzer.print_simple_stats()
df = W.analyzer.link_to_pandas(time_bin=500)
display(df)
print(sum(df["traffic_volume"][df["link"]=="link1"]), sum(df["traffic_volume"][df["link"]=="link2"]), sum(df["traffic_volume"][df["link"]=="link3"]))
df2 = W.analyzer.area_to_pandas([["orig","mid1","mid2","dest"]])
display(df2)
print(df2["traffic_volume"])
The total number of vehicles is 420, so print(sum(df["traffic_volume"][df["link"]=="link1"]), sum(df["traffic_volume"][df["link"]=="link2"]), sum(df["traffic_volume"][df["link"]=="link3"]))
should print 420 420 420
, and print(df2["traffic_volume"])
should print 420
.
But I got the following results:
Can you check?
Edit: BTW, in the main
branch version, W.analyzer.area_to_pandas([["orig","mid1","mid2","dest"]])
return the following dataframe. Thus, this function is very broken.
Thanks for getting back.
Unfortunately, I don’t have time to work on this in the near future. If you are able, feel free to continue work on this PR.
My gut tells me it might be either A) some time indexing bug, like not including the first or last time step, B) something to do with when the vehicles get counted on a link, or C) something to do with how the areas are split and links the edges behaving weird,
okay, I leave this in pending status
This is related to https://github.com/toruseo/UXsim/issues/143#issuecomment-2473026855 But this is too good to close, so I keep it for future updates.
I don't see me working further on it, but I have successfully used this code in my thesis. So feel free to continue it.
The current error may be small, but I value perfection for this kind of stats computation. I also dont see me working on this, so let me close this one.
This PR introduces time-binned analysis capabilities to the
link_to_pandas
andarea_to_pandas
methods in the Analyzer class, while also addressing an issue with traffic volume calculation. These changes allow for more flexible, granular, and accurate analysis of traffic data over time.Changes
Added
time_bin
argument tolink_to_pandas
:time_bin
is None, the method behaves as before.time_bin
is specified, data is aggregated into time bins.Updated
area_to_pandas
to work with time-binned data:link_to_pandas
whentime_bin
is specified.Fixed traffic volume calculation in
area_to_pandas
:Data validation:
traffic_volume
now represents the volume per bin instead of cumulative.total_travel_time
is calculated per bin, which is more logical for a binned approach.Notes
Part of #120.
TODO
link_to_pandas