matsim-org / matsim-libs

Multi-Agent Transport Simulation
www.matsim.org
488 stars 449 forks source link

Capped Road Pricing via a TollFactor - thread safety questions #1898

Open mfitz opened 2 years ago

mfitz commented 2 years ago

Hi

Following this discussion, we have created an implementation of TollFactor that tracks each agent's progress towards a daily toll cap. Once an agent reaches the cap, we return a toll factor of 0 so that they pay no more tolls in the current iteration.

We initially tested this mechanism using very small and simple datasets with a simple and minimal MATSim config, and it seemed to do exactly what we want.

We have now progressed to using this TollFactor with some of our real models, and we're seeing some unexpected results. Although the toll cap greatly reduces the overall amount of toll revenue, as we would expect, we find that a significant minority of agents, in some cases 30 - 35%, have paid well over the cap value in tolls, often exceeding the cap by hundreds or even thousands.

It feels to me like a race condition between threads (for reasons I will explain below), but I don't have a good enough understanding of the MATSim/Hermes threading or event handling model to know exactly how this is happening or how to fix it. I will start logging out timestamps and thread IDs in an attempt to piece things together, but I wondered if anybody could immediately confirm my race condition theory if I describe my TollFactor class.

Description of our TollFactor class

Initially I was attempting to track tolls paid by agents by updating a map in the TollFactor whenever getTollFactor() is called. My assumption was that this method is called when and only when a toll is about to be paid by an agent. This assumption proved incorrect; the method seemed to be called in many cases without a toll being paid subsequently by the agent.

From stepping through the MATSim code I found one such example when an agent moves between links inside a cordon (we were using MATSim 12 and cordon based road pricing schemes at the time). My getTollFactor() method would be called in response to an agent entering a tolled link and would return a toll factor value, then later on in the call stack there was a check in RoadPricingTollCalculator to determine whether or not the agent was moving from an untolled link to a tolled link. If this was not the case, no PersonMoneyEvent was fired and the agent (correctly) paid no toll. I thus had a mismatch between the (correct) end-of-iteration toll statistics reported by RoadPricingControlerListener, and the toll stats in my own TollFactor, for example 28.0 (correct) versus 406 (incorrect):

2021-11-27T04:31:06,714  INFO ControlerListenerManagerImpl:167 calling notifyIterationEnds on org.matsim.contrib.roadpricing.RoadPricingControlerListener​
2021-11-27T04:31:06,715  INFO RoadPricingControlerListener:80 The sum of all paid tolls : 28.0 monetary units.​
2021-11-27T04:31:06,715  INFO RoadPricingControlerListener:81 The number of people who paid toll : 3​
2021-11-27T04:31:06,715  INFO RoadPricingControlerListener:82 The average paid trip length : 4666.666666666667 m.​
2021-11-27T04:31:06,715  INFO ControlerListenerManagerImpl:167 calling notifyIterationEnds on com.arup.cm.roadpricing.CappedTollFactor​
2021-11-27T04:31:06,715  INFO CappedTollFactor:99 Toll contribution records at end of iteration 1: {person_2=140.0, person_0=140.0, person_1=126.0}

For this reason, I made my CappedTollFactor a PersonMoneyEventHandler; it now updates its own toll record whenever it receives a PersonMoneyEvent in handleEvent(). My reasoning was that these events are guaranteed to fire only when an agent actually pays a toll. NB I would have liked to inject RoadPricingTollCalculator in to my TollFactor and delegate to it, but the method I need from it, getAgentToll() is not public.

My CappedTollFactor now reports the same end-of-iteration "sum of all paid tolls" value as RoadPricingControlerListener, so I'm satisfied it doesn't miss any PersonMoneyEvents. What I'm unsure about is the chronology of when it sees these events.

The parts that query and update the toll record look like this:

    @Override
    public void handleEvent(PersonMoneyEvent event) {
        if (event.getPurpose().equals("toll")) {
            double tollPaid = Math.abs(event.getAmount());
            if (tollPaid != 0.0) {
                logger.debug(format("Agent %s paid a %s toll at time %s",
                        event.getPersonId(), tollPaid, event.getTime()));
                updateTollsRecord(event.getPersonId(), tollPaid);
            } else {
                logger.debug(format("Ignoring zero amount toll event for agent %s at time %s",
                        event.getPersonId(), event.getTime()));
            }
        } else {
            logger.debug(format("Ignoring non toll money event of type '%s'", event.getPurpose()));
        }
    }

    @Override
    public double getTollFactor(Id<Person> personId, Id<Vehicle> vehicleId, Id<Link> linkId, double time) {
        double tollAmount = roadPricingScheme.getLinkCostInfo(linkId, time, personId, vehicleId).amount;
        logger.debug(format("Calculating toll factor on default toll of %s on link %s for agent %s in vehicle %s at %s",
                tollAmount, linkId, personId, vehicleId, time));
        double tollsAlreadyPaid = getAgentTollsPaid(personId);
        if (tollsAlreadyPaid >= agentCapValue) {
            logger.debug(format("Agent %s has reached the toll cap of %s; toll factor on toll of %s (link %s, time %s)" +
                    " is thus 0.00", personId, agentCapValue, tollAmount, linkId, time));
            return 0.00;
        }

        double newTotalTolls = tollsAlreadyPaid + tollAmount;
        if (newTotalTolls <= agentCapValue) {
            logger.debug(format("Agent %s is inside the toll cap of %s - toll factor on toll of %s (link %s, time %s)" +
                    " is thus 1.00", personId, agentCapValue, tollAmount, linkId, time));
            return 1.00;
        } else {
            double amountUntilCap = agentCapValue - tollsAlreadyPaid;
            double tollFactor = amountUntilCap / tollAmount;
            logger.debug(format("A toll of %s would move agent %s beyond the cap of %s (from %s to %s), toll factor " +
                    "is thus %s", tollAmount, personId, agentCapValue, tollsAlreadyPaid, newTotalTolls, tollFactor));
            return tollFactor;
        }
    }

I made no attempt to make this code thread safe. I am now wondering if the thread that ultimately handles PersonMoneyEvents and updates the toll record is different from the thread that is responding to LinkEnterEvents (which ultimately leads to a call to getTollFactor on my CappedTollFactor). In our MATSim config we have:

  <module name="controler">
    <param name="createGraphs" value="true"/>
    <param name="outputDirectory" value="./output-scenario2-capped"/>
    <param name="firstIteration" value="0"/>
    <param name="lastIteration" value="200"/>
    <param name="eventsFileFormat" value="xml"/>
    <param name="writeEventsInterval" value="50"/>
    <param name="writePlansInterval" value="50"/>
    <param name="mobsim" value="hermes"/>
    <param name="snapshotFormat" value=""/>
    <param name="overwriteFiles" value="deleteDirectoryIfExists"/>
  </module>

  <module name="parallelEventHandling">
    <param name="eventsQueueSize" value="270000000"/>
    <param name="estimatedNumberOfEvents" value="null"/>
    <param name="numberOfThreads" value="32"/>
    <param name="oneThreadPerHandler" value="true"/>
  </module>

Is this causing a race condition between two or more threads? In fact, thinking about this some more, it may not even require a race between threads - it could just be that the LinkEnterEvents all fire and are processed together, whereas the PersonMoneyEvents that result from some of these movements along tolled links are not handled until much later, meaning that the toll record is not up-to-date when we query it in getTollFactor.

If so, this would seem to require a fundamentally different approach to supporting capped road pricing - do you have any ideas on what that approach would look like?

Janekdererste commented 2 years ago

it could just be that the LinkEnterEvents all fire and are processed together, whereas the PersonMoneyEvents that result from some of these movements along tolled links are not handled until much later

It might be different in Hermes, but I think the event handling in Hermes and Qsim works the same way. Events are handled in separate threads which fetch events which need processing form a queue. Your config indicates, that you have 32 threads fetching from that queue.

Hermes is issuing LinkEnterEvents which are all added to the queue. The event worker threads then start to fetch those events from the queue and pass it to the different event handlers. One of them would be the one, which handles LinkEnterEvents for the purpose of toll calcuation. This event handler will then issue PersonMoneyEvents to the EventsManager which will add those events to the events queue as well. From there the event threads will eventually fetch the PersonMoneyEvent and pass it to the respective event handlers.

Event handlers, in the default implementation of EventsManager are assigned to a fixed thread. This means, a single event handler gets events passed in the same order as they are added to the events queue. However, this is not the case across different event handlers, since those might be on different threads.

I guess the way you are doing this will not work, since the toll factor is fetched when a LinkEnterEvent is handled, but the factor which is used is updated later (too late) once the PersonMoneyEvent comes in.

mfitz commented 2 years ago

Thanks @Janekdererste

That confirms my suspicions - ultimately the problem here is that the processing of the PersonMoneyEvent that I'm using to update the toll record in my TollFactor class is asynchronous. I need it to be synchronous so that the TollFactor is always working with up-to-date data.

From looking through the MATSim code, I can see that RoadPricingTollCalculator implements LinkEnterEventHandler, so this suggests that it is being called from the same thread that will eventually move the agent on to the next link in their trip after they pay the toll on the current link. If I am understanding correctly, that would mean that RoadPricingTollCalculator, which maintains a record of the tolls paid by each agent, updates that record synchronously.

As I mentioned above, my initial thought was to have my TollFactor class query RoadPricingTollCalculator, rather than maintain its own (asynchronously updated, it turns out...) toll record, but the default access scope of getAgentToll() meant that I could only do that if I added my own code to the org.matsim.contrib.roadpricing package, which felt like the wrong thing to do. I think the solution might be to go back to this approach and have my TollFactor delegate management of the toll record to RoadPricingTollCalculator. I'm going to give that a try and see if it solves my problem.

Janekdererste commented 2 years ago

You should be fine by querying the RoadPricingTollCalculator. As said each event handler processes events in the same order as they were issued to the events queue. The RoadPricingTollCalculator is not necessarily called on the same thread as the one which was issuing the LinkEnterEvent. In your case this should be okay anyway.

If you don't want to work within the matsim repository directly, it is possible to name a package in your downstream project org.matsim.contrib.roadpricing. From within this package you should be able to access package private methods and classes.

mfitz commented 2 years ago

So my plan seemed like a good one until I realised that RoadPricingTollCalculator.getAgentToll() no longer exists in the latest version of MATSim. So even if I move my CappedTollFactor class into the same org.matsim.contrib.roadpricing package, I can't use it. I'm now a little stumped.

From reading these comments, it looks like the toll record as stored internally by RoadPricingTollCalculator is considered deprecated because of the PersonMoneyEvents that are fired whenever an agent traverses a tolled link, but as described above, I can't rely on those events for my use case. Is there some other way to avoid asynchronicity when tracking agents paying tolls @Janekdererste @kainagel ? How can I make sure my CappedTollFactor is using the most up-to-date toll record when it checks to see if an agent has exceeded the toll cap?

kainagel commented 2 years ago

My intuition would be to program it yourself. The way to do this is always the same with MATSim contribs:

  1. Copy RoadPricingModule into your own space (and preferably rename it so we do not get confused). You can then probably make it much shorter, since the contrib version reacts to config settings, and you will not need that any more if you know what you are doing.
  2. Check what you need to replace. In your case probably RoadPricingControlerListener and then RoadPricingTollCalculator. Also copy those into your own space, rename, and modify.

You should end up with replacements for these three classes in your own space. But not more. Otherwise, please let us know (also see below).

A bit more philosophical: Evidently, one should reuse as much code as possible, since otherwise we will never arrive at certifiable simulation results. On the other hand, it is nearly impossible to make everything replaceable by injection. In consequence, if requirements go significantly beyond what the existing code is able to process, one may have to replace the injected module. I would say that you own code should be shorter than 1000 lines of code, otherwise something is wrong, either with our or with your design.

If, at some point, you notice that you need access to something that is not accessible (non-public, or final), please let us know. Normally, we will not make it public or non-final, but hopefully either point to ways to proceed anyways, or find some other solution to address the problem. In particular the latter sometimes takes some time; then please have patience with us.