FEN results

MargaridaSilva commented 3 years ago

Hi,

I ran FEN-FD on Matthew Effect, but the results I got seem inconsistent with the ones in the paper. I don't believe I changed anything significant in the model. Do you have any idea why this could be happening?

[Image: Models]

Thanks in advance!

matthieu637 commented 3 years ago

Hello, I think it is because I uploaded the version where communication is limited to neighbors (which is more comparable with SOTO). Depending on your assumptions, you may allow communication between any pair of agents. In the paper, we reported the best version (the one without this limitation).

Please have a look at 51d949b3acc43ab293d15893102659c828081cb4, now:

FEN-FD.py is FEN with the gossip algorithm, with communication not limited to neighbors
FEN-FD-NEIGH.py is FEN with the gossip algorithm, with communication limited to neighbors (the one you used in your previous figure)
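
To make the difference between the two variants concrete, here is a minimal sketch of gossip-style averaging under the two communication assumptions. It is not the code from the repository: the function `gossip_average`, the ring topology, the pairwise-averaging rule, and the utility values are all assumptions chosen for illustration.

```python
import numpy as np

def gossip_average(values, adjacency, rounds, rng):
    """Pairwise gossip: in each round one random agent averages with one allowed peer."""
    x = np.asarray(values, dtype=float).copy()
    n = len(x)
    for _ in range(rounds):
        i = rng.integers(n)
        peers = np.flatnonzero(adjacency[i])  # agents that i is allowed to contact
        if peers.size == 0:
            continue
        j = rng.choice(peers)
        x[i] = x[j] = 0.5 * (x[i] + x[j])     # both agents keep the pair's mean
    return x

n = 6
rng = np.random.default_rng(0)
utilities = np.arange(n, dtype=float)         # hypothetical per-agent utilities

# Unrestricted variant: any agent may contact any other agent.
full = np.ones((n, n), dtype=bool) & ~np.eye(n, dtype=bool)

# Restricted variant (assumed ring): each agent may contact only its two neighbors.
ring = np.zeros((n, n), dtype=bool)
idx = np.arange(n)
ring[idx, (idx + 1) % n] = True
ring[idx, (idx - 1) % n] = True

est_full = gossip_average(utilities, full, rounds=200, rng=rng)
est_ring = gossip_average(utilities, ring, rounds=200, rng=rng)
```

Both variants preserve the sum of the estimates, so they drift toward the same average; on a sparse graph the information typically spreads more slowly, which is one plausible reason the restricted variant can perform worse.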

MargaridaSilva commented 3 years ago

Hi, many thanks for the new files and explanations, I really appreciate them! I will rerun the experiments, but it would make sense for the model to perform better once all agents communicate with each other.

Meanwhile, I am also running SOTO and Independent on Traffic Light Control (SUMO), and for SOTO I got a global waiting time on the order of 10^5 (worse than Independent) instead of ~10^2 as described in the paper.

This is the expression I am using to collect the per-intersection waiting time metric from the SUMO environment at every step: np.array(self.last_measure2).reshape(self.nD, -1).sum(axis=-1)
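
As a toy illustration of what this expression computes, assume `last_measure2` is a flat list with the same number of entries for each intersection, stored contiguously, and `nD` is the number of intersections. Both points are assumptions about the repository's internals, and the values below are made up.

```python
import numpy as np

nD = 3  # assumed: number of intersections (one agent each)

# Assumed layout: two lane-level waiting times per intersection, stored back to back.
last_measure2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Reshape to (nD, entries_per_intersection), then sum each row.
per_intersection = np.array(last_measure2).reshape(nD, -1).sum(axis=-1)
# per_intersection is [3., 7., 11.]
```

If the entries were interleaved by lane rather than grouped by intersection, the same expression would mix values from different intersections, so the layout is worth checking.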

[Image: score]

matthieu637 commented 3 years ago

I don't think the number of seeds is the problem here. The curve we have for this domain is:

We used SUMO v1.16.

For the metric, the data inside learning.data should be used instead of self.last_measure2. As noted by @umersheikh846, the reward has the form "OLD_MEASURE - NEW_MEASURE", whereas self.last_measure2 only represents NEW_MEASURE.

MargaridaSilva commented 3 years ago

Many thanks for the reply. The reason I was using self.last_measure2 as a metric, instead of the reward, is that for a given agent i, summing the rewards over the timesteps gives (0 - m_1) + (m_1 - m_2) + (m_2 - m_3) + ... + (m_{T-1} - m_T) = -m_T \approx 0. Indeed, when I first ran models on SOTO without changing this metric, I got a straight line at roughly 0. I then started saving m_t, so that the global waiting time is the sum of the metric over all agents and all steps in an episode. Does this make sense?
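
The telescoping argument can be checked numerically with a short sketch. The waiting times here are random placeholders, not SUMO output, and the initial old measure is assumed to be 0 as in the sum above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
m = rng.uniform(0.0, 50.0, size=T)       # hypothetical waiting times m_1..m_T

old = np.concatenate(([0.0], m[:-1]))    # previous measure, starting from 0
rewards = old - m                        # r_t = m_{t-1} - m_t

sum_of_rewards = rewards.sum()           # telescopes to -m_T
sum_of_measures = m.sum()                # waiting time accumulated over the episode

assert np.isclose(sum_of_rewards, -m[-1])
```

Whatever happens in the middle of the episode, the summed reward only reflects the last measure, while the sum of the m_t values keeps the information from every step.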

MargaridaSilva commented 3 years ago

Just to clarify, the results you present in the plot are the summation of rewards obtained from all agents throughout all of the timesteps of each episode, correct?

I am using your work for my master's dissertation, so I appreciate any timely help so as to be able to properly include it.

matthieu637 commented 3 years ago

Sorry for the late reply, it was a busy week. I double-checked the script: for the previous plot, we relied on the learning.data file.

Indeed, you are right: we shouldn't rely on this metric, and that should explain why we have so much variance in our plot for this environment. Your curve is actually much more stable. Could you share the same curve with the CV (coefficient of variation) on the y-axis, to verify that SOTO achieves fairer solutions?

Thank you very much for noticing this issue. We are going to upload a corrected version.

MargaridaSilva commented 3 years ago

Hi,

I apologise for the late reply, the delivery of my dissertation is very close.

These are the test results I got for the Independent baseline and SOTO-GGF-FD, averaged over 50 episodes, on total waiting time, CV, and min and max waiting time, recording only the waiting time per timestep. SOTO achieves a fairer solution, while not significantly increasing the total waiting time compared to Independent.
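
For reference, one way such metrics could be computed from recorded per-step waiting times is sketched below. The function names, the array layout (timesteps by agents), the use of the population standard deviation for the CV, and the random data are all assumptions for illustration, not the evaluation code used for the results above.

```python
import numpy as np

def episode_metrics(waiting):
    """waiting: array of shape (timesteps, agents) holding per-step waiting times."""
    per_agent = np.asarray(waiting, dtype=float).sum(axis=0)  # total per agent
    return {
        "total": per_agent.sum(),
        "cv": per_agent.std() / per_agent.mean(),  # coefficient of variation
        "min": per_agent.min(),
        "max": per_agent.max(),
    }

def average_over_episodes(episodes):
    """Average each metric over a list of episodes."""
    rows = [episode_metrics(ep) for ep in episodes]
    return {k: float(np.mean([r[k] for r in rows])) for k in rows[0]}

# Placeholder data: 50 episodes, 200 timesteps, 4 agents.
rng = np.random.default_rng(0)
episodes = [rng.uniform(0.0, 10.0, size=(200, 4)) for _ in range(50)]
summary = average_over_episodes(episodes)
```

A lower CV means the waiting time is spread more evenly across agents, which is the sense in which one solution is "fairer" than another here.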

[Image: 1-lin]

jiawweiL commented 3 years ago

Hello, I have some questions about DFRL. In the SUMO scenario, I use the unmodified code and get a reward of 0. I see that you use a different evaluation method, which may be what I need. Could you tell me in detail where you modified the code? Thank you. I wish you every success in your work.

matthieu637 commented 3 years ago

Thanks to @umersheikh846, this issue should be solved with #4. Thanks again @MargaridaSilva for reporting.

matthieu637 / DFRL

FEN results #2