Cannot reproduce Runner or ScoreL2RPN2022 results when used with the MultifolderWithCache feature

Environment

Grid2op version: 1.9.8
System: tested on Windows 10 and Ubuntu 20.04.6 LTS

Bug description

I want to evaluate the DoNothingAgent on some scenarios using the Runner class or the ScoreL2RPN2022 score. I set the environment and agent seeds to 0, but my results are not reproducible.

Some leads:

The problem is linked to using the MultifolderWithCache feature when I define my environment: when I remove it, the results are reproducible.
If I do not re-create the runner, the results are reproducible within the same script (see the comment in the code below).
I tried deactivating the opponent with kwargs_no_opp = grid2op.Opponent.get_kwargs_no_opponent(). Survival times do improve, but the results are still not reproducible.

How to reproduce

Launch the following code. You might not see a difference between the two results on the first try; if so, launch it again.

import grid2op
from grid2op.Runner import Runner
from grid2op.Agent import DoNothingAgent
from grid2op.Chronics import MultifolderWithCache
from lightsim2grid import LightSimBackend
import numpy as np
import re
import os

env_name = "l2rpn_idf_2023"
env = grid2op.make(env_name, backend=LightSimBackend(),
                   chronics_class=MultifolderWithCache, # remove this line and the results are reproducible
                   )
# Load about a dozen scenarios
env.chronics_handler.real_data.set_filter(lambda x: re.match(r".*2035-06-11.*$", x) is not None) # remove this line and the results are reproducible
env.chronics_handler.real_data.reset() # remove this line and the results are reproducible

n_episode = 10
# We set all seeds to 0 to avoid problems that would be linked to an incorrect order of scenarios
env_seeds = [0 for _ in range(n_episode)]
agent_seeds = [0 for _ in range(n_episode)]
# If the "runner = ..." line is placed here instead of inside the for loop, the results are reproducible, i.e. re-creating the runner changes something

# Evaluate the DoNothingAgent twice
for i in range(2):
    runner = Runner(**env.get_params_for_runner(), agentClass=DoNothingAgent)

    res = runner.run(nb_episode=n_episode, pbar=True,
                    env_seeds=env_seeds,
                    agent_seeds=agent_seeds,
                    episode_id=np.arange(n_episode), 
                    )

    ts_survived = [el[3] for el in res]
    print("mean:", np.mean(ts_survived))

Current output

mean: n1
mean: n2

I obtain n1 ≠ n2 but I should have n1 = n2.
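Comparing only the mean can hide which scenarios diverge, or whether the scenarios were simply played in a different order. The sketch below is a hypothetical helper (diff_runs is not part of Grid2Op) that compares two lists of per-episode result tuples position by position. It assumes index 3 holds the number of time steps survived, as in the script above, and additionally assumes index 1 holds the scenario name. The sample tuples are made-up placeholders, not measured results.

```python
def diff_runs(run_a, run_b, name_idx=1, steps_idx=3):
    """Return one tuple per episode position where the two runs disagree.

    Each returned tuple is (position, name_a, steps_a, name_b, steps_b).
    name_idx is an assumption about the result tuple layout; steps_idx=3
    matches the `el[3]` access used in the reproduction script.
    """
    if len(run_a) != len(run_b):
        raise ValueError("both runs must contain the same number of episodes")
    mismatches = []
    for pos, (ep_a, ep_b) in enumerate(zip(run_a, run_b)):
        name_a, steps_a = ep_a[name_idx], ep_a[steps_idx]
        name_b, steps_b = ep_b[name_idx], ep_b[steps_idx]
        # A different name means a different scenario was played at this
        # position; a different step count means the episode itself diverged.
        if name_a != name_b or steps_a != steps_b:
            mismatches.append((pos, name_a, steps_a, name_b, steps_b))
    return mismatches


# Made-up placeholder results, for illustration only.
first_run = [
    ("path/a", "scenario_a", 10.0, 288, 2016),
    ("path/b", "scenario_b", 20.0, 2016, 2016),
]
second_run = [
    ("path/a", "scenario_a", 10.0, 288, 2016),
    ("path/b", "scenario_b", 12.0, 1500, 2016),
]

for mismatch in diff_runs(first_run, second_run):
    print(mismatch)
```

In the reproduction script, one would keep the res list from each loop iteration and pass both lists to this helper; an empty return value means the two runs agree on every episode.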

rte-france / Grid2Op, issue #616
