instadeepai / jumanji

🕹️ A diverse suite of scalable reinforcement learning environments in JAX
https://instadeepai.github.io/jumanji
Apache License 2.0

[question] Theoretical bound for the Connector env #238

Closed ashok-arora closed 5 months ago

ashok-arora commented 7 months ago

Hey, is there a theoretical guarantee for the Connector env that a solution (even if suboptimal) always exists, i.e. that all agents can be connected without overlapping paths?

clement-bonnet commented 7 months ago

Hi @ashok-arora, For now, Connector comes with two implemented generators: UniformRandomGenerator and RandomWalkGenerator. The former simply samples random pairs of points (starts and targets) and does not guarantee solvability, i.e. that a solution exists. In contrast, the latter guarantees a solution by constructing a random walk, with the downside of being slower and potentially generating easier instances. With the RandomWalkGenerator there is no known optimal return, but you can use the ratio_connections metric, which measures an agent's performance between 0 and 1 (the value reached by an optimal policy). Hope this helps!
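To make the difference between the two strategies concrete, here is a minimal plain-Python sketch. It is not Jumanji's implementation of either generator: the function names (`uniform_random_instance`, `random_walk_instance`, `free_neighbours`) and the `max_steps` parameter are made up for illustration. The point it shows is that endpoints taken from disjoint random walks always come with a known, possibly suboptimal, solution, while uniformly sampled endpoints come with no such certificate.

```python
import random


def uniform_random_instance(grid_size, num_agents, rng):
    """Sample distinct start/target cells uniformly; solvability is NOT checked."""
    cells = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    picked = rng.sample(cells, 2 * num_agents)
    return [(picked[2 * i], picked[2 * i + 1]) for i in range(num_agents)]


def free_neighbours(cell, grid_size, used):
    """Return the in-bounds 4-neighbours of `cell` that no walk occupies yet."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [
        (nr, nc)
        for nr, nc in candidates
        if 0 <= nr < grid_size and 0 <= nc < grid_size and (nr, nc) not in used
    ]


def random_walk_instance(grid_size, num_agents, max_steps, rng):
    """Grow one self-avoiding walk per agent on cells no other walk uses.

    Returns (pairs, paths). The walks are disjoint by construction, so they
    form a valid solution for the returned start/target pairs.
    """
    used = set()
    paths = []
    for _ in range(num_agents):
        # Only start from a cell that can take at least one step,
        # so that start and target differ.
        starts = [
            (r, c)
            for r in range(grid_size)
            for c in range(grid_size)
            if (r, c) not in used and free_neighbours((r, c), grid_size, used)
        ]
        if not starts:
            raise ValueError("grid too crowded for this many agents")
        path = [rng.choice(starts)]
        used.add(path[0])
        for _ in range(max_steps):
            options = free_neighbours(path[-1], grid_size, used)
            if not options:
                break  # walk is boxed in; stop early
            path.append(rng.choice(options))
            used.add(path[-1])
        paths.append(path)
    pairs = [(p[0], p[-1]) for p in paths]
    return pairs, paths


rng = random.Random(0)
pairs, paths = random_walk_instance(grid_size=6, num_agents=3, max_steps=5, rng=rng)
for (start, target), path in zip(pairs, paths):
    print(start, "->", target, "via", len(path), "cells")
```

The trade-off mentioned above is visible here: the walk-based construction does more work per instance, and short walks tend to give endpoints that are close together, which may make instances easier.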

ashok-arora commented 7 months ago

Hi @clement-bonnet, Thank you for the quick reply. I found the generators here but wasn't able to find the ratio_connections metric. Is it in a separate place? Also, I was wondering whether it makes sense to penalise the agent for the episode if solvability is not guaranteed?

clement-bonnet commented 7 months ago

ratio_connections is returned as an extras metric inside the timestep object, which is the output of the reset or step function. The ratio is computed as part of the environment dynamics here.
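As a rough illustration of what the metric means, the sketch below assumes the ratio is simply the fraction of agents that have reached their target. The helper function and the hand-built `extras` dictionary are hypothetical stand-ins written in plain Python, not the environment's code.

```python
def ratio_connections(connected):
    """Fraction of agents that have reached their target, in [0, 1]."""
    flags = [bool(flag) for flag in connected]
    if not flags:
        raise ValueError("need at least one agent")
    return sum(flags) / len(flags)


# Hand-built stand-in for the extras carried by a timestep:
# three of the four agents are connected.
extras = {"ratio_connections": ratio_connections([True, False, True, True])}
print(extras["ratio_connections"])  # 0.75
```

With the real environment, the value would be read from the timestep returned by reset or step, presumably as `timestep.extras["ratio_connections"]` if extras is exposed as a dictionary.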

Regarding the reward, you may implement the reward you wish here. The dense reward formulation that is already implemented gives a small penalty per timestep, encouraging fast wiring. Indeed, when the instance is not solvable, the penalty is given at every timestep until the horizon is reached. Although I don't think this is a problem, it does make more sense to combine this reward with a solvable generator, so I would recommend using RandomWalkGenerator. Hope this answers your question.
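To illustrate the effect described above, here is a minimal single-agent sketch of a dense reward with a per-timestep penalty. It is not the environment's reward function: the names `dense_step_reward` and `episode_return`, the penalty of -0.03, and the bonus of 1.0 are assumptions picked for illustration.

```python
def dense_step_reward(was_connected, is_connected,
                      timestep_penalty=-0.03, connection_bonus=1.0):
    """Reward for one agent on one step (illustrative values).

    An unconnected agent pays the penalty every step; the step on which it
    connects also earns the bonus. A connected agent receives nothing more.
    """
    reward = 0.0
    if not was_connected:
        reward += timestep_penalty
        if is_connected:
            reward += connection_bonus
    return reward


def episode_return(connect_step, horizon):
    """Sum rewards over `horizon` steps.

    `connect_step` is the step index at which the agent connects,
    or None if it never does (e.g. an unsolvable instance).
    """
    total = 0.0
    connected = False
    for t in range(horizon):
        now_connected = connected or (connect_step is not None and t >= connect_step)
        total += dense_step_reward(connected, now_connected)
        connected = now_connected
    return total


print(episode_return(connect_step=4, horizon=50))     # connects on the fifth step
print(episode_return(connect_step=None, horizon=50))  # never connects
```

In the second call the agent accumulates the penalty on all 50 steps, which is the situation an unsolvable instance forces regardless of the policy. That is why pairing this kind of reward with a generator that guarantees solvability gives a cleaner learning signal.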

ashok-arora commented 7 months ago

Thank you so much for the response, Clement. Lastly, was the Connector env introduced in the Jumanji paper, or is there a precursor to it?

sash-a commented 7 months ago

Hey @ashok-arora, I implemented this a while ago, but I'm not aware of any previous environment that it is based on. It was just meant to be a very simple PCB routing env.

clement-bonnet commented 5 months ago

I'm closing this as it seems to be resolved.