lexfridman / deeptraffic

DeepTraffic is a deep reinforcement learning competition, part of the MIT Deep Learning series.
https://selfdrivingcars.mit.edu/deeptraffic
MIT License

Compatible with OpenAI Gym? #4

AI-Guru opened this issue 5 years ago

AI-Guru commented 5 years ago

Namaste!

Great work! I really like it!

Question: Is DeepTraffic compatible with OpenAI Gym? I remember Nvidia writing about it in a 2017 blog article: https://blogs.nvidia.com/blog/2017/07/07/deeptraffic-how-an-mit-simulation-game-uses-deep-learning-to-reduce-gridlock/

Best, Tristan

lexfridman commented 5 years ago

We did implement OpenAI Gym compatibility, so that you could train the agent on your own machine, but we never released it. The challenge is that we want that code to then be submitted and evaluated automatically, so that it can be considered for the leaderboard. There are ways to do this, and it's something I'm hoping to do in 2019. Help would be appreciated, especially ideas on how the full pipeline could be set up. Alternatively, we're considering a totally new Deep RL competition that is designed from the beginning to allow both in-browser and offline training. I see us doing the latter, and continuing to use DeepTraffic as an accessible educational tool.

AI-Guru commented 5 years ago

I see! Thanks for the clarification!

Shmuma commented 5 years ago

@AI-Guru JFYI, I've implemented a more or less accurate Gym version of the DeepTraffic environment. I'm going to open-source it after the competition. It could probably be merged into Gym as well.

AI-Guru commented 5 years ago

Thank you so very much!

jackft commented 5 years ago

@AI-Guru @Shmuma, if either of you has questions about implementation details, I can provide some guidance or answers.

Shmuma commented 5 years ago

@jackft I have a couple of questions about the implementation; it would be nice to get answers without reverse-engineering the JS code :)

First: how is the input to the network formed? I see an input shape of

```javascript
var num_inputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
var network_size = num_inputs * temporal_window + num_actions * temporal_window + num_inputs;
```
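To make the two formulas concrete, the sketch below evaluates them for one illustrative configuration: one side lane on each side, ten patches ahead, none behind, a temporal window of 3 and five actions. These values and the camelCase names are assumptions chosen for the example, not DeepTraffic's defaults.

```javascript
// Evaluates the two size formulas quoted above for a hypothetical configuration.
function inputSizes({ lanesSide, patchesAhead, patchesBehind, temporalWindow, numActions }) {
  // Cells in the observed window: (2 * lanesSide + 1) lanes times (ahead + behind) patches.
  const numInputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
  // Current observation, plus temporalWindow past observations and past one-hot actions.
  const networkSize = numInputs * temporalWindow + numActions * temporalWindow + numInputs;
  return { numInputs, networkSize };
}

const sizes = inputSizes({ lanesSide: 1, patchesAhead: 10, patchesBehind: 0, temporalWindow: 3, numActions: 5 });
console.log(sizes); // { numInputs: 30, networkSize: 135 }
```

So with these settings a single observation is 30 values, and the network sees 30 * 3 + 5 * 3 + 30 = 135 inputs.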

But it is not clear how the observations are flattened into the vector, or in which order the temporal history is appended to the final observation. Ideally, I'd like a description like this:

  • input offsets 0..num_inputs-1: row-wise observations for the current timestep
  • offsets num_inputs..num_inputs*2-1: row-wise observations for the previous timestep
  • etc.

Second question: how is occupancy calculated? I guess every cell is 10 pixels high, but when is a cell considered occupied? When a single pixel of it is covered by a car, or when more than half of the cell is overlapped by a car?

Third question: car dynamics. How is speed calculated? Does a car change its effective speed immediately when the car in front of it changes lanes, or does that happen smoothly? How do acceleration and braking affect speed?

I realize this could turn into a lot of questions, but I'm trying to build an accurate Python version of the environment, which I believe could be useful in future runs of the competition. On my side, I promise to open-source it for everyone's benefit, now or later :).

jackft commented 5 years ago

@Shmuma First, the input is formed by flattening the 2D occupancy grid into a 1D vector: we loop over the grid in row-major order, and only over the portion of the full map that the car can observe. ConvNetJS handles the temporal window; it simply appends to the input (1) a previous state and (2) a one-hot vector representing the action taken in that previous state, once for each step of the window.
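The description above can be sketched as two steps: cut the observable window out of the full grid and flatten it row-major, then append the temporal history. This is a minimal sketch under stated assumptions, not the DeepTraffic or ConvNetJS source. The grid is assumed to be indexed as `grid[row][lane]` with row 0 at the front of the road; the exact window boundaries and the `outOfBoundsValue` padding are hypothetical; and the history order (current state first, then each past state followed by its one-hot action, most recent first) is inferred from the description and the `network_size` formula rather than confirmed.

```javascript
// ASSUMPTIONS: grid[row][lane], row 0 at the front, so "ahead" means smaller row
// indices; the window spans rows egoRow - patchesAhead .. egoRow + patchesBehind - 1;
// cells off the map get a hypothetical outOfBoundsValue.
function flattenObservation(grid, egoRow, egoLane, opts) {
  const { lanesSide, patchesAhead, patchesBehind, outOfBoundsValue = 0 } = opts;
  const out = [];
  // Row-major: outer loop over rows, inner loop over lanes.
  for (let r = egoRow - patchesAhead; r < egoRow + patchesBehind; r++) {
    for (let l = egoLane - lanesSide; l <= egoLane + lanesSide; l++) {
      const inside = r >= 0 && r < grid.length && l >= 0 && l < grid[r].length;
      out.push(inside ? grid[r][l] : outOfBoundsValue);
    }
  }
  return out;
}

// One-hot encoding of an action index.
function oneHot(action, numActions) {
  const v = new Array(numActions).fill(0);
  v[action] = 1;
  return v;
}

// ASSUMED layout: current state, then for each past step (most recent first)
// that step's state followed by the one-hot action taken in it.
function buildNetInput(current, history, numActions) {
  let input = current.slice();
  for (const { state, action } of history) {
    input = input.concat(state, oneHot(action, numActions));
  }
  return input;
}

// Example: 4 rows x 3 lanes, ego car in row 3, lane 1, looking 2 patches ahead.
const grid = [
  [0, 1, 0],
  [0, 0, 1],
  [1, 0, 0],
  [0, 0, 0],
];
const obs = flattenObservation(grid, 3, 1, { lanesSide: 1, patchesAhead: 2, patchesBehind: 0 });
const netInput = buildNetInput(obs, [{ state: obs, action: 2 }], 5);
```

With a temporal window of T, the resulting vector has num_inputs + T * (num_inputs + num_actions) entries, which is exactly the `network_size` formula quoted earlier in the thread.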

Second, occupancy is calculated by looping through all the cars and adding each one to the occupancy grid. We map a car's coordinates to cell coordinates and then take the floor of the cell coordinates; the result is a pair of integer indices.
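A minimal sketch of that occupancy step, under assumptions: cars are plain `{x, y}` pixel positions (x lateral, y longitudinal), `cellWidth` and `cellHeight` are hypothetical parameters (a 10-pixel height is only Shmuma's guess), and an occupied cell is marked with 1. The answer maps each car's coordinates to a single pair of indices, so this sketch marks one cell per car; whether DeepTraffic also marks the other cells a car's length overlaps is not stated in the thread.

```javascript
// HYPOTHETICAL helper: cellWidth/cellHeight and the {x, y} car shape are assumptions.
function buildOccupancyGrid(cars, numRows, numLanes, cellWidth, cellHeight) {
  // grid[row][lane], all cells initially empty (0).
  const grid = Array.from({ length: numRows }, () => new Array(numLanes).fill(0));
  for (const car of cars) {
    // Map pixel coordinates to cell coordinates, then floor them to integer indices.
    const lane = Math.floor(car.x / cellWidth);
    const row = Math.floor(car.y / cellHeight);
    // Cars outside the grid are ignored.
    if (row >= 0 && row < numRows && lane >= 0 && lane < numLanes) {
      grid[row][lane] = 1;
    }
  }
  return grid;
}

// Example: 10 rows x 3 lanes, 20-pixel lanes, 10-pixel rows.
const occ = buildOccupancyGrid([{ x: 25, y: 37 }, { x: 5, y: 99 }, { x: 5, y: 105 }], 10, 3, 20, 10);
```

Here the first car lands in lane 1, row 3, the second in lane 0, row 9, and the third (y = 105) falls outside the 10-row grid.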

Third, there are several parts to this question. Longitudinal velocity is calculated as speed_factor * max_speed. When accelerating or decelerating, we increment or decrement speed_factor, which should always stay within [0, 1]. For lane changes, we maintain a target lane; when the car's target lane changes, the car moves over gradually. A car can only make decisions every N frames, and a lane change takes N frames.
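These dynamics can be sketched as a small per-frame state machine. This is a hedged illustration, not DeepTraffic's code: `SketchCar`, `accelStep`, `decisionInterval`, the initial `speedFactor` of 0.5 and the action strings are all hypothetical, and interactions with other cars (for example, a car ahead changing lanes) are not modeled.

```javascript
// HYPOTHETICAL model of the described dynamics; names and step sizes are assumptions.
class SketchCar {
  constructor({ maxSpeed, accelStep, decisionInterval, lane, numLanes }) {
    this.maxSpeed = maxSpeed;
    this.accelStep = accelStep;       // assumed speed_factor change per decision
    this.N = decisionInterval;        // decisions every N frames; a lane change lasts N frames
    this.numLanes = numLanes;
    this.speedFactor = 0.5;           // assumed starting value
    this.fromLane = lane;
    this.targetLane = lane;
    this.changeFrames = 0;            // frames elapsed in the current lane change
    this.frame = 0;
  }

  // Longitudinal velocity is speed_factor * max_speed.
  get speed() {
    return this.speedFactor * this.maxSpeed;
  }

  // Lateral position in lane units, interpolated while changing lanes.
  get lanePos() {
    return this.fromLane + ((this.targetLane - this.fromLane) * this.changeFrames) / this.N;
  }

  // Advance one frame; the action only takes effect on decision frames.
  step(action) {
    if (this.frame % this.N === 0) {
      if (action === "accelerate") this.speedFactor += this.accelStep;
      if (action === "decelerate") this.speedFactor -= this.accelStep;
      this.speedFactor = Math.min(1, Math.max(0, this.speedFactor)); // keep within [0, 1]

      // A new target lane is only accepted when no lane change is in progress.
      const idle = this.targetLane === this.fromLane;
      if (idle && action === "left" && this.targetLane > 0) this.targetLane -= 1;
      if (idle && action === "right" && this.targetLane < this.numLanes - 1) this.targetLane += 1;
    }
    if (this.targetLane !== this.fromLane) {
      this.changeFrames += 1;
      if (this.changeFrames === this.N) {
        // The lane change completes after exactly N frames.
        this.fromLane = this.targetLane;
        this.changeFrames = 0;
      }
    }
    this.frame += 1;
  }
}
```

For example, with `decisionInterval: 5`, calling `step("right")` once and then `step("none")` four times moves the car from lane 1 to lane 2, and any action passed on frames 1 through 4 is ignored. Using an integer frame counter for the lane change avoids accumulating floating-point error in the lateral position.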

AI-Guru commented 5 years ago

@Shmuma, any news? :D

Shmuma commented 5 years ago

@AI-Guru Yep, sorry for delay :)

Here is my repo with a Gym-compatible DeepTraffic environment: https://github.com/Shmuma/deep-traffic-2019

It includes the environment with tests, training and playing code, and a not-yet-finished PyTorch-to-JS converter for the trained model.

This project is (sadly) not finished; I got distracted by other things. The next steps would be finishing the JS export utility and making sure the Gym environment's dynamics closely match the native code.