ULTRA: milestones/tasks

Research: [Alex, Jenish, Christian, Kimia]

try simpler action spaces:
- Lane = 1
- LaneWithContinuousSpeed = 3
- run the examples to get familiar with the agent_spec definitions in their simple form from SMARTS
- learn how to change the DQN policy to work with these two controllers
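A minimal sketch of one way a single discrete DQN head could drive both controllers, assuming the Lane controller takes one of the strings "keep_lane", "slow_down", "change_lane_left", "change_lane_right" and LaneWithContinuousSpeed takes a (target_speed, lane_change) tuple with lane_change in {-1, 0, 1}, where +1 is assumed to mean a change to the left. The action table, its speeds and the helper name are hypothetical, and the enum is a local stand-in for SMARTS' own so the sketch runs without SMARTS installed.

```python
from enum import IntEnum
from typing import Tuple, Union


# Local stand-in for the two action-space IDs listed above (Lane = 1,
# LaneWithContinuousSpeed = 3); not imported from SMARTS.
class ActionSpace(IntEnum):
    Lane = 1
    LaneWithContinuousSpeed = 3


# Hypothetical discrete action table shared by both controllers.
# Each entry: (lane action string, target speed in m/s, lane change in {-1, 0, 1}).
DISCRETE_ACTIONS = [
    ("keep_lane", 10.0, 0),
    ("slow_down", 5.0, 0),
    ("change_lane_left", 10.0, 1),    # assumption: +1 = change to the left
    ("change_lane_right", 10.0, -1),  # assumption: -1 = change to the right
]


def to_controller_action(index: int, space: ActionSpace) -> Union[str, Tuple[float, int]]:
    """Translate a DQN output index into the action format of the chosen controller."""
    lane_action, target_speed, lane_change = DISCRETE_ACTIONS[index]
    if space is ActionSpace.Lane:
        return lane_action
    if space is ActionSpace.LaneWithContinuousSpeed:
        return (target_speed, lane_change)
    raise ValueError(f"unsupported action space: {space}")


print(to_controller_action(2, ActionSpace.Lane))
print(to_controller_action(2, ActionSpace.LaneWithContinuousSpeed))
```

Keeping one table means the DQN's output size stays fixed (four actions here) when switching controllers; only the translation step changes.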
run DQN with a discrete action space [helps debug the reward/observation adapters]
run continuous baselines (PPO/SAC) [more baselines / add new baselines]
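A rough sketch of why discrete DQN is a convenient debugging tool: its one-step target is r + gamma * max_a' Q(s', a'), so the reward coming out of the adapter enters the target directly and a badly scaled reward shows up right away in the targets. The function names and the default gamma are illustrative assumptions, not ULTRA's DQN code.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise the argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float], done: bool, gamma: float = 0.99) -> float:
    """One-step DQN target: r + gamma * max_a' Q(s', a'), with no bootstrap at episode end."""
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return reward + bootstrap


rng = random.Random(0)
print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng))  # greedy pick
print(td_target(1.0, [0.5, 2.0], done=False, gamma=0.5))
```

Logging these targets per step is one cheap check: if they grow by orders of magnitude, the reward adapter is the first place to look.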
- get familiar with the policy
- try different policy parameters
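For trying different policy parameters, a small grid like the one below could enumerate runs for PPO and SAC. The parameter names and values are hypothetical placeholders, not ULTRA's actual config keys; swap in whatever the baseline configs expose.

```python
import itertools
from typing import Dict, Iterator, List

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE: Dict[str, Dict[str, List[float]]] = {
    "ppo": {"lr": [3e-4, 1e-4], "clip_ratio": [0.1, 0.2]},
    "sac": {"lr": [3e-4], "tau": [0.005, 0.01], "alpha": [0.2]},
}


def sweep(policy: str) -> Iterator[Dict[str, float]]:
    """Yield every combination of the listed parameter values for one policy."""
    space = SEARCH_SPACE[policy]
    names = sorted(space)  # fixed order so runs are reproducible
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


for config in sweep("ppo"):
    print(config)
```

A full grid grows multiplicatively, so it is probably worth keeping each list to two or three values at first.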
reward trimming:
- try different reward scalings in the adapters
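A minimal sketch of a configurable reward adapter for the scaling experiments: multiply the raw reward by a factor, then optionally clip it. The (observation, reward) call signature is an assumption modelled on the SMARTS agent_spec reward adapter; the observation is ignored here, and the factory name is hypothetical.

```python
from typing import Any, Callable, Optional


def make_reward_adapter(scale: float = 1.0, clip: Optional[float] = None) -> Callable[[Any, float], float]:
    """Build an adapter that scales the raw reward, then optionally clips it to [-clip, clip]."""
    def adapter(observation: Any, raw_reward: float) -> float:
        reward = raw_reward * scale  # observation is unused in this sketch
        if clip is not None:
            reward = max(-clip, min(clip, reward))
        return reward
    return adapter


scaled = make_reward_adapter(scale=0.1)
clipped = make_reward_adapter(scale=1.0, clip=1.0)
print(scaled(None, 20.0), clipped(None, 5.0), clipped(None, -3.0))
```

Building each variant from the same factory keeps the comparison to a single changed number per run.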
observation trimming:
- start understanding the social_vehicle encoders [observation/social vehicle representation]
  - try PRECOG and PointNet, and compare them to no encoder (only speed/pos/steering)
  - debug them (integrate with simple policies)
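To make the comparison concrete, a rough sketch of the two ends: a no-encoder baseline that keeps only ego speed, position and steering, and a PointNet-style encoder that applies one shared linear+ReLU map to every social vehicle and max-pools the results, so the output does not depend on vehicle order. The per-vehicle features, weights and function names are hypothetical; this is not ULTRA's encoder code, and PRECOG is not sketched here.

```python
from typing import List, Sequence

Vehicle = Sequence[float]  # hypothetical per-vehicle features, e.g. (rel_x, rel_y, speed)


def no_encoder_features(ego_speed: float, ego_pos: Sequence[float], ego_steering: float) -> List[float]:
    """Baseline without a social-vehicle encoder: ego speed, position and steering only."""
    return [ego_speed, *ego_pos, ego_steering]


def pointnet_style_encode(vehicles: Sequence[Vehicle], weights: Sequence[Sequence[float]]) -> List[float]:
    """Shared linear+ReLU map per vehicle, then an element-wise max over vehicles."""
    if not vehicles:
        return [0.0] * len(weights)  # ReLU outputs are >= 0, so zeros act as "no vehicles"
    per_vehicle = [
        [max(0.0, sum(w * x for w, x in zip(row, v))) for row in weights]
        for v in vehicles
    ]
    return [max(column) for column in zip(*per_vehicle)]


weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
vehicles = [(1.0, 2.0, 3.0), (4.0, -1.0, 0.5)]
print(no_encoder_features(5.0, (1.0, 2.0), 0.1))
print(pointnet_style_encode(vehicles, weights))
```

Feeding the same vehicles in reversed order gives the same encoding, which is a quick property to check while debugging the real encoders.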

Engineering: [Kimia, Christian, Jenish] urgent:
