lgooo / SUMO-RL-Coverage

We provide an open-source software package for AV-based simulation and testing that runs in a Docker container

Create offline data set #27

Open kihyukh opened 2 years ago

kihyukh commented 2 years ago

Follow the offline data generation method in Precup's paper to create an offline dataset for the SUMO environment.

See page 8 of the paper for the description:

Since there is no standard dataset for offline constrained RL, we collected data using online constrained RL agents (C-DMPO; the constrained variant of Distributional MPO) (Abdolmaleki et al., 2018; Mankowitz et al., 2021). We trained the online C-DMPO with various cost thresholds c and saved checkpoints at regular intervals, which constitutes the pool of policy checkpoints. Then, we generated datasets, where each of them consists of β × 100% constraint-satisfying trajectories and the rest constraint-violating trajectories. The trajectories were sampled by policies in the pool of policy checkpoints. Since there is a trade-off between reward and cost in general, the dataset is a mixture of low-reward-low-cost trajectories and high-reward-high-cost trajectories.

Their method:

Follow their method, but use our DDQN implementation instead of C-DMPO. To vary how cautious the agent is, train it with different penalty factors.
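One way to read "different penalty factors" is as a sweep over a weight that subtracts cost from reward before the DDQN update. The sketch below assumes exactly that shaping; `penalized_reward` and the values in `PENALTY_FACTORS` are hypothetical and are not taken from the repository's DDQN code.

```python
def penalized_reward(reward, cost, penalty_factor):
    """Assumed shaping: a larger penalty_factor makes the agent more cautious."""
    return reward - penalty_factor * cost


# Hypothetical sweep; each factor would yield one family of policy checkpoints.
PENALTY_FACTORS = [0.0, 0.1, 1.0, 10.0]

for penalty_factor in PENALTY_FACTORS:
    shaped = penalized_reward(reward=1.0, cost=0.5, penalty_factor=penalty_factor)
    print(f"penalty_factor={penalty_factor}: shaped reward={shaped}")
```

Checkpoints saved at regular intervals for each factor would then play the role of the paper's pool of policy checkpoints.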

Dump the final data as a tab-separated file.
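The β mixing step from the paper could be sketched roughly as follows. This assumes each trajectory is a dict carrying its total `cost`, and that "constraint-satisfying" means the total cost is at or below a threshold; `split_by_cost`, `mix_trajectories`, and the dict layout are hypothetical names for illustration only.

```python
import random


def split_by_cost(trajectories, cost_threshold):
    """Partition trajectories into constraint-satisfying and constraint-violating."""
    safe = [t for t in trajectories if t["cost"] <= cost_threshold]
    unsafe = [t for t in trajectories if t["cost"] > cost_threshold]
    return safe, unsafe


def mix_trajectories(safe, unsafe, beta, n_total, seed=0):
    """Sample n_total trajectories, a beta fraction of them constraint-satisfying."""
    rng = random.Random(seed)
    n_safe = round(beta * n_total)
    n_unsafe = n_total - n_safe
    if n_safe > len(safe) or n_unsafe > len(unsafe):
        raise ValueError("not enough trajectories in one of the pools")
    mixed = rng.sample(safe, n_safe) + rng.sample(unsafe, n_unsafe)
    rng.shuffle(mixed)  # avoid ordering the file by constraint status
    return mixed


# Toy pool: 20 trajectories with total costs 0..19 (made-up numbers).
pool = [{"id": i, "cost": float(i)} for i in range(20)]
safe, unsafe = split_by_cost(pool, cost_threshold=9.0)
dataset = mix_trajectories(safe, unsafe, beta=0.8, n_total=10)
```

Generating one dataset per value of β gives the range from mostly constraint-violating to mostly constraint-satisfying data described in the quote.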

kihyukh commented 2 years ago

Thanks for the code and sample dataset. I have one last suggestion:

Before:

import pandas as pd
one_list = []
two_list = []
for one, two in data_source:
    one_list.append(one)
    two_list.append(two)
df = pd.DataFrame({'one': one_list, 'two': two_list})  # bad: unnecessarily loads pandas and is memory inefficient.
df.to_csv(file)  # bad: formatting is ugly and not standard

After:

import json

print('\t'.join(['one', 'two']), file=file)  # optional header row.
for one, two in data_source:
    # you may have to look up ways to json serialize numpy arrays
    print('\t'.join([json.dumps(one), json.dumps(two)]), file=file)
# good: memory efficient. very simple.
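For the NumPy serialization mentioned in the comment, one common approach is to convert arrays with `.tolist()` and NumPy scalars with `.item()` before calling `json.dumps`. The helpers `to_jsonable` and `write_tsv` below are hypothetical names, and the sample row is made up.

```python
import io
import json

import numpy as np


def to_jsonable(value):
    """Convert NumPy arrays and scalars to plain Python objects."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        return value.item()
    return value


def write_tsv(rows, header, file):
    """Write a header row, then one JSON-encoded, tab-separated line per row."""
    print('\t'.join(header), file=file)
    for row in rows:
        print('\t'.join(json.dumps(to_jsonable(v)) for v in row), file=file)


# Usage with an in-memory buffer; a real run would pass an open file instead.
buf = io.StringIO()
write_tsv([(np.array([1.0, 2.0]), np.int64(3))], ['one', 'two'], buf)
```

Because `json.dumps` escapes tabs and newlines inside strings, each field stays on one line and the tab separator remains unambiguous.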

lgooo commented 2 years ago

Done.