thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Improve discrete control offline RL benchmark #612

Open · nuance1979 opened 2 years ago

nuance1979 commented 2 years ago

I wonder if anyone is actively working on improving the benchmark for discrete-control offline RL policies. As I noted in examples/offline/README.md, we should benchmark our policies on a publicly available dataset.

Currently the best discrete-control offline datasets seem to be the Atari portion of RL Unplugged. I tried to convert them into a Tianshou Batch but couldn't figure out how to get the done flag.

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L26-L37

If the data points are stored in order, I might be able to find the points where the episode id changes between consecutive transitions and mark those as episode ends. But I don't know whether that ordering assumption holds.
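
For illustration, a minimal sketch of that idea, assuming the transitions within a shard really are stored in episode order (episode_ids here is a hypothetical array read from the records in file order):

```python
import numpy as np

def derive_done_flags(episode_ids: np.ndarray) -> np.ndarray:
    """Mark a transition as done when the next transition belongs to a
    different episode; treat the final transition of the shard as an end."""
    done = np.zeros(len(episode_ids), dtype=bool)
    done[:-1] = episode_ids[:-1] != episode_ids[1:]
    done[-1] = True  # assumption: the shard boundary also ends an episode
    return done

# Example: three episodes of lengths 2, 3, 1.
print(derive_done_flags(np.array([7, 7, 8, 8, 8, 9])))
# [False  True False False  True  True]
```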

@Trinkle23897 What do you think? Have you worked with Reverb data before?

Trinkle23897 commented 2 years ago

> but couldn't figure out how to get the done flag.

They always treat discount as another encoding of done: discount == 0.0 marks the terminal transition of an episode. Ref: https://github.com/sail-sg/envpool/blob/5b08389ec0fad903a9fb3288d54f470bc790bdfc/envpool/python/dm_envpool.py#L63

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L227
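
Under that convention, recovering done during conversion is a one-liner. A sketch, assuming the usual dm_env convention that discount is 0.0 exactly on terminal transitions (note this cannot distinguish true termination from timeout truncation):

```python
import numpy as np

# Assumed dm_env convention: discount == 0.0 only on the terminal
# transition of an episode, 1.0 everywhere else.
def done_from_discount(discount: np.ndarray) -> np.ndarray:
    return discount == 0.0

print(done_from_discount(np.array([1.0, 1.0, 0.0, 1.0, 0.0])))
# [False False  True False  True]
```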

nuance1979 commented 2 years ago

I managed to convert one shard of the Pong dataset (Pong/run_1-00000-of-00100) into a tianshou.data.ReplayBuffer and saved it to disk as hdf5. However, the size of the hdf5 file is 53GB! For reference, the original file is 720MB; as I understand it, it is a gzipped TFRecord file of protocol buffers. The shard contains 498549 transitions. For RL Unplugged Atari data, observations are already frame-stacked at 4, i.e., the obs space shape is (84, 84, 4). Note that I'm talking about one file here; there are 5*100=500 similarly-sized files for each Atari game.

I wanted to do this conversion because I'd rather not keep TensorFlow installed. However, without compression, the file size becomes an issue, and the conversion itself is quite slow. It would be great if we could find some cloud storage space for the converted files.

What do you think? @Trinkle23897

Trinkle23897 commented 2 years ago

Maybe we can add another way to save/restore a ReplayBuffer. I remember numpy's own compression being much more efficient than pickle/hdf5 (according to my experiments at the time).
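
For concreteness, I mean something like np.savez_compressed; a minimal sketch (the field layout here is hypothetical, not an existing tianshou API):

```python
import numpy as np

def save_buffer_npz(path, obs, act, rew, done):
    # np.savez_compressed applies DEFLATE to each named array in the archive.
    np.savez_compressed(path, obs=obs, act=act, rew=rew, done=done)

def load_buffer_npz(path):
    with np.load(path) as data:
        return {key: data[key] for key in data.files}
```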

nuance1979 commented 2 years ago

> Maybe we can add another way to save/restore a ReplayBuffer. I remember numpy's own compression being much more efficient than pickle/hdf5 (according to my experiments at the time).

I tried hdf5 compression first and it worked pretty well: 53GB -> 283MB.
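
For reference, it boils down to passing a compression filter when creating each dataset with h5py. A sketch (the field names are illustrative, and gzip level 9 trades speed for size):

```python
import h5py

def save_hdf5_compressed(path, obs, act, rew, done):
    # gzip is the portable filter shipped with every h5py build.
    with h5py.File(path, "w") as f:
        for name, arr in (("obs", obs), ("act", act), ("rew", rew), ("done", done)):
            f.create_dataset(name, data=arr, compression="gzip", compression_opts=9)
```

Atari frames are uint8 and highly repetitive (frame stacking repeats three of every four planes between consecutive transitions), which is presumably why the ratio is so dramatic.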

nuance1979 commented 2 years ago

I have a script to convert a shard of RL Unplugged dataset into a tianshou.data.ReplayBuffer. Each shard contains about 500k transitions. Now I want to run an experiment with 1M transitions. What is the best way to "merge" two ReplayBuffers? @Trinkle23897

I tried to use a ReplayBufferManager([buf1, buf2]) but encountered a strange error: after https://github.com/thu-ml/tianshou/blob/7f23748347d6bf4aebce3931f7e57291012cd98d/tianshou/data/buffer/manager.py#L26, the script ended with an AttributeError saying that a numpy.ndarray object has no attribute options. I printed out the types before and after that line and indeed, an array of ReplayBuffer had been transformed into an array of numpy.ndarray. The transformation also took far too long.

Trinkle23897 commented 2 years ago

Not sure what's happening; could you please send me the code?

nuance1979 commented 2 years ago

> Not sure what's happening; could you please send me the code?

Sure. See attachment.

I added a break here to generate two small buffers with 1000 transitions (otherwise it's too slow): https://github.com/thu-ml/tianshou/blob/41afc2584a4881a0f47925f7373e2fab4ea7bf6f/examples/offline/convert_rl_unplugged_atari.py#L206

Command line:

python3 ./atari_bcq.py --task BreakoutNoFrameskip-v4 --load-buffer-name ~/.rl_unplugged/buffers/Breakout/run_1-00001-of-00100.hdf5 --buffer-from-rl-unplugged --more-buffer-names ~/.rl_unplugged/buffers/Breakout/run_1-00002-of-00100.hdf5 --epoch 2 --device 'cuda:1' &> log.bcq.breakout.epoch_2.rl_unplugged.shard_1+2&

The error message:

Observations shape: (4, 84, 84)
Actions shape: 4
Traceback (most recent call last):
  File "./atari_bcq.py", line 211, in <module>
    test_discrete_bcq(get_args())
  File "./atari_bcq.py", line 143, in test_discrete_bcq
    buffer = ReplayBufferManager(bufs)
  File "/home/yi.su/git/tianshou/tianshou/data/buffer/manager.py", line 29, in __init__
    kwargs = self.buffers[0].options
AttributeError: 'numpy.ndarray' object has no attribute 'options'

atari_bcq.py.zip

Trinkle23897 commented 2 years ago

Is it possible to use an empty dataset to reproduce this result? (Unrelated to rl-unplugged; it takes me quite a long time to download one file...)

nuance1979 commented 2 years ago

> Is it possible to use an empty dataset to reproduce this result? (Unrelated to rl-unplugged; it takes me quite a long time to download one file...)

I made minimal datasets with 5 transitions and a max size of 5, and reproduced the error. See attachment.

Breakout.zip
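
For anyone without the attachment, an equivalent self-contained repro needs no dataset at all (a sketch assuming the 0.4.x-era API, where a transition carries a single done flag):

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer, ReplayBufferManager

def tiny_buffer(n: int = 5) -> ReplayBuffer:
    buf = ReplayBuffer(size=n)
    for i in range(n):
        buf.add(Batch(
            obs=np.zeros((84, 84, 4), dtype=np.uint8),
            act=0,
            rew=0.0,
            done=(i == n - 1),
            obs_next=np.zeros((84, 84, 4), dtype=np.uint8),
            info={},
        ))
    return buf

# Passing already-filled buffers triggers the AttributeError above.
ReplayBufferManager([tiny_buffer(), tiny_buffer()])
```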

Trinkle23897 commented 2 years ago

I think the reason is that, when we developed the ReplayBufferManager (RBM), we assumed the buffers in the input buffer list were all uninitialized:

[ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer(), ReplayBuffer()]

But if you pass initialized buffers, numpy.array will automatically split each of them element-wise (and that's the root cause of the slow speed):

[[Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)]
 [Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)
  Batch(...)]]
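
That splitting appears to be plain numpy behavior: an object exposing __len__ and __getitem__ (which a non-empty ReplayBuffer does) is treated as a nested sequence by numpy.array. A toy demo of the same effect, with a stand-in class:

```python
import numpy as np

class SeqLike:
    """Stand-in for a non-empty ReplayBuffer: a sequence-like object."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, i):
        return self.items[i]

arr = np.array([SeqLike([1, 2, 3]), SeqLike([4, 5, 6])])
print(arr.shape)     # (2, 3) -- each SeqLike got unpacked element-wise
print(type(arr[0]))  # <class 'numpy.ndarray'>, no longer SeqLike
```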

The normal (and probably more efficient) way:

vecbuf = VectorReplayBuffer(total_size, len(buffer_names))  # total_size: sum of all shard sizes
# maybe we should manually trigger vecbuf._set_batch() first to allocate memory?
for i, name in enumerate(buffer_names):
  tmp_buf = ReplayBuffer.load_hdf5(name)
  vecbuf.buffers[i].update(tmp_buf)
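
(With this pattern each shard is copied exactly once, via ReplayBuffer.update, into a subbuffer that VectorReplayBuffer has already allocated, and ReplayBufferManager never receives pre-filled buffers, so the numpy.array splitting above never happens.)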