lucidrains / q-transformer

Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, out of Google Deepmind
MIT License

Question about num_timestep #12

Open carolineys opened 3 months ago

carolineys commented 3 months ago

In the code, there is a "num_timesteps" argument in the constructor of the ReplayMemoryDataset class. Does this "num_timesteps" correspond to the concept of a "window" in the paper? In my understanding, the Q-function proposed in the paper only takes the states within the window range (s_{t-w} to s_t) and feeds them into the transformer, instead of using all of the previous states. Is this interpretation correct? I am confused because in the code the default "num_timesteps" is 1, which, in my understanding, won't encode any sequential info.
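For illustration, here is a small sketch of the windowing idea being asked about. The function name and structure are hypothetical, not the repo's actual ReplayMemoryDataset internals; it only shows how num_timesteps = 1 yields single-state samples while larger values yield sliding windows:

```python
# hypothetical sketch of how "num_timesteps" windows a trajectory;
# illustrative only, not the repo's actual ReplayMemoryDataset logic
def window_indices(episode_len, num_timesteps):
    # each sample covers the states s_{t-w+1} .. s_t for window size w
    return [list(range(t - num_timesteps + 1, t + 1))
            for t in range(num_timesteps - 1, episode_len)]

# num_timesteps = 1 gives single-state samples (no sequential context)
print(window_indices(5, 1))  # [[0], [1], [2], [3], [4]]
# num_timesteps = 3 gives sliding windows of 3 consecutive states
print(window_indices(5, 3))  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```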

lucidrains commented 3 months ago

@YifeiChen777 hi Caroline, yes you need to set that to greater than 1 for the autoregressive Q-learning

yuriy10139 commented 1 month ago

In the autoregressive_q_learn() method of QLearner, the timestep dimension is folded into the batch dimension. E.g. if we have a batch of 16 samples with 3 timesteps each, this is converted into a batch of 48.

    # fold time steps into batch
    states, time_ps = pack_one(states, "* c f h w")
    actions, _ = pack_one(actions, "* n")

It looks like the subsequent learning does not take history into account and treats each element of the batch independently, because the attention mechanism does not span across the batch dimension. Thus, if every item in a batch attends only to itself (and cross-attends to encoded_state, but still within a single item of the batch), the model will not see or learn any inter-timestep dependencies. Please correct me if I am missing something.
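The folding described above can be illustrated with a small numpy sketch (pack_one in the repo wraps einops.pack; a plain reshape is shown here for clarity, with toy dimensions matching the example):

```python
import numpy as np

# toy shapes: batch of 16 samples, 3 timesteps, and c/f/h/w state dims
b, t, c, f, h, w = 16, 3, 3, 1, 8, 8
states = np.zeros((b, t, c, f, h, w))

# pack_one(states, "* c f h w") folds the leading (b, t) dims into one,
# so each timestep becomes an independent batch element
folded = states.reshape(-1, c, f, h, w)
print(folded.shape)  # (48, 3, 1, 8, 8)
```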

Johnly1986 commented 3 weeks ago

When I change num_timestep = 50, a tensor size mismatch occurs: RuntimeError: The size of tensor a (200) must match the size of tensor b (10000) at non-singleton dimension 0

lucidrains commented 3 weeks ago

@Johnly1986 could you try again on the latest version?

Johnly1986 commented 3 weeks ago

I used the latest version, and the tensor size mismatch caused by num_timestep no longer exists, but it automatically exits after running for a while.

I found out that the sudden exit was because of a memory explosion; it used 30 GB of memory. Now I want to know where the memory is being consumed, and I hope it can run on the GPU.

lucidrains commented 3 weeks ago

> In the autoregressive_q_learn() method of QLearner, the timestep dimension is folded into the batch dimension. E.g. if we have a batch of 16 samples with 3 timesteps each, this is converted into a batch of 48.
>
>     # fold time steps into batch
>     states, time_ps = pack_one(states, "* c f h w")
>     actions, _ = pack_one(actions, "* n")
>
> It looks like the subsequent learning does not take history into account and treats each element of the batch independently, because the attention mechanism does not span across the batch dimension. Thus, if every item in a batch attends only to itself (and cross-attends to encoded_state, but still within a single item of the batch), the model will not see or learn any inter-timestep dependencies. Please correct me if I am missing something.

hi, yes this is correct afaict.

this is why in the todo section in the readme i have written "improvise cross attention to past actions and states of timestep, transformer-xl fashion (w/ structured memory dropout)"

are you able to get things working with just single frames? i'm happy to invest some time building out the transformer-xl component for you if you have everything set up and are willing to share your experimental results
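For readers unfamiliar with the proposal, here is a rough numpy sketch of the transformer-xl idea: cache keys/values from past segments and let the current segment's queries attend over both the cache and the current keys. This is an illustration of the general technique, not the repo's planned implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# transformer-xl style attention sketch: queries from the current segment
# attend over cached past keys/values concatenated with the current ones
def attend_with_memory(q, k, v, mem_k=None, mem_v=None):
    if mem_k is not None:
        k = np.concatenate([mem_k, k], axis=0)  # prepend cached timesteps
        v = np.concatenate([mem_v, v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

d = 4
q = np.random.randn(3, d)      # current segment, 3 timesteps
k = np.random.randn(3, d)
v = np.random.randn(3, d)
mem_k = np.random.randn(5, d)  # cached from previous segments
mem_v = np.random.randn(5, d)
out = attend_with_memory(q, k, v, mem_k, mem_v)
print(out.shape)  # (3, 4)
```

Without the memory arguments, this reduces to ordinary attention within the current segment, which is the per-item behavior described above.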

lucidrains commented 3 weeks ago

@yuriy10139 also, you should reach out to @2M-kotb, as i think he was playing around with the repo for his research some time back

yuriy10139 commented 3 weeks ago

> > In the autoregressive_q_learn() method of QLearner, the timestep dimension is folded into the batch dimension. E.g. if we have a batch of 16 samples with 3 timesteps each, this is converted into a batch of 48.
> >
> >     # fold time steps into batch
> >     states, time_ps = pack_one(states, "* c f h w")
> >     actions, _ = pack_one(actions, "* n")
> >
> > It looks like the subsequent learning does not take history into account and treats each element of the batch independently, because the attention mechanism does not span across the batch dimension. Thus, if every item in a batch attends only to itself (and cross-attends to encoded_state, but still within a single item of the batch), the model will not see or learn any inter-timestep dependencies. Please correct me if I am missing something.
>
> hi, yes this is correct afaict.
>
> this is why in the todo section in the readme i have written "improvise cross attention to past actions and states of timestep, transformer-xl fashion (w/ structured memory dropout)"
>
> are you able to get things working with just single frames? i'm happy to invest some time building out the transformer-xl component for you if you have everything set up and are willing to share your experimental results

I've prepared a small demo in a separate repository: https://github.com/yuriy10139/q-transfromer-maniskill-demo. Currently I am a bit short of compute, so I have tried it just once on my laptop, without any hyperparameter search and on 112x112 images from a single camera (the Maniskill env default is 128x128, so I guess that is not that bad).

On 50 eval runs the model shows average rewards of 4.054, 4.647 and 4.568 for the 4000-step, 7000-step and 10000-step checkpoints respectively, so it is probably learning something, but it is still quite far from well-trained.

If you manage to add history-based learning, I hope to find more GPU time to test it with a bigger resolution and more timesteps.

lucidrains commented 3 weeks ago

@yuriy10139 thank you! i'll add a few things soon