takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License
1.29k stars 230 forks

[Question] Question about log parameters #267

Open Natmat626 opened 1 year ago

Natmat626 commented 1 year ago

I really enjoyed working with this repo. Thanks for making d3rlpy, which helped me a lot in getting started with offline RL. I am new to offline RL. The log records many metrics, such as "actor_loss" and "alpha_loss", that differ from many of the metrics I know from online RL. Do you plan to add descriptions of these metrics to the documentation? If not, where can I find a detailed description that a novice like me can understand?

takuseno commented 1 year ago

@Natmat626 Thanks for the issue. Actually, there is no difference between online and offline RL in terms of the types of logs. Currently, beginner-friendly descriptions of the logs are not documented anywhere. Sorry for the inconvenience.

Natmat626 commented 1 year ago

@takuseno Thanks for the answer. I'm sad that there is no documentation aimed at newbies, but I have a simple question that I hope you can answer. In online RL, the loss curve is generally not treated as a measure of performance. People look at "Episode_mean_reward" and "Episode_mean_step" instead, because these evaluate the current model more reliably. Now suppose I collect a set of expert data from a complex game environment. That means the dataset contains only positive rewards, and because the game environment is so complex, the "scorers" evaluation in the "fit" function does not work for such cases. So when training on pure expert data, how should the loss curve be interpreted? My understanding is that it differs from the loss in online RL and is closer to the loss in deep learning, where a smaller value means a better fit to the data. I suspect this idea is wrong, because I am just a game developer and my understanding of machine learning is limited. I hope you can help me answer this question. Thanks very much!
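
For readers with the same question, one way to judge a value function without running the environment is to measure its temporal-difference (TD) error on transitions held out from training. This is similar in spirit to the TD-error scorer that d3rlpy provides. The sketch below is plain Python and does not use the d3rlpy API: the `Transition` layout, the `mean_squared_td_error` helper, and the toy Q-table are all hypothetical names made up for illustration. Note that a small TD error only shows that the value estimates are self-consistent on the logged data; it does not by itself show that the resulting policy plays well.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, terminal)
Transition = Tuple[int, int, float, int, bool]


def mean_squared_td_error(
    q: Callable[[int, int], float],
    transitions: Sequence[Transition],
    n_actions: int,
    gamma: float = 0.99,
) -> float:
    """Average squared one-step TD error of `q` over logged transitions."""
    total = 0.0
    for state, action, reward, next_state, terminal in transitions:
        # Bootstrap from the greedy next action unless the episode ended.
        if terminal:
            next_value = 0.0
        else:
            next_value = max(q(next_state, a) for a in range(n_actions))
        target = reward + gamma * next_value
        total += (q(state, action) - target) ** 2
    return total / len(transitions)


# Toy held-out data (assumed for illustration): two states, two actions.
held_out = [(0, 1, 1.0, 1, False), (1, 0, 1.0, 1, True)]
q_table = {(0, 0): 0.0, (0, 1): 1.9, (1, 0): 1.0, (1, 1): 0.5}

error = mean_squared_td_error(
    lambda s, a: q_table[(s, a)], held_out, n_actions=2, gamma=0.9
)
print(error)
```

Tracking a number like this on held-out episodes, next to the training loss, helps separate "the loss went down because the model memorized the training set" from "the value estimates are consistent on unseen data".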