tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

What does the return function do in the _step function? #45

Closed: Bhaney44 closed this issue 5 years ago

Bhaney44 commented 5 years ago

I am having trouble understanding this piece of code:

def _step(self, action):
    """Apply the action and return the next time_step (reward, observation)."""
    if is_final(self.state):
        return self.reset()
    observation, reward = self._apply_action(action)
    return TimeStep(observation, reward)

I know that _step applies the action and returns the next time_step, which contains a reward and an observation. (self, action) are its parameters, to which arguments may be passed. Here, the agent is traveling to the next state in the environment with its previous action and self. If the current state is the final state in the environment, the environment's reset() function is called. Otherwise, the observation, reward pair is set equal to self._apply_action(action). However, I do not understand what self._apply_action(action) means or does. Additionally, what does the return function do?

oars commented 5 years ago

I'm not sure where you are seeing that code. It seems like it's picked from different parts of the code base?

Assuming this is an environment step (on some custom env or wrapper?), the return statement is generating a TimeStep named tuple:

https://github.com/tensorflow/agents/blob/master/tf_agents/environments/time_step.py
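As a rough illustration of how the pieces in the quoted snippet could fit together, here is a minimal sketch. It is not code from TF-Agents: the two-field TimeStep, the CountingEnv class, FINAL_STATE, and the reward rule are all hypothetical stand-ins, and the real TimeStep in time_step.py may carry different fields. The point is only that _apply_action changes the environment's state and hands back an (observation, reward) pair, and that return passes the packaged TimeStep back to whoever called _step.

```python
import collections

# Hypothetical stand-in for the TimeStep named tuple (assumed fields).
TimeStep = collections.namedtuple("TimeStep", ["observation", "reward"])

FINAL_STATE = 3  # assumption: the episode ends once the counter reaches 3


def is_final(state):
    """Return True when the episode is over."""
    return state >= FINAL_STATE


class CountingEnv:
    """Toy environment: the state is a counter and the action is added to it."""

    def __init__(self):
        self.state = 0

    def reset(self):
        # Start a new episode and report the initial observation.
        self.state = 0
        return TimeStep(observation=self.state, reward=0.0)

    def _apply_action(self, action):
        # Update the state, then report what the agent observes and earns.
        self.state += action
        reward = 1.0 if is_final(self.state) else 0.0
        return self.state, reward

    def _step(self, action):
        # If the previous step ended the episode, start over instead.
        if is_final(self.state):
            return self.reset()
        observation, reward = self._apply_action(action)
        # `return` hands the TimeStep back to the caller of _step.
        return TimeStep(observation, reward)


env = CountingEnv()
time_step = env._step(1)
print(time_step.observation, time_step.reward)
```

The caller receives a single object and can read time_step.observation and time_step.reward by name, which is the convenience a named tuple provides.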

Bhaney44 commented 5 years ago

@oars Are there any other resources for learning? Your response didn't help. I am having a hard time figuring out how to start with Nightly. Some issues I am having:

  1. If I run the script in IDLE, it reports that there is no module agents. I then pip installed agents, but it didn't change anything.

  2. I want to run the script from the command line. However, I cannot figure out the syntax for the command. I Googled it, but Google only returns information about changing directories.

  3. I thought about using Jupyter notebook, but I am hesitant because I have had a lot of problems with Jupyter in the past.

  4. The only resources for Nightly I know of are (a) your YouTube video and (b) the GitHub repository. (a) The video was awesome and I wrote down the code. However, the code was missing a lot and you didn't explain it line by line. The code also has no hash-tagged comments explaining it, so I can't figure out how the different parts are supposed to fit together. (b) The second resource is GitHub, which I cannot figure out how to use. I understand the concept of forking, but I cannot figure out how to fork a project. I Googled it, and the results said there should be a fork button, but it isn't on the page. So, I can't fork anything. I also don't understand all the different files: which ones are important, which ones are necessary, or how they fit together.

I have successfully run deep reinforcement learning algorithms in Gym. I am a really hard worker and want to learn how to run deep reinforcement learning algorithms with Nightly. Please help me.

oars commented 5 years ago

I'm not sure what you mean by running in IDLE, or what you are referring to with Nightly. Right now the release for TF-Agents is tf-agents-nightly, which depends on tf-nightly and tfp-nightly.
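To make "depends on" concrete: these are separate pip packages that have to be installed in the same Python environment. A minimal sketch for checking which of them your interpreter can see, using only the standard library (Python 3.8 or newer); the helper name installed_version is made up for this example:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names are the ones mentioned in this thread.
for name in ("tf-agents-nightly", "tf-nightly", "tfp-nightly"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If a package shows as not installed, the interpreter running the script is not the one pip installed into, or the package was never installed there.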

So different issues to try and address:

Bhaney44 commented 5 years ago

IDLE is the development environment and interpreter that ships with Python. To run a script in IDLE, you select Run from the top bar on the screen and then click 'Run Module' in the drop-down menu. I'm not sure what you mean by:

  1. Right now the release for TF-Agents is tf-agents-nightly which depends on tf-nightly and tfp-nightly

My best guess is that you mean TF-Agents is a library called tf-agents-nightly (but I do not understand your syntax). I also do not know what you mean by 'depends on tf-nightly and tfp-nightly'. Indeed, I don't know what 'depends on' means in this context. My best guess is that tf-nightly and tfp-nightly are software packages underneath the code for Nightly, but I do not understand where they are on my machine.

  1. sandbox

I do not know what 'sandbox' means in this context, so I cannot attach any meaning to the sentence in which the word appears.

Your advice for learning is helpful though. Thanks. I never knew about the colab, but it looks sweet and I am going to work on it tonight.

kbanoop commented 5 years ago

To run in the sandbox, for each colab there is a link called 'Run in Google Colab' (near the top of the file under Getting Started) which will take you to the sandbox.

e.g. for this colab https://github.com/tensorflow/agents/blob/master/tf_agents/colabs/1_dqn_tutorial.ipynb:

if you click on the 'Run in Google Colab' link, it will take you to

https://colab.sandbox.google.com/github/tensorflow/agents/blob/master/tf_agents/colabs/1_dqn_tutorial.ipynb

Once you are there, you just have to execute each cell in order.

Bhaney44 commented 5 years ago

That was the most helpful. Thank you so much @kbanoop.