jeffheaton / t81_558_deep_learning

T81-558: Keras - Applications of Deep Neural Networks @Washington University in St. Louis
https://sites.wustl.edu/jeffheaton/t81-558/
Other
5.71k stars, 3.04k forks

Problem with class 12 - Atari #75

Closed. nadimahmedwales closed this issue 3 years ago.

nadimahmedwales commented 4 years ago

Hi, Thank you for the course. I tried to run the Atari example on Google's Colab, however there seems to be an issue. I have restarted the session as mentioned in the lecture video, but I still get an error.

The problem arises in the Agent section of the Jupyter notebook, in the cell that starts by defining the optimiser:

optimizer = tf.compat.v1.train.RMSPropOptimizer(

The error occurs in the last part of that cell. I have copied and pasted it below:

ValueError                                Traceback (most recent call last)
in ()
33 debug_summaries=False,
34 summarize_grads_and_vars=False,
---> 35 train_step_counter=_global_step)
36
37

11 frames
/usr/local/lib/python3.6/dist-packages/tf_agents/utils/nest_utils.py in assert_matching_dtypes_and_inner_shapes(tensors, specs, caller, tensors_name, specs_name, allow_extra_fields)
334 get_dtypes(specs),
335 get_shapes(tensors),
--> 336 get_shapes(specs)))
337
338

ValueError: : Inconsistent dtypes or shapes between `inputs` and `input_tensor_spec`.
dtypes:  vs. .
shapes: (1, 84, 84, 4) vs. (84, 84, 4).
In call to configurable 'DqnAgent' ()

I hope this is enough information to resolve the issue. Thanks.
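The message compares a batch of shape (1, 84, 84, 4) against a spec of shape (84, 84, 4). A simplified, hypothetical re-creation of the kind of check named in the traceback (`assert_matching_dtypes_and_inner_shapes`) suggests the shapes alone are not the problem, because the leading 1 is just the batch dimension. The dtypes are blank in the message; the uint8-vs-float32 pairing below is an assumption, consistent with the diagnosis later in this thread. This is not TF-Agents' actual implementation.

```python
def inner_shapes_match(tensor_shape, spec_shape):
    """True if tensor_shape ends with spec_shape (extra leading dims are batch dims)."""
    outer = len(tensor_shape) - len(spec_shape)
    return outer >= 0 and tuple(tensor_shape[outer:]) == tuple(spec_shape)


def matches_spec(tensor_dtype, tensor_shape, spec_dtype, spec_shape):
    """Hypothetical dtype-plus-inner-shape comparison, similar in spirit to the error."""
    return tensor_dtype == spec_dtype and inner_shapes_match(tensor_shape, spec_shape)


# Shapes come from the traceback; the dtypes are an assumption (uint8 frames vs float32 spec).
print(inner_shapes_match((1, 84, 84, 4), (84, 84, 4)))                 # True
print(matches_spec("uint8", (1, 84, 84, 4), "float32", (84, 84, 4)))  # False
```

So the shape part of the message is a red herring; a dtype mismatch alone is enough to trigger the error.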
jeffheaton commented 4 years ago

Yes, I can reproduce the error just fine. Google CoLab recently upgraded its version of TensorFlow. I will take a look at what needs to be changed soon.

jeffheaton commented 4 years ago

Also, it looks like Google changed (or broke) something in the base CoLab image that now causes this error as well:

EasyProcessError: start error <EasyProcess cmd_param=['Xvfb', '-help'] cmd=['Xvfb', '-help'] oserror=[Errno 2] No such file or directory: 'Xvfb': 'Xvfb' return_code=None stdout="None" stderr="None" timeout_happened=False>
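The `[Errno 2] No such file or directory: 'Xvfb'` part means the `Xvfb` executable is simply not installed on the image. A minimal sketch, assuming the notebook starts a virtual display through a wrapper that shells out to `Xvfb` (for example pyvirtualdisplay, which used EasyProcess in releases of that era), that checks for the binary up front so the failure is explicit. `missing_display_tools` is a hypothetical helper; on Ubuntu-based images such as Colab, the binary is provided by the `xvfb` apt package.

```python
import shutil


def missing_display_tools(tools=("Xvfb",)):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_display_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
    print("On Ubuntu-based images such as Colab, try: apt-get install -y xvfb")
```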

I am looking into both.

jeffheaton commented 3 years ago

So, I know what is going on. Recent changes to TF-Agents now cause it to throw an error because it detects that the neural network was created with floating-point inputs, while the Atari environment returns integers (0-255). I can get past the above error with this code:

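import numpy as np
from tf_agents.specs.tensor_spec import BoundedTensorSpec

# Re-declare the observation spec as float32 (same shape, name and bounds)
# so that it matches the network's floating-point input.
observation_spec = BoundedTensorSpec(
    shape=observation_spec.shape,
    dtype=np.float32,
    name=observation_spec.name,
    minimum=observation_spec.minimum,
    maximum=observation_spec.maximum)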

However, the spec is cached in several places, so this just causes a different cast error further down. I wish TF-Agents actually included an Atari example. I need to put a bit more thought into how to handle this breaking change.
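A commonly used alternative to rewriting a spec that is cached in several places is to leave the environment's integer spec alone and do the cast inside the network itself, for example as a first `tf.keras.layers.Lambda` layer. The framework-free NumPy sketch below shows only that cast, assuming uint8 frame stacks with values 0-255. `preprocess_observation` is a hypothetical helper, and dividing by 255 is a common DQN convention rather than something the dtype error requires.

```python
import numpy as np


def preprocess_observation(obs: np.ndarray) -> np.ndarray:
    """Cast a uint8 Atari frame stack (values 0-255) to float32 in [0, 1]."""
    if obs.dtype != np.uint8:
        raise TypeError(f"expected a uint8 observation, got {obs.dtype}")
    # float32 array divided by a Python float stays float32.
    return obs.astype(np.float32) / 255.0


# Example: a batch of one 84x84x4 frame stack, as in the traceback above.
frames = np.full((1, 84, 84, 4), 255, dtype=np.uint8)
print(preprocess_observation(frames).dtype)  # float32
```

Casting at the network input keeps the environment, replay buffer and agent specs consistent with each other, since none of them need to change dtype.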

jeffheaton commented 3 years ago

I raised an issue with TF-Agents; we will see if they have any guidance. I am sure I can "hack" my way through this, and that is probably what is called for, but I expect they will just break my "hack" in their next version. I am a bit surprised they no longer have a "Hello World" Atari example.

https://github.com/tensorflow/agents/issues/487

jeffheaton commented 3 years ago

See the discussion on the above issue (agents 487): this is a bug in TF-Agents that they will hopefully resolve soon. I will add a note to my notebook.

jeffheaton commented 3 years ago

TF-Agents added an experimental Atari example:

https://github.com/tensorflow/agents/blob/master/tf_agents/experimental/examples/dqn/mnih15/dqn_train_eval_atari.py

I will see about incorporating this into my example soon.

gandalf625 commented 3 years ago

The link seems to be dead. Any updates on this issue?

Gabriel-Yashim commented 3 years ago

Hi, thanks for this lecture. I watched your video on YouTube and it was really helpful. I was following along with the video but got stuck on this line of code:

for filename in tqdm(os.listdir(faces_path)):

I got this error message:

FileNotFoundError                         Traceback (most recent call last)
in ()
16 training_data = []
17 faces_path = os.path.join(DATA_PATH,'faces')
---> 18 for filename in tqdm(os.listdir(faces_path)):
19 path = os.path.join(faces_path,filename)
20 image = Image.open(path).resize((GENERATE_SQUARE,
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/Colab Notebooks/project/face_images/faces'

But I set the correct path. I have attached a screenshot of my Google Drive:
![drive](https://user-images.githubusercontent.com/60280085/123530484-b9a59b80-d6f2-11eb-8f39-5c7ee4cbe55e.PNG)
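A `FileNotFoundError` from `os.listdir` only says that the full path is missing, not which component is wrong. A small stdlib sketch (the helper name `deepest_existing_ancestor` is hypothetical) walks up the path until it reaches a directory that does exist. If it stops at `/content`, Drive is probably not mounted (for example via `drive.mount` from `google.colab`); if it stops lower down, the next folder name on the path likely differs from what is actually on Drive.

```python
import os


def deepest_existing_ancestor(path):
    """Walk up from `path` and return the first ancestor that exists on disk."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


faces_path = "/content/drive/My Drive/Colab Notebooks/project/face_images/faces"
existing = deepest_existing_ancestor(faces_path)
print("Deepest existing directory:", existing)
if os.path.isdir(existing):
    # Show what is actually there, to compare against the expected next folder.
    print("Contents:", sorted(os.listdir(existing))[:20])
```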
jeffheaton commented 3 years ago

TF-Agents seems to have removed all of their Atari examples, and their code has several issues that prevent it from working with the Gym Atari examples. I will very likely move to a better library for reinforcement learning. I am just not having much luck with TF-Agents.

jeffheaton commented 3 years ago

Okay, I believe I fixed it. I checked in a new version that works entirely in CoLab. Training is NOT very efficient yet, so I need to tune it a bit, and the code also needs some general cleanup. I will leave this issue open while I work on that.

jeffheaton commented 3 years ago

I have it working and tuned as best I can. The later versions of TF-Agents do not seem to train as efficiently on Atari as earlier ones did, which is unfortunate, but I do not believe Atari support is really a priority (or even an interest) of the TF-Agents team.