tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

module 'tensorflow_addons' has no attribute 'optimizers' (tfa-nightly) #2578

Closed asapsmc closed 1 year ago

asapsmc commented 3 years ago

System information

Describe the bug

After installing the nightly version, I got the error: module 'tensorflow_addons' has no attribute 'optimizers'

Code to reproduce the issue

import tensorflow_addons as tfa
...
radam = tfa.optimizers.RectifiedAdam(lr=cf["lr"], clipnorm=clipnorm)

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

bhack commented 3 years ago

/cc @szutenberg

szutenberg commented 3 years ago

Hi @MR-T77

I tried to reproduce it:

Everything seems to be ok.

Please check if you can run the following code:

import tensorflow_addons as tfa
print(tfa)
print(tfa.optimizers)
print(tfa.optimizers.RectifiedAdam)

What does it return?

If you're still getting an error then please attach the output from pip freeze.
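As a complement to `pip freeze`, the installed distributions can also be checked from inside Python with the standard library, which makes it easy to spot `tensorflow-addons` and `tfa-nightly` installed side by side (a common cause of broken attribute lookups). This is a sketch; the package-name prefixes below are an assumption about what is relevant here:

```python
# Sketch: list TensorFlow-related distributions installed in the active
# environment, without shelling out to `pip freeze`.
# Assumes Python 3.8+ (importlib.metadata is in the stdlib from 3.8).
from importlib.metadata import distributions

def tf_related_packages(prefixes=("tensorflow", "tfa", "tf-")):
    """Return {distribution_name: version} for TF-related installs."""
    found = {}
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name.startswith(prefixes):
            found[name] = dist.version
    return found

if __name__ == "__main__":
    # Both `tensorflow-addons` and `tfa-nightly` appearing here at once
    # would explain a broken `tfa.optimizers` attribute lookup.
    for name, version in sorted(tf_related_packages().items()):
        print(f"{name}=={version}")
```

If both packages show up, uninstalling both and reinstalling only one is usually the cleanest fix.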

asapsmc commented 3 years ago

Hi @szutenberg! It seems something was broken with my conda environment (I tried so many things...). I uninstalled tensorflow-addons (0.14) and reinstalled tfa-nightly, and now I can import tfa without error. Just to be sure: whenever I update something on a conda environment, I immediately run code on top of it, I don't restart terminal or vscode (I'm not sure this is the best process or if I should restart something). Nevertheless, although I can import addons, I cannot use them, I always get the following error (tried with other optimizers too but I get the same error):

Traceback (most recent call last):
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/machine/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/machine/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/Users/machine/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/machine/Projects/finetune-asp/src/ISMIR2020_v2.py", line 471, in <module>
    main()
  File "/Users/machine/Projects/finetune-asp/src/ISMIR2020_v2.py", line 466, in main
    new_train(dataset, 'TCNv2', cpu=False, addons=True)
  File "/Users/machine/Projects/finetune-asp/src/ISMIR2020_v2.py", line 329, in new_train
    history = model.fit(train, steps_per_epoch=len(train), epochs=cf["num_epochs"], shuffle=True,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/conv_1_3x3_conv/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_1_3x3_conv/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Equal: CPU 
AssignSubVariableOp: GPU CPU 
AssignVariableOp: GPU CPU 
GreaterEqual: GPU CPU 
FloorDiv: CPU 
Sqrt: GPU CPU 
NoOp: GPU CPU 
Pow: GPU CPU 
Mul: CPU 
Cast: GPU CPU 
Identity: GPU CPU 
SelectV2: GPU CPU 
ReadVariableOp: GPU CPU 
RealDiv: GPU CPU 
Sub: GPU CPU 
AddV2: GPU CPU 
Const: GPU CPU 
Square: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  model_conv_1_3x3_conv_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_5_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_8_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_sub_10_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  model/conv_1_3x3_conv/Conv2D/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/Identity (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_5 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow_1 (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_1 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_2 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_5 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_3 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_3 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_4 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_4 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_5 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/GreaterEqual (GreaterEqual) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Const (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_5/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_5 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_6 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_1 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_1 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_7 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_8/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_8 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Square (Square) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_9 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_2 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_1 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_2 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_10 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt_1 (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_11 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_3 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_6 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_12 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignSubVariableOp (AssignSubVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_8/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_4 (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_13 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_5 (ReadVariableOp) 
  Lookahead/Lookahead/update/add_5 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/floordiv (FloorDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_14 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Equal (Equal) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_1/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_1 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_2 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_2/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_3 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_1 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_2 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node model/conv_1_3x3_conv/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_30042]  
bhack commented 3 years ago

/cc @lgeiger Can you replicate this on your M1?

szutenberg commented 3 years ago

@MR-T77 It looks like there are no issues with TFA, but there are missing GPU kernels, which breaks colocation.

You need to check which types you use with Equal, Mul and FloorDiv. You can dump the graphs (set TF_DUMP_GRAPH_PREFIX and turn on vlog) and check them in the pbtxt file.

asapsmc commented 3 years ago

@szutenberg I'm a newbie at this type of thing, sorry. Could you please point me to a more detailed description of what I need to do? Thanks in advance.

asapsmc commented 3 years ago

Also, I'm not using any custom code here. It's a simple BLSTM (a 3-layer Keras Bidirectional(SimpleRNN) with a dense output).

szutenberg commented 3 years ago

@MR-T77 maybe the easiest thing would be to provide the code so that we can reproduce it locally.

If you want to debug it on your own, then set TF_CPP_MAX_VLOG_LEVEL to 10 and TF_DUMP_GRAPH_PREFIX=tmp. You should see a tmp dir after running the script. Reading placer_input.pbtxt will answer my question (simply grep -A 20 -rn FloorDiv). You'll see the dtypes.
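The setup above can be sketched at the top of the failing script. The variable names come from the comment above; the key point is that they must be set before `import tensorflow`, since the C++ runtime reads them at import time:

```python
# Sketch of the graph-dump setup described above. The environment variables
# must be exported before TensorFlow is imported for the runtime to see them.
import os

os.environ["TF_CPP_MAX_VLOG_LEVEL"] = "10"   # very verbose C++ logging
os.environ["TF_DUMP_GRAPH_PREFIX"] = "tmp"   # graphs land in ./tmp as .pbtxt

# import tensorflow as tf   # import only AFTER the variables are set
# ... run the failing model.fit(...) ...

# Then inspect the placer input, e.g.:
#   grep -A 20 -rn FloorDiv tmp/placer_input.pbtxt
# and look at the dtype attributes (T=...) of the Equal, Mul and FloorDiv nodes.
```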

Anyway, I'm traveling now, so I won't be able to help you further until ~12th October.

asapsmc commented 3 years ago

@szutenberg thank you so much for your availability to help! I'll try to work out what's wrong before 12th October. If I'm unsuccessful, I'll get in touch again.

asapsmc commented 3 years ago

@szutenberg anyway, I'll leave the code I'm using here, just in case it's an easy thing you can spot:

import os
import pickle
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.layers import (Dense, Input)
from tensorflow.keras.layers import SimpleRNN, Bidirectional, Masking, LSTM  # For BLSTM
from tensorflow.keras.models import Sequential, Model
import madmom

import tensorflow.keras.backend as K
import tensorflow as tf

import tensorflow_addons as tfa

from modules.utils import PKL_PATH

tf.config.set_soft_device_placement(True)

# GENERAL CONSTANTS
FPS = 100  # set the frame rate as FPS frames per second
MASK_VALUE = -1

lr = 0.05
num_epochs = 50


class Fold(object):

    def __init__(self, folds, fold):
        self.folds = folds
        self.fold = fold

    @property
    def test(self):
        # fold N for testing
        return np.unique(self.folds[self.fold])

    @property
    def val(self):
        # fold N+1 for validation
        return np.unique(self.folds[(self.fold + 1) % len(self.folds)])

    @property
    def train(self):
        # all remaining folds for training
        train = np.hstack(self.folds)
        train = np.setdiff1d(train, self.val)
        train = np.setdiff1d(train, self.test)
        return train


class DataSequence_BLSTM(Sequence):

    mask_value = -999  # only needed for batch sizes > 1

    def __init__(self, x, y, batch_size=1, max_seq_length=None, fps=FPS):
        self.x = x
        self.y = [madmom.utils.quantize_events(o, fps=fps, length=len(d))
                  for o, d in zip(y, self.x)]
        self.batch_size = batch_size
        self.max_seq_length = max_seq_length

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        # determine which sequence(s) to use
        x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # pad them if needed
        if self.batch_size > 1:
            x = tf.keras.preprocessing.sequence.pad_sequences(
                x, maxlen=self.max_seq_length, dtype=np.float32, truncating='post', value=self.mask_value)
            y = tf.keras.preprocessing.sequence.pad_sequences(
                y, maxlen=self.max_seq_length, dtype=np.int32, truncating='post', value=self.mask_value)
        return np.array(x), np.array(y)[..., np.newaxis]


def simple_BLSTM(dataset, cpu=False):
    train_db = pickle.load(open('%s/%s.pkl' % (PKL_PATH, dataset), 'rb'))
    num_fold = 0
    fold = Fold(train_db.folds, num_fold)
    train = DataSequence_BLSTM([train_db.x[i] for i in fold.train],
                               [train_db.annotations[i] for i in fold.train],
                               batch_size=1, max_seq_length=60 * FPS)
    val = DataSequence_BLSTM([train_db.x[i] for i in fold.val],
                             [train_db.annotations[i] for i in fold.val],
                             batch_size=1, max_seq_length=60 * FPS)
    input_layer = Input((None, train[0][0].shape[-1]))
    masked = Masking(mask_value=-999)(input_layer)
    blstm_1 = Bidirectional(SimpleRNN(units=25, return_sequences=True))(masked)
    blstm_2 = Bidirectional(SimpleRNN(units=25, return_sequences=True))(blstm_1)
    blstm_3 = Bidirectional(SimpleRNN(units=25, return_sequences=True))(blstm_2)
    output_layer = Dense(1, name='output', activation='sigmoid')(blstm_3)
    model = Model(input_layer, output_layer)
    radam = tfa.optimizers.RectifiedAdam(lr=lr, clipnorm=0.5)
    ranger = tfa.optimizers.Lookahead(radam, sync_period=6, slow_step_size=0.5)
    model.compile(optimizer=ranger, loss=K.binary_crossentropy, metrics=['binary_accuracy'])
    history = model.fit(train, steps_per_epoch=len(train), epochs=num_epochs, shuffle=True,
                        validation_data=val, validation_steps=len(val),
                        verbose=True)
    return True


def main():
    tf.config.set_soft_device_placement(True)
    dataset = "traintest_smallsmc"
    simple_BLSTM(dataset)


if __name__ == "__main__":
    main()
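For readers wondering where all the extra `Mul`/`Sub`/`AssignVariableOp` nodes in the colocation dump come from: the `Lookahead` wrapper maintains a second, "slow" copy of every weight and periodically pulls it toward the fast weights. A plain-Python sketch of that update rule (an illustration of the idea, not TFA's actual implementation):

```python
# Plain-Python sketch of the Lookahead update rule, to show which extra
# arithmetic and assignment ops the wrapper adds to the training graph.
# Illustration only; not TFA's implementation.

def lookahead_step(fast, slow, step, sync_period=6, slow_step_size=0.5):
    """One Lookahead synchronisation check after an inner-optimizer step.

    fast, slow: lists of parameter values (plain floats, for illustration).
    Returns the updated (fast, slow) pair.
    """
    if step % sync_period == 0:
        # slow <- slow + alpha * (fast - slow)   (the Sub/Mul/Assign ops)
        slow = [s + slow_step_size * (f - s) for f, s in zip(fast, slow)]
        # fast <- slow                           (another AssignVariableOp)
        fast = list(slow)
    return fast, slow

# Example: after 6 inner steps the fast weight drifted to 2.0;
# synchronisation pulls both copies to the midpoint.
fast, slow = lookahead_step([2.0], [0.0], step=6)
print(fast, slow)  # [1.0] [1.0]
```

Every one of those per-variable ops has to be placed on some device, which is why a single missing GPU kernel in the chain can break the whole colocation group.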
bhack commented 3 years ago

Have you already tried with https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement ?

bhack commented 3 years ago

@szutenberg Could it be a side effect of your introduced var.device?

asapsmc commented 3 years ago

Have you already tried with https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement ?

Yes, it's there in the code: "tf.config.set_soft_device_placement(True)"

bhack commented 3 years ago

@MR-T77 Yes, sorry, I missed it; the problem is that your code is not well formatted.

szutenberg commented 3 years ago

Hi @MR-T77

I'm back. Have you solved the issue?

Today I tried to reproduce your problem, and unfortunately the code requires a pickle file which is not attached. I created dummy training data and everything works fine: FloorDiv (T=INT64) is placed on the GPU.

The graphs don't contain the name "conv_1_3x3_conv", so the code I got probably did not produce the error message you attached.

Please provide the full code and all required files, together with a frozen pip list (pip freeze).

asapsmc commented 3 years ago

Hi @szutenberg! Unfortunately no, I have not solved the issue, although I have tried everything I could. But right now, with this short snippet (from 30 September), the behaviour is different from the initial one: the training seems to start, but it just stalls after "Epoch 1/50" appears. I'm attaching the missing pickle here, as well as the frozen pip list and an export of "conda list --explicit". traintest_smallsmc.pkl.zip condaenv.txt pipfreeze.txt

asapsmc commented 3 years ago

And to be complete: in my original code (the one with conv layers) I am getting the same error as initially reported ("Cannot assign a device for operation..."). Nevertheless, if I replace the optimizer with the plain keras.optimizers.Adam, I can train the model. I attach the code here: problem_TCN.py.zip Thanks in advance.

szutenberg commented 2 years ago

Hi @MR-T77 ,

I'm sorry but it seems that one more file is missing: definitions.py

Traceback (most recent call last):
  File "problem_TCN.py", line 183, in <module>
    simple_TCN(dataset)
  File "problem_TCN.py", line 135, in simple_TCN
    train_db = pickle.load(open('%s.pkl' % (dataset), 'rb'))
ModuleNotFoundError: No module named 'definitions'

Could you make sure that it reproduces on google colab?

asapsmc commented 2 years ago

Hi @szutenberg, I'm so sorry about that; the pickle file was saved in a module definitions.py, which is why it requests that file, although it does not need it. I re-saved the pickle in "main", and I attach it as well as a more complete problem_TCN.py. traintest_smallsmc.pkl.zip

szutenberg commented 2 years ago

Hi @MR-T77 ,

Unfortunately, now I get another error: AttributeError: Can't get attribute 'Dataset' on <module '__main__' from 'problem_TCN.py'>

Could you please prepare a Google Colab which demonstrates your problem? Note that you don't need to provide real data; dummy training data is enough, you just need to make sure that the shapes match.

Thanks!

bhack commented 2 years ago

Yes, a Colab with dummy input data is the best thing to share, so we can verify whether it is something related only to your macOS M1 platform or a more general issue.

asapsmc commented 2 years ago

Hi @szutenberg and @bhack: sorry for my late reply; I was trying to generate dummy data, but I couldn't do it without further errors (I'm a newbie). I really hope that with this Google Colab you can test everything (otherwise, please instruct me). You'll just have to upload the pkl file into your runtime. In Colab I don't get errors; I can run this exact code. But there I'm using a whole set of different libraries (e.g. no tf-metal) and versions (tf, tf-addons).

szutenberg commented 2 years ago

Hi @MR-T77

I took the code from the Colab and was able to run it in my virtual env (Ubuntu 20.04):

tensorflow-gpu 2.5.0
tfa-nightly 0.15.0.dev20210922190150

All Equal ops are placed on the GPU and everything works fine.

asapsmc commented 2 years ago

Hi @szutenberg,

So, do you think it is some clash between the libraries in my environment, or is there another reason?

bhack commented 2 years ago

@lgeiger Can you try to see if you can reproduce this on your M1 ?

asapsmc commented 2 years ago

I just want to clarify that I can run this exact code with no problems if I use tf.keras.optimizers.Adam. If I start using tf-addons optimizers (e.g. RAdam), I get the errors above. Following @szutenberg's request, I already sent the pip freeze list, and I paste it here again. Do you think any of these libraries/versions could be causing this clash with tf-addons? pipfreeze.txt

bhack commented 2 years ago

Can you check your device placement:

https://www.tensorflow.org/api_docs/python/tf/debugging/set_log_device_placement

asapsmc commented 2 years ago

Hi @bhack , I did as you said and got this:

warnings.warn(
args_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
GeneratorDataset: (GeneratorDataset): /job:localhost/replica:0/task:0/device:CPU:0
NoOp: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
FakeSink0: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
args_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
GeneratorDataset: (GeneratorDataset): /job:localhost/replica:0/task:0/device:CPU:0
NoOp: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
FakeSink0: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
Epoch 1/50
assignvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
iter/Initializer/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
iterator: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
iterator_1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
model_conv_1_convolution_conv2d_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_1_convolution_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_2_convolution_conv2d_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_2_convolution_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_3_convolution_conv2d_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_3_convolution_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_output_tensordot_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_output_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
assignaddvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
assignaddvariableop_1_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_2_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_3_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_4_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_6_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_1_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_1_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_1_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_2_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_2_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_2_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_3_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_3_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_3_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_4_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_4_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_4_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_5_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_5_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_5_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_6_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_6_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_6_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_7_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_7_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_7_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_8_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_8_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_8_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_9_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_9_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_9_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_10_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_10_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_10_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_11_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_11_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_11_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_12_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_12_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_12_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_13_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_13_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_13_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_14_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_14_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_14_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_15_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_15_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_15_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_16_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_16_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_16_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_17_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_17_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_17_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_18_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_18_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_18_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_19_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_19_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_19_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_20_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_20_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_20_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_21_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_21_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_21_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_22_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_22_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_22_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_23_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_23_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_23_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_24_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_24_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_24_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_25_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_25_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_25_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_26_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_26_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_26_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_27_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_27_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_27_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_28_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_28_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_28_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_29_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_29_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_29_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_30_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_30_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_30_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_31_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_31_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_31_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_32_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_32_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_32_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_33_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_33_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_33_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_34_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_34_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_34_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_35_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_35_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_35_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_36_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_36_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_36_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_37_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_37_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_37_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_38_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_38_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_38_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_39_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_39_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_39_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_40_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_40_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_40_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_41_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_41_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_41_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_42_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_42_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_42_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_43_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_43_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_43_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_44_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_44_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_44_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_45_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_45_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_45_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_46_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_46_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_46_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_47_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_47_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_47_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_48_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_48_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_48_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_49_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_49_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_49_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_50_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_50_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_50_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_51_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_51_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_51_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 assignaddvariableop_2_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 assignaddvariableop_3_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 assignaddvariableop_4_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 IteratorGetNext: (IteratorGetNext): /job:localhost/replica:0/task:0/device:CPU:0

After this I get the error message:

Exception has occurred: InvalidArgumentError       (note: full exception trace is shown but execution is paused at: <module>)
Cannot assign a device for operation model/conv_1_convolution/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_1_convolution/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Equal: CPU 
AssignSubVariableOp: GPU CPU 
AssignVariableOp: GPU CPU 
GreaterEqual: GPU CPU 
FloorDiv: CPU 
Sqrt: GPU CPU 
NoOp: GPU CPU 
Pow: GPU CPU 
Mul: CPU 
Cast: GPU CPU 
Identity: GPU CPU 
SelectV2: GPU CPU 
ReadVariableOp: GPU CPU 
RealDiv: GPU CPU 
Sub: GPU CPU 
AddV2: GPU CPU 
Const: GPU CPU 
Square: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  model_conv_1_convolution_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_5_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_8_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_sub_10_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  model/conv_1_convolution/Conv2D/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/Identity (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_5 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow_1 (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_1 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_2 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_5 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_3 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_3 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_4 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_4 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_5 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/GreaterEqual (GreaterEqual) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Const (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_5/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_5 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_6 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_1 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_1 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_7 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_8/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_8 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Square (Square) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_9 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_2 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_1 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_2 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_10 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt_1 (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_11 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_3 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_6 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_12 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignSubVariableOp (AssignSubVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_8/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_4 (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_13 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_5 (ReadVariableOp) 
  Lookahead/Lookahead/update/add_5 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/floordiv (FloorDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_14 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Equal (Equal) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_1/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_1 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_2 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_2/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_3 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_1 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_2 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0

     [[{{node model/conv_1_convolution/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_15035]
bhack commented 2 years ago

Have you tried to see if it is failing with:

https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement
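As a minimal sketch (assuming a standard TensorFlow 2.x install), soft device placement can be enabled before building the model, so TensorFlow may move ops without a kernel on the active device to the CPU instead of raising a colocation error:

```python
# Sketch: enable soft device placement so ops that cannot run on the
# GPU/Metal device fall back to the CPU instead of failing placement.
import tensorflow as tf

tf.config.set_soft_device_placement(True)
print(tf.config.get_soft_device_placement())  # True
```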

asapsmc commented 2 years ago

Have you tried to see if it is failing with:

https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement

Yes, these errors also happen with tf.config.set_soft_device_placement(True). (And you already asked that same question 21 days ago, if I understood correctly.)

bhack commented 2 years ago

Yes, I supposed you had changed your environment.

bhack commented 2 years ago

@MR-T77 As we don't have an M1 machine available can you try install from source and check if it works at this commit fbf79940d294dd84e8ffc452abb331a01bee5aff

asapsmc commented 2 years ago

@bhack Could you be more detailed, please: how can I check out that specific commit?

bhack commented 2 years ago

git checkout -b <yourbranchname> <the_hash_you_want>
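Spelled out for this repository, that would look like the following (the branch name test-m1 is just an example):

```shell
# Clone the repo and create a branch pointing at the specific commit
# mentioned above; the branch name is arbitrary.
git clone https://github.com/tensorflow/addons.git
cd addons
git checkout -b test-m1 fbf79940d294dd84e8ffc452abb331a01bee5aff
```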

asapsmc commented 2 years ago

I'm getting too many errors and can't install from source:

1. When doing bazel build build_pip_pkg I was getting:

ERROR: The project you're trying to build requires Bazel 3.7.2 (specified in /Users/machine/Downloads/addons/.bazelversion), but it wasn't found in /opt/homebrew/Cellar/bazel/4.2.1_1/libexec/bin.

   1.1. I also couldn't download Bazel 3.7.2, as it has no darwin-arm64 build.
   1.2. To work around this I deleted .bazelversion, after which bazel build build_pip_pkg completed, although with several warnings.

2. Proceeding with bazel 4.2.1-homebrew, I got the following errors when doing bazel-bin/build_pip_pkg artifacts:

AssertionError: would build wheel with unsupported tag ('cp39', 'cp39', 'macosx_11_0_arm64')
+ true
+ cp 'dist/*.whl' /Users/asapinto/Downloads/addons/artifacts
cp: dist/*.whl: No such file or directory
asapsmc commented 2 years ago

Any update on this? Thanks in advance.

bhack commented 2 years ago

I don't have an M1 to reproduce this. M1 packaging was added by @lgeiger. I don't know if he could check this.

lgeiger commented 2 years ago

@MR-T77 Did you check if https://github.com/tensorflow/addons/issues/2579#issuecomment-947826101 fixes it for you?

asapsmc commented 2 years ago

@lgeiger: we're talking about two different things. In #2579, I couldn't build the wheels. In this thread, I pip installed tfa-nightly, but I just can't run the code.

bhack commented 2 years ago

To test your example against source, you also need to build and install the wheel at that specific commit. An alternative is to open a new PR adding a test with your example (but minimized).

asapsmc commented 2 years ago

So, I was able to install from source at that specific commit and got the exact same errors as reported here, i.e. I can't use any of the tensorflow-addons optimizers. I'd appreciate your feedback on how to solve this.

lgeiger commented 2 years ago

Do you have tensorflow-metal installed? If so, could you try uninstalling it and if not could you try installing it? Just to make sure that this has nothing to do with some ops not being supported by metal.
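A quick way to check for the plugin and toggle it (assuming the PyPI package name tensorflow-metal) might be:

```shell
# Check whether the Metal plugin is installed; uninstalling it tests
# whether the colocation error disappears on CPU-only TensorFlow.
pip list 2>/dev/null | grep -i tensorflow-metal || echo "tensorflow-metal not installed"
# To remove it for the test:
#   pip uninstall -y tensorflow-metal
# To put it back afterwards:
#   pip install tensorflow-metal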

asapsmc commented 2 years ago

@lgeiger: you are right. When I uninstalled tensorflow-metal, I stopped getting the error. But of course, everything now runs only on the CPU.
Do you think this is my only option, i.e. running everything on the CPU?

lgeiger commented 2 years ago

Do you think this is my only option, ie, running everything on CPU?

For now, unfortunately yes. It seems like some operation is not yet supported by the Metal device, but I am not sure whether the TFA optimizer could be rewritten to either not use this op or to not require it to be placed on the same device as the other ops in its colocation group.
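One way to narrow down which op lacks a Metal kernel is TensorFlow's device-placement logging; a minimal sketch (it must be enabled before any ops execute):

```python
# Sketch: log the device each op is placed on, which helps identify an
# optimizer op that cannot be placed on the Metal GPU.
import tensorflow as tf

tf.debugging.set_log_device_placement(True)
a = tf.constant([1.0, 2.0, 3.0])
b = tf.square(a)  # the placement of this Square op is logged to stderr
print(b.numpy())  # [1. 4. 9.]
```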

asapsmc commented 2 years ago

Thank you for your answer. Nevertheless, given the inability of Apple to provide support for developers, I hope you find some ingenious solution on your side.

bhack commented 2 years ago

The official Apple support is at https://developer.apple.com/forums/tags/tensorflow-metal

asapsmc commented 2 years ago

@bhack I know, I've been trying but they just don't give support.

bhack commented 2 years ago

What is your thread there?

bhack commented 2 years ago

If it is this one https://developer.apple.com/forums/thread/692818, I suppose that a thread only 4 days old, including a Saturday and Sunday, hasn't been waiting all that long.

asapsmc commented 2 years ago

That's not mine. But you can check this one (very similar to my problem), which was posted 3 months ago. I have two other threads (my user is the same, so you can search for it) posted almost a month ago, also not solved by Apple. Besides this, I also submitted the issue through Feedback Assistant, but have received no feedback. One month, or even three, seems more than sufficient time for a company like Apple to solve these issues, or at least to provide some feedback. Wouldn't you agree?