technologiestiftung / maps-latent-space

An AI exploration of how to create maps and the infrastructure to display them in an exhibition space. A collaboration between Birds On Mars and Technologiestiftung Berlin/CityLAB.
MIT License

Running without GPU #29

Open ff6347 opened 4 years ago

ff6347 commented 4 years ago

I tried to run the container I created without GPU support, but I get the following error. Any ideas?

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation Gs/_Run/Gs/latents_in: {{node Gs/_Run/Gs/latents_in}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
     [[Gs/_Run/Gs/latents_in]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "latent-navigation-server.py", line 150, in <module>
    out_json = latent_navigation(data)
  File "latent-navigation-server.py", line 102, in latent_navigation
    img = Gs.run(latent, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)
  File "/workdir/dnnlib/tflib/network.py", line 443, in run
    mb_out = tf.get_default_session().run(out_expr, dict(zip(in_expr, mb_in)))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation Gs/_Run/Gs/latents_in: node Gs/_Run/Gs/latents_in (defined at /workdir/dnnlib/tflib/network.py:218) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
     [[Gs/_Run/Gs/latents_in]]

Errors may have originated from an input operation.
Input Source operations connected to node Gs/_Run/Gs/latents_in:
 Gs/_Run/split (defined at /workdir/dnnlib/tflib/network.py:404)
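For reference, the error above comes from ops that were pinned to `/device:GPU:0` when the graph was built. In TF 1.x the usual first attempt is to relax placement through the session config; this is a minimal sketch of that config (assuming the TF 1.x API this repo uses), not a verified fix for this model:

```python
# Sketch only: a TF 1.x session config that exposes no GPUs and lets
# TensorFlow fall back to CPU when an op was pinned to a missing device.
import tensorflow as tf  # assumes TF 1.x

config = tf.ConfigProto(
    allow_soft_placement=True,   # ignore hard /device:GPU:0 pins
    device_count={"GPU": 0},     # expose no GPU devices at all
)
sess = tf.Session(config=config)
```

As the comments further down show, soft placement alone turns out not to be enough here, because some ops in the graph have no CPU kernel for their data format.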
birdNicolas commented 4 years ago

I think the problem is that in the used tflib it is hard coded to use the GPU. I will try to find a workaround and get rid of the hard coded GPU assignment.

Seba-birds commented 4 years ago

According to this post, without a GPU the whole script needs to be rewritten to use tensorflow as opposed to tensorflow-gpu. I'll look into that.

Seba-birds commented 4 years ago

I changed the script according to this post to run on cpu by adding the option allow_soft_placement=True where the session gets initialized (which is in /ml/dnnlib/tflib/tfutil.py:132-142). This is the resulting error:

tensorflow.python.framework.errors_impl.UnimplementedError: Depthwise convolution on CPU is only supported for NHWC format
     [[{{node Gs/_Run/Gs/G_synthesis/8x8/Conv0_up/Blur2D/depthwise}} = DepthwiseConv2dNative[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Gs/_Run/Gs/G_synthesis/8x8/Conv0_up/Conv2D, Gs/_Run/Gs/G_synthesis/8x8/Conv0_up/Blur2D/filter)]]

I assume that the operation on node Gs/_Run/Gs/G_synthesis/8x8/Conv0_up/Blur2D/depthwise is defined by the model. I don't know if you can change the data_format after training -- I'll look into that.
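The `data_format` clash can be illustrated without TensorFlow: the CPU depthwise-conv kernel expects NHWC, while the model's tensors are laid out NCHW. This is just a sketch of the axis transpose that would be needed around each such op (an illustration of the layout mismatch, not a patch for the frozen graph — the tensor shape is a made-up example):

```python
import numpy as np

# A dummy activation in the model's NCHW layout: (batch, channels, height, width)
x_nchw = np.zeros((1, 512, 8, 8), dtype=np.float32)

# The CPU depthwise-conv kernel wants NHWC: (batch, height, width, channels)
x_nhwc = x_nchw.transpose(0, 2, 3, 1)

print(x_nhwc.shape)  # (1, 8, 8, 512)
```

Doing this inside the trained graph would mean rewriting the layout around every affected conv/blur op, which is why changing `data_format` after training is non-trivial.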

ff6347 commented 4 years ago

Too bad. I thought it would just be flipping a switch. How will we keep both versions side by side?

Seba-birds commented 4 years ago

I committed my changes to a separate branch called "cpu_exec". In the README of that branch and in the commit messages I documented all changes that I made. I didn't change anything in "master".

You can switch to the other version with

git pull
git branch
git checkout cpu_exec
ff6347 commented 4 years ago

Okay, got it. Are the changes so complex that we can't have them both in one branch? Would it mean a lot of duplicate code?

Seba-birds commented 4 years ago

Hey! Sorry, I was on sick leave for 2 weeks. We will have another look at the problem this week, and once we've tackled it, we will wrap it up so that it can be merged back into master.

ff6347 commented 4 years ago

@Seba-birds no problemo

Seba-birds commented 4 years ago

Hey! We did some research, and it turns out that the changes to the model would be so complex that we cannot really handle them for the time being. Also, CPU-based inference would be very slow (at least 20x slower than GPU), so all the work invested here would result in a much worse experience. Since the branch "cpu_exec" does not contribute anything else, I won't merge it into master.

ff6347 commented 4 years ago

Oh. That’s sad. Thanks for the effort and the research. @bnjmnsbl so we wait until the exhibition reopens?