
activation-atlas-collect.ipynb does not work outside of Colab #277

Open benjybarnett opened 3 years ago

benjybarnett commented 3 years ago

Hey,

Obviously you can't run this notebook as-is in Colab (https://colab.research.google.com/github/tensorflow/lucid/blob/master/notebooks/activation-atlas/activation-atlas-collect.ipynb), because you need to download your own ImageNet dataset and set up your own data provider. I have done that, but the code still fails when I run it on my own GPU machine.
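
For context, my data provider boils down to tf_slim's parallel reader over ImageNet TFRecords, roughly like this (the paths and reader settings are placeholders, not my exact code):

```python
import tensorflow as tf  # TF 1.15, graph mode
from tf_slim.data import parallel_reader

# Placeholder pattern -- my real shards live elsewhere.
data_files = tf.io.gfile.glob("/data/imagenet/train-*.tfrecord")
assert data_files, "the file pattern matched no TFRecord shards"

# parallel_read builds the RandomShuffleQueue ('parallel_read/common_queue')
# that the traceback below complains is closed and empty.
key, value = parallel_reader.parallel_read(
    data_files,
    tf.TFRecordReader,
    num_readers=4,
    shuffle=True,
    seed=7)
# `value` is then decoded into the image / label / record-key tensors used below.
```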

I get the traceback:

```
Traceback (most recent call last):
  File "visualise.py", line 153, in <module>
    vec, label_index, record_key, label_text, image = sess.run([T(layer_name), t_label, t_record_key, t_label_text, image_tensor_])
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: RandomShuffleQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[node parallel_read/common_queue_Dequeue (defined at /home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[case/Assert/AssertGuard/Assert/data_0/_23]]
  (1) Out of range: RandomShuffleQueue '_2_parallel_read/common_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[node parallel_read/common_queue_Dequeue (defined at /home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'parallel_read/common_queue_Dequeue':
  File "visualise.py", line 63, in <module>
    provider = DatasetDataProvider(data_split, seed=7)
  File "/home/benjy/UCL_Attention_Project_codebase/DatasetDataProvider.py", line 91, in __init__
    scope=scope)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tf_slim/data/parallel_reader.py", line 270, in parallel_read
    reader_kwargs=reader_kwargs).read(filename_queue)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tf_slim/data/parallel_reader.py", line 135, in read
    return self._common_queue.dequeue(name=name)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/ops/data_flow_ops.py", line 446, in dequeue
    self._queue_ref, self._dtypes, name=name)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_data_flow_ops.py", line 4140, in queue_dequeue_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/benjy/.conda/envs/visualise/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
```

It appears the problem is related to the queue-runner threads started on this line:

```python
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
```
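
For context, the whole collection loop follows the standard TF1 coordinator / queue-runner pattern, roughly like this (simplified, not my exact script; the tensor names are the ones from the notebook):

```python
# Start the input-pipeline threads, then pull activations in a loop.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
    while not coord.should_stop():
        # This is the sess.run call that raises the OutOfRangeError above.
        vec, label_index, record_key, label_text, image = sess.run(
            [T(layer_name), t_label, t_record_key, t_label_text, image_tensor_])
        # ... accumulate activations / metadata here ...
finally:
    coord.request_stop()
    coord.join(threads)
```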

If I comment that line out, the code gets a little further but then hangs on:

```python
vec, label_index, record_key, label_text, image = sess.run([T(layer_name), t_label, t_record_key, t_label_text, image_tensor_])
```

I can't get this working at all. Does this notebook work outside of Colab? I've also had to fix some small syntax issues, which makes me wonder whether it has been tested outside Colab. I'm using TF 1.15 and Lucid 0.3.9.