tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Problem in my code due to `tf.shape` and `Tensor.shape`. `tf.shape` and `Tensor.shape`, both are not working #62726

Closed VachanVY closed 8 months ago

VachanVY commented 8 months ago

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.13.0

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I've coded the DETR object detection pipeline from scratch in TensorFlow.
I've tested all the individual components of the pipeline and they work. But when I start training on my dataset (a tf.data.Dataset), I get an error.
This is mostly due to the behaviour of Tensor.shape and tf.shape. Tensor.shape returns None in its shape, and tf.shape returns something like Tensor("Shape_2:0", shape=(1,), dtype=int32), which is not the shape of the tensor.
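
For context, here is a minimal sketch of how the two APIs differ inside a `tf.function`. The function name `describe`, the `trace_log` dictionary, and the `[None, 4]` input signature are made up for illustration and are not taken from the notebook. `Tensor.shape` is the static shape known while the function is traced, so unknown dimensions appear as None. `tf.shape` returns a symbolic int32 tensor whose value only exists when the graph runs, and the `shape=(...)` in its printed form is the shape of that shape vector (one entry per dimension of the input), not the shape of the input itself.

```python
import tensorflow as tf

trace_log = {}  # filled in while the function is being traced (illustrative helper)

# Hypothetical signature: unknown batch size, 4 features per row.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 4], dtype=tf.float32)])
def describe(x):
    # Static shape: known at trace time; the batch dimension is None here.
    trace_log["static"] = x.shape.as_list()
    # Dynamic shape: a symbolic tensor during tracing, a concrete value at run time.
    dynamic = tf.shape(x)
    # The symbolic tensor's own static shape is [rank of x], here [2].
    trace_log["shape_of_shape"] = dynamic.shape.as_list()
    return dynamic

result = describe(tf.zeros([3, 4]))
print(trace_log["static"])          # static shape seen while tracing
print(trace_log["shape_of_shape"])  # length of the shape vector
print(result.numpy())               # dynamic shape computed at run time
```

Read this way, a printed `Tensor("Shape_2:0", shape=(1,), dtype=int32)` would be the shape vector of a rank-1 tensor; its values become available once the graph executes.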

Please help. Thank you.

Standalone code to reproduce the issue

Kaggle notebook to reproduce the error. Make a copy of the notebook to reproduce this issue.

Relevant log output

ValueError                                Traceback (most recent call last)
Cell In[33], line 3
      1 for epoch in range(1, DETR_ARGS.epochs + 1):
      2     print(f"Epoch {epoch}/{DETR_ARGS.epochs}")
----> 3     loss = train_step(train_ds)
      4     print(f"Loss at Epoch {epoch} : {loss}\n")
      5     model.save_weights(f'detr_weights_epoch{epoch}.keras')

File /opt/conda/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File /tmp/__autograph_generated_fileg7gvd3up.py:36, in outer_factory.<locals>.inner_factory.<locals>.tf__train_step(train_ds)
     34 grads = ag__.Undefined('grads')
     35 loss = ag__.Undefined('loss')
---> 36 ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.ld(train_ds),), None, fscope), None, loop_body, get_state, set_state, (), {'iterate_names': '(step, (x_train, y_train))'})
     37 try:
     38     do_return = True

File /tmp/__autograph_generated_fileg7gvd3up.py:22, in outer_factory.<locals>.inner_factory.<locals>.tf__train_step.<locals>.loop_body(itr)
     20 with ag__.ld(tf).GradientTape() as tape:
     21     y_pred = ag__.converted_call(ag__.ld(model), (ag__.ld(x_train),), dict(training=True), fscope)
---> 22     y_pred = ag__.converted_call(ag__.ld(matcher), (ag__.ld(y_train), ag__.ld(y_pred)), None, fscope)
     23     loss = ag__.converted_call(ag__.ld(loss_fn), (ag__.ld(y_train), ag__.ld(y_pred)), None, fscope)
     24 grads = ag__.converted_call(ag__.ld(tape).gradient, (ag__.ld(loss), ag__.ld(model).trainable_weights), None, fscope)

File /tmp/__autograph_generated_fileweg7gf52.py:12, in outer_factory.<locals>.inner_factory.<locals>.tf____call__(self, y, y_hat)
     10 (class_true, bbox_true) = ag__.ld(y)
     11 (class_prob, bbox_pred) = ag__.ld(y_hat)
---> 12 (class_prob, bbox_pred) = ag__.converted_call(ag__.ld(Matcher).match, (ag__.ld(class_true), ag__.ld(bbox_true), ag__.ld(class_prob), ag__.ld(bbox_pred)), None, fscope)
     13 try:
     14     do_return = True

File /tmp/__autograph_generated_fileuj8_9ikk.py:11, in outer_factory.<locals>.inner_factory.<locals>.tf__match(class_true, bbox_true, class_prob, bbox_pred)
      9 retval_ = ag__.UndefinedReturnValue()
     10 (bbox_true, bbox_pred) = (ag__.converted_call(ag__.ld(swap_xy), (ag__.converted_call(ag__.ld(xywh_to_xyxy), (ag__.ld(bbox_true),), None, fscope),), None, fscope), ag__.converted_call(ag__.ld(swap_xy), (ag__.converted_call(ag__.ld(xywh_to_xyxy), (ag__.ld(bbox_pred),), None, fscope),), None, fscope))
---> 11 C = ag__.converted_call(ag__.ld(Matcher).batched_cost_matrix, (ag__.ld(class_true), ag__.ld(bbox_true), ag__.ld(class_prob), ag__.ld(bbox_pred)), None, fscope)
     12 idx = ag__.converted_call(ag__.ld(tf).stack, ([ag__.converted_call(ag__.ld(linear_sum_assignment), (ag__.ld(C)[ag__.ld(i)],), None, fscope)[1] for i in ag__.converted_call(ag__.ld(range), (ag__.ld(C).shape[0],), None, fscope)],), None, fscope)
     13 class_prob = ag__.converted_call(ag__.ld(tf).gather, (ag__.ld(class_prob), ag__.ld(idx)), dict(batch_dims=1), fscope)

File /tmp/__autograph_generated_filet5rvjc4z.py:20, in outer_factory.<locals>.inner_factory.<locals>.tf__batched_cost_matrix(class_true, bbox_true, class_prob, bbox_pred)
     18 try:
     19     do_return = True
---> 20     retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (ag__.autograph_artifact(lambda B: ag__.converted_call(ag__.ld(Matcher).compute_cost_matrix, (ag__.ld(class_true)[ag__.ld(B)], ag__.ld(class_prob)[ag__.ld(B)], ag__.ld(bbox_true)[ag__.ld(B)], ag__.ld(bbox_pred)[ag__.ld(B)]), None, fscope)), ag__.converted_call(ag__.ld(tf).range, (ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(class_true),), None, fscope)[0],), None, fscope)), dict(fn_output_signature=ag__.ld(tf).float32), fscope)
     21 except:
     22     do_return = False

File /tmp/__autograph_generated_filet5rvjc4z.py:20, in outer_factory.<locals>.inner_factory.<locals>.tf__batched_cost_matrix.<locals>.<lambda>(B)
     18 try:
     19     do_return = True
---> 20     retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (ag__.autograph_artifact(lambda B: ag__.converted_call(ag__.ld(Matcher).compute_cost_matrix, (ag__.ld(class_true)[ag__.ld(B)], ag__.ld(class_prob)[ag__.ld(B)], ag__.ld(bbox_true)[ag__.ld(B)], ag__.ld(bbox_pred)[ag__.ld(B)]), None, fscope)), ag__.converted_call(ag__.ld(tf).range, (ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(class_true),), None, fscope)[0],), None, fscope)), dict(fn_output_signature=ag__.ld(tf).float32), fscope)
     21 except:
     22     do_return = False

File /tmp/__autograph_generated_filejl_v0o33.py:11, in outer_factory.<locals>.inner_factory.<locals>.tf__compute_cost_matrix(class_true, class_prob, bbox_true, bbox_pred)
      9 do_return = False
     10 retval_ = ag__.UndefinedReturnValue()
---> 11 N = ag__.converted_call(ag__.ld(tf).shape, (ag__.ld(class_true),), None, fscope)[0]
     12 cost_i = ag__.autograph_artifact(lambda i: ag__.converted_call(ag__.ld(tf).map_fn, (ag__.autograph_artifact(lambda j: ag__.converted_call(ag__.ld(Matcher).L_match, (ag__.ld(class_true)[ag__.ld(i)], ag__.ld(class_prob)[ag__.ld(j), ag__.converted_call(ag__.ld(int), (ag__.ld(class_true)[ag__.ld(i)],), None, fscope)], ag__.ld(bbox_true)[ag__.ld(i)], ag__.ld(bbox_pred)[ag__.ld(j)]), None, fscope)), ag__.converted_call(ag__.ld(tf).range, (ag__.ld(N),), None, fscope)), dict(fn_output_signature=ag__.ld(tf).float32), fscope))
     13 try:

ValueError: in user code:

    File "/tmp/ipykernel_42/4115406382.py", line 7, in train_step  *
        y_pred = matcher(y_train, y_pred)
    File "/tmp/ipykernel_42/968499204.py", line 64, in __call__  *
        class_prob, bbox_pred = Matcher.match(class_true, bbox_true, class_prob, bbox_pred)
    File "/tmp/ipykernel_42/968499204.py", line 53, in match  *
        C = Matcher.batched_cost_matrix(class_true, bbox_true, class_prob, bbox_pred)
    File "/tmp/ipykernel_42/968499204.py", line 46, in batched_cost_matrix  *
        tf.range(tf.shape(class_true)[0]), fn_output_signature=tf.float32
    File "/tmp/ipykernel_42/968499204.py", line 22, in compute_cost_matrix  *
        N = tf.shape(class_true)[0]

    ValueError: slice index 0 of dimension 0 out of bounds. for '{{node map/while/strided_slice_4}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](map/while/Shape, map/while/strided_slice_4/stack, map/while/strided_slice_4/stack_1, map/while/strided_slice_4/stack_2)' with input shapes: [0], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.
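
One possible reading of the message above: `with input shapes: [0]` says that the `Shape` vector being sliced has zero elements, which is what `tf.shape` produces for a rank-0 (scalar) tensor. That would be consistent with `class_true[B]` already being a scalar when it reaches `compute_cost_matrix`, although the notebook's data is not shown here, so this is an assumption. The sketch below reproduces the same kind of error on a scalar; `first_dim` and `safe_size` are hypothetical names, not functions from the notebook.

```python
import tensorflow as tf

# Hypothetical function: takes a scalar, then indexes its (empty) shape vector.
@tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.int32)])
def first_dim(x):
    # x is rank 0, so tf.shape(x) is an empty vector and [0] is out of bounds.
    return tf.shape(x)[0]

# Hypothetical alternative: tf.size is defined for any rank, including scalars.
@tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.int32)])
def safe_size(x):
    return tf.size(x)

try:
    first_dim(tf.constant(7))
    message = ""
except (ValueError, tf.errors.InvalidArgumentError) as err:
    # Graph-mode shape inference reports this as a ValueError; the second
    # exception type is caught only as a precaution.
    message = str(err)

print(message)
print(int(safe_size(tf.constant(7))))
```

Printing `class_true.shape` (or `class_true.shape.rank`) at the top of `compute_cost_matrix` would show whether the tensor really is rank 0 at that point.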

sushreebarsa commented 8 months ago

@VachanVY Could you use tf.shape to access dynamic shapes within computations and reshape tensors as needed, ensuring compatibility with layers and operations? Please also consider using functions like tf.reshape or tf.expand_dims, and let us know. Thank you!
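
As a rough illustration of that suggestion (not code from the notebook), the sketch below uses `tf.shape` to read a batch size that is unknown at trace time and feeds it to `tf.reshape` and `tf.expand_dims`. The function name `flatten_boxes` and the `[None, None, 4]` signature are assumptions chosen for the example.

```python
import tensorflow as tf

# Hypothetical input: a batch of box sets with unknown batch size and box count.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, None, 4], dtype=tf.float32)])
def flatten_boxes(boxes):
    # Dynamic batch size: a scalar tensor that is resolved when the graph runs.
    batch = tf.shape(boxes)[0]
    # Reshape with the dynamic batch size; -1 lets TensorFlow infer the rest.
    flat = tf.reshape(boxes, [batch, -1])
    # Add a trailing axis of size 1.
    expanded = tf.expand_dims(flat, axis=-1)
    return flat, expanded

flat, expanded = flatten_boxes(tf.zeros([2, 5, 4]))
print(flat.shape, expanded.shape)
```

The point of the pattern is that `tf.shape(boxes)[0]` stays usable even when `boxes.shape[0]` is None, as long as the tensor has at least one dimension.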

VachanVY commented 8 months ago

Could you use tf.shape to access dynamic shapes within computations

I've used both tf.shape and Tensor.shape, but I'm getting the error mentioned in the issue.

reshape tensors as needed, ensuring compatibility with layers and operations.

But I don't want to reshape the tensors. Should I do it just to ensure compatibility with layers and operations, as you said? That means it's a bug, right? I'll do as you said and share the details.

VachanVY commented 8 months ago

@sushreebarsa If you don't mind, could you please check the DETR notebook once? It's difficult to follow up like this. Thank you.

SuryanarayanaY commented 8 months ago

Hi @VachanVY ,

We won't debug user code, particularly when it's a long notebook, due to our bandwidth constraints. Please submit a minimal code snippet that reproduces the error so that it can be debugged or fixed. For support issues, you can post the same on the TensorFlow Forum or Stack Overflow.

VachanVY commented 8 months ago

The individual components work; the problem only appears during training, so the entire code is required to reproduce the issue.
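
When components pass in isolation but fail in training, one thing that may be worth comparing is the shape information the dataset hands to the training step. The sketch below uses a small stand-in dataset (the tensors `images` and `labels` and their sizes are invented for illustration; the notebook's `train_ds` is not reproduced here) and reads `Dataset.element_spec` to show the static shapes a `tf.function` would see.

```python
import tensorflow as tf

# Stand-in data, not the notebook's dataset: 6 images with 5 labels each.
images = tf.zeros([6, 8, 8, 3])
labels = tf.zeros([6, 5], dtype=tf.int32)
base = tf.data.Dataset.from_tensor_slices((images, labels))

# Without drop_remainder the last batch may be smaller, so the batch dim is None.
loose = base.batch(4)
# With drop_remainder=True every batch has exactly 4 elements.
fixed = base.batch(4, drop_remainder=True)

loose_shapes = [spec.shape.as_list() for spec in loose.element_spec]
fixed_shapes = [spec.shape.as_list() for spec in fixed.element_spec]
print(loose_shapes)
print(fixed_shapes)
```

Running the same check on the real `train_ds` would show whether the labels arrive with the rank and dimensions that the individually tested components were given.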

SuryanarayanaY commented 8 months ago

Hi @VachanVY ,

Can you please import the code into Google Colab, execute it, and then submit a Colab gist here?

VachanVY commented 8 months ago

@SuryanarayanaY But I've provided a Kaggle notebook link.
