ezyang opened 1 year ago
cc @nxdong
After the nms fix (https://github.com/pytorch/vision/pull/7944), the next problem is:
Slice with unbacked SymInt. A workaround could be:
diff --git a/torchvision/models/detection/retinanet.py b/torchvision/models/detection/retinanet.py
index 3a9cf80d1d..07aa5179cd 100644
--- a/torchvision/models/detection/retinanet.py
+++ b/torchvision/models/detection/retinanet.py
@@ -554,7 +554,7 @@ class RetinaNet(nn.Module):
# non-maximum suppression
keep = box_ops.batched_nms(image_boxes, image_scores, image_labels, self.nms_thresh)
- keep = keep[: self.detections_per_img]
+ #keep = keep[: self.detections_per_img]
detections.append(
{
The problem is that we do not know statically whether detections_per_img is in bounds or needs to be clamped. The user can probably tell us it's guaranteed to be in bounds, I'm guessing.
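A minimal sketch of the "user tells us it's in bounds" idea, assuming torch._check is acceptable at the call site and that the user really can promise NMS never keeps more than detections_per_img boxes (the helper name is made up for illustration):

import torch

def topk_after_nms(keep: torch.Tensor, detections_per_img: int) -> torch.Tensor:
    # keep.size(0) is an unbacked SymInt under export, since nms has a
    # data-dependent output shape.
    # User-supplied assumption: nms never keeps more than detections_per_img
    # boxes, so the slice below is effectively a no-op.
    torch._check(keep.size(0) <= detections_per_img)
    # With the check recorded, keep[:detections_per_img] has size
    # min(keep.size(0), detections_per_img) == keep.size(0) in principle,
    # so no data-dependent guard needs to be evaluated at trace time.
    return keep[:detections_per_img]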
Unspec NN module. It looks like:
Traceback (most recent call last):
File "/data/users/ezyang/b/pytorch/wn.py", line 13, in <module>
exported_model = torch._dynamo.export(model, x)
File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 1259, in export
return inner(*extra_args, **extra_kwargs)
File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 1233, in inner
graph = rewrite_signature(
File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 865, in rewrite_signature
matched_input_elements_positions = produce_matching(
File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 858, in produce_matching
raise AssertionError(
AssertionError: graph-captured input #2 (<class 'torch.Tensor'>) is not among original args (<class 'torch.Tensor'>, <class 'torch.Tensor'>)
This is because of the cell_anchors write:
[2023-09-07 19:02:58,035] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_ATTR cell_anchors [ListVariable(), NNModuleVariable()]
[2023-09-07 19:02:58,035] [0/0] torch._dynamo.symbolic_convert: [DEBUG] FAILED INLINING <code object set_cell_anchors at 0x7f3ccf8182f0, file "/data/users/ezyang/b/torchvision/torchvision/models/detection/anchor_utils.py", line 76>
[2023-09-07 19:02:58,037] [0/0] torch._dynamo.output_graph: [DEBUG] restore_graphstate: removed 10 nodes
[2023-09-07 19:02:58,037] [0/0] torch._dynamo.symbolic_convert: [DEBUG] FAILED INLINING <code object forward at 0x7f3ccf818870, file "/data/users/ezyang/b/torchvision/torchvision/models/detection/anchor_utils.py", line 115>
[2023-09-07 19:02:58,039] [0/0] torch._dynamo.output_graph: [DEBUG] restore_graphstate: removed 60 nodes
[2023-09-07 19:02:58,039] [0/0] torch._dynamo.symbolic_convert: [DEBUG] FAILED INLINING <code object _call_impl at 0x7f3d105a5210, file "/data/users/ezyang/b/pytorch/torch/nn/modules/module.py", line 1520>
[2023-09-07 19:02:58,039] [0/0] torch._dynamo.output_graph: [DEBUG] restore_graphstate: removed 0 nodes
[2023-09-07 19:02:58,041] [0/0] torch._dynamo.convert_frame: [INFO] Restarting analysis due to _dynamo/variables/nn_module.py:138 in convert_to_unspecialized
When we restart with the module as unspecialized, the anchor module's parameters become graph inputs:
[2023-09-07 19:03:13,095] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input L_self_anchor_generator_cell_anchors_0_ L['self'].anchor_generator.cell_anchors[0]
This still does not work, still choking on:
AssertionError: Mutating module attribute cell_anchors during export.
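For reference, a hypothetical reduction of the pattern that triggers this (class names made up; the real culprit is AnchorGenerator's set_cell_anchors storing onto self), plus one possible workaround of passing the anchors around as values instead of caching them on the module:

import torch
from torch import nn

class AnchorsMutating(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell_anchors = [torch.zeros(3, 4)]

    def forward(self, x):
        # Assigning to a module attribute while tracing is what export
        # reports as "Mutating module attribute cell_anchors during export".
        self.cell_anchors = [a.to(x.device) for a in self.cell_anchors]
        return self.cell_anchors[0]

class AnchorsFunctional(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell_anchors = [torch.zeros(3, 4)]

    def forward(self, x):
        # Workaround sketch: compute locally and return, never write back
        # onto self, so there is no attribute mutation to flag.
        cell_anchors = [a.to(x.device) for a in self.cell_anchors]
        return cell_anchors[0]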
I got the same error while converting the model using the ai_edge_torch library.
🐛 Describe the bug
This model export was requested by a user at https://github.com/pytorch/pytorch/issues/108337. It is fairly similar to maskrcnn, which is one of the priority models for internal export. While investigating the user report I hacked up a bunch of stuff in torchvision which isn't easily landable, so I want to durably record it here.
Repro script:
Tested on 621463a3e6b488b2bff04e355a1abd9a4c5bb2cd
Last time I did this for maskrcnn: https://docs.google.com/document/d/159NTQQhz8ovIBxbQvGQ-fZ10pF9e2RPXm1JZYqdEzt4/edit#heading=h.jw6vkqei769s
f-string problem. Same as in maskrcnn. Dynamo still chokes on f-string messages (a long-standing known bug: https://github.com/pytorch/pytorch/issues/103602).
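Purely as an illustration of the kind of thing that trips this up (the exact torchvision call sites differ; these helpers are made up), formatting trace-time values into a message versus keeping the message a plain literal:

import torch

def check_boxes_fstring(boxes: torch.Tensor) -> None:
    # Interpolating shapes/values into an f-string inside the traced region
    # is the sort of message construction that has caused Dynamo trouble.
    torch._assert(boxes.dim() == 2, f"expected 2D boxes, got {boxes.shape}")

def check_boxes_plain(boxes: torch.Tensor) -> None:
    # Workaround sketch: keep the traced-side message a plain string literal.
    torch._assert(boxes.dim() == 2, "expected 2D boxes")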
Scale factor tire fire. Same as in maskrcnn. I actually half fixed this earlier, but there was one last bit I forgot to do last time: https://github.com/pytorch/vision/pull/7942
Data-dependent orig_kval. orig_kval is unbacked, so we cannot do a stock min on it, which would attempt to test if orig_kval < input.size(axis), which we won't know. Dynamo ought to be able to translate this to sym_min automatically (sketch below).
NN module setattr. Same as in maskrcnn.
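For the orig_kval item, a minimal sketch of what the sym_min translation would look like done by hand, assuming a helper shaped roughly like torchvision's _topk_min:

import torch

def topk_min(input: torch.Tensor, orig_kval, axis: int):
    # Python's min() must evaluate `orig_kval < input.size(axis)` to pick a
    # branch, which fails when one side is an unbacked SymInt:
    #   return min(orig_kval, input.size(axis))
    # torch.sym_min builds a symbolic min instead of branching, so nothing
    # data-dependent has to be decided at trace time.
    return torch.sym_min(orig_kval, input.size(axis))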
Batched nms threshold trick. We can either torch.cond this, or, taking a page from the torchvision._is_tracing test here, just always do the coordinate trick.
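A sketch of "just always do the coordinate trick", modeled on what torchvision's batched_nms does on its coordinate-trick path (assumes boxes is non-empty and in (x1, y1, x2, y2) format):

import torch
from torchvision.ops import nms

def batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold):
    # Offset every class's boxes by a per-class amount larger than any
    # coordinate, so boxes from different classes can never overlap; a single
    # class-agnostic nms then behaves like per-class nms. This avoids the
    # data-dependent size threshold that picks between the two strategies.
    max_coordinate = boxes.max()
    offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
    boxes_for_nms = boxes + offsets[:, None]
    return nms(boxes_for_nms, scores, iou_threshold)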
nms support. I could have sworn that I meta-fied this but apparently not. NMS is data-dependent, so it needs @zou3519's impl_abstract for out-of-tree registration.
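A sketch of what such an out-of-tree abstract impl could look like, assuming the operator is exposed as torchvision::nms and using the impl_abstract unbacked-size hook (the real registration would likely carry extra input checks):

import torch

@torch.library.impl_abstract("torchvision::nms")
def nms_abstract(dets, scores, iou_threshold):
    # nms keeps a data-dependent number of boxes, so allocate an unbacked
    # symbolic size for the output instead of guessing a concrete one.
    ctx = torch.library.get_ctx()
    num_to_keep = ctx.new_dynamic_size()
    return dets.new_empty(num_to_keep, dtype=torch.int64)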
Versions
main
cc @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519