rwth-i6 / pytorch-to-returnn

Make PyTorch code runnable within RETURNN

Checks for non-deterministic modules #90

Closed: vieting closed this 2 years ago

vieting commented 2 years ago

As described in #89, we currently do not account for non-deterministic modules when checking against the reference.

I would like to add torch.randint, so this PR should also introduce logic for handling non-deterministic modules.
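
To illustrate the problem: two draws of torch.randint agree in shape and distribution but not in values, so the usual numeric check against the reference cannot work for such modules. A minimal, self-contained sketch (not the PR's code; `reference_check` is a hypothetical stand-in for the existing comparison):

```python
import torch

def reference_check(torch_out, returnn_out, atol=1e-5):
    # Hypothetical stand-in for the deterministic output comparison:
    # outputs must match numerically.
    return torch.allclose(torch_out.float(), returnn_out.float(), atol=atol)

# Two independent draws stand in for the Torch run and the RETURNN run:
# same distribution, but (almost always) different values.
a = torch.randint(0, 100, (3, 4))
b = torch.randint(0, 100, (3, 4))
print(reference_check(a, b))  # almost certainly False
```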

vieting commented 2 years ago

The logic I added in _make_tf_feed_dict is similar to Module.check_call_returnn_outputs_to_prev_torch. However, torch_naming is None in this case, so the subsequent steps fail.
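
For context, the idea behind the feed dict extension, as a hedged sketch (the function and argument names here are illustrative, not the actual code):

```python
import numpy

def make_tf_feed_dict(session_inputs, non_deterministic_layer_outputs):
    """Build the TF feed dict for the drop-in run.

    session_inputs: TF placeholder -> numpy array, the regular inputs.
    non_deterministic_layer_outputs: TF tensor (output of a
      non-deterministic layer such as torch.randint) -> the recorded
      Torch output as a numpy array.

    Feeding the recorded Torch output overrides the non-deterministic op
    in the TF run, so all downstream layers see identical inputs and the
    reference check stays meaningful.
    """
    feed_dict = dict(session_inputs)
    for tf_tensor, torch_value in non_deterministic_layer_outputs.items():
        feed_dict[tf_tensor] = numpy.asarray(torch_value)
    return feed_dict
```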

vieting commented 2 years ago

I updated the PR, and this now works for _run_torch_returnn_drop_in. The feed dict creation still needs to be generalized, but do you think the rest is reasonable?

Then we need a concept for the RETURNN standalone runs. Should we also feed the tensor there? In that case, we would need to do it via absolute layer names, I guess. Or we skip the check whenever a non-deterministic module is present, to make sure that the RETURNN code also runs standalone and not just with the tensor in the feed_dict.
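
For the second option, a minimal sketch of how the skip could look (the attribute name follows this thread; everything else is assumed):

```python
import numpy

def check_outputs(naming, torch_outputs, returnn_outputs):
    # If any non-deterministic module was traced, a numeric comparison of
    # the standalone run is meaningless; reaching this point already shows
    # that the generated RETURNN code runs on its own.
    if naming.non_deterministic_layer_outputs:
        return True
    return all(
        numpy.allclose(t, r)
        for t, r in zip(torch_outputs, returnn_outputs))
```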

albertz commented 2 years ago

> I updated the PR, and this now works for _run_torch_returnn_drop_in. The feed dict creation still needs to be generalized, but do you think the rest is reasonable?

Yes, looks good! Only some smaller comments (see inline comments).

> Then we need a concept for the RETURNN standalone runs. Should we also feed the tensor there? In that case, we would need to do it via absolute layer names, I guess. Or we skip the check whenever a non-deterministic module is present, to make sure that the RETURNN code also runs standalone and not just with the tensor in the feed_dict.

I made a suggestion on how to keep the reference per absolute layer name. Then putting those references into the corresponding feed dict of the RETURNN standalone run should be simple. It might already work.
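
Roughly like this, as a sketch (the `get_layer(...).output.placeholder` access follows RETURNN's TFNetwork convention; treat the rest as assumptions):

```python
non_deterministic_layer_outputs = {}  # absolute layer name -> numpy array

def record_reference(abs_layer_name, torch_output):
    # During the Torch run, store the concrete output of each
    # non-deterministic module under its absolute layer name.
    non_deterministic_layer_outputs[abs_layer_name] = torch_output.detach().numpy()

def standalone_feed_dict(net):
    # For the RETURNN standalone run, feed each recorded reference into the
    # placeholder of the layer with the matching absolute name.
    return {
        net.get_layer(name).output.placeholder: value
        for name, value in non_deterministic_layer_outputs.items()}
```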

If this turns out to be complicated or difficult, as you say, we might drop the equality check.

vieting commented 2 years ago

Apparently, different tests access the same Naming instance. Did you know this? Do you know how to avoid it? Currently, the tests after test_randint fail because "randint" is still in naming.non_deterministic_layer_outputs, but obviously not in the network anymore.

albertz commented 2 years ago

> Apparently, different tests access the same Naming instance. Did you know this? Do you know how to avoid it? Currently, the tests after test_randint fail because "randint" is still in naming.non_deterministic_layer_outputs, but obviously not in the network anymore.

Are you sure? Note that for non_deterministic_layer_outputs you had exactly this bug (see my comment).

vieting commented 2 years ago

> > Apparently, different tests access the same Naming instance. Did you know this? Do you know how to avoid it? Currently, the tests after test_randint fail because "randint" is still in naming.non_deterministic_layer_outputs, but obviously not in the network anymore.
>
> Are you sure? Note that for non_deterministic_layer_outputs you had exactly this bug (see my comment).

Yes, right, I fixed this now.
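
For the record, the generic leakage pattern this bug resembled, with a per-test reset as one way to guard against it (the `Naming` class here is a simplified stand-in and the reset hook is hypothetical, not the project's actual API):

```python
import pytest

class Naming:
    """Simplified stand-in for a module-level singleton with mutable state."""
    _instance = None

    def __init__(self):
        self.non_deterministic_layer_outputs = {}

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def reset_instance(cls):  # hypothetical reset hook
        cls._instance = None

@pytest.fixture(autouse=True)
def fresh_naming():
    # Recreate the singleton around every test so that entries like
    # "randint" from one test cannot leak into the next.
    Naming.reset_instance()
    yield
    Naming.reset_instance()
```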

vieting commented 2 years ago

To me, this seems ready now. If you want, I can clean up the history to separate the handling of non-deterministic modules from the addition of torch.randint; otherwise, we can just squash everything.

albertz commented 2 years ago

> To me, this seems ready now. If you want, I can clean up the history to separate the handling of non-deterministic modules from the addition of torch.randint; otherwise, we can just squash everything.

Looks mostly good, except for some smaller things; see my remaining comments.

I think we can just squash everything then.