sayakpaul closed this issue 1 year ago.
Note that `align_corners=None` does give the same result as `tf.image.resize`, to an absolute tolerance of about 1e-6:
```python
# align_corners=None behaves like align_corners=False for bilinear mode.
upsampled_logits_pt = torch.nn.functional.interpolate(
    dummy_logits_pt, size=interp_shape, mode="bilinear", align_corners=None
)
```
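To make the coordinate conventions concrete, here is a small pure-NumPy sketch (my own illustration, not code from the thread) of the source-coordinate mappings involved. For bilinear mode, `align_corners=None` falls back to the half-pixel convention, which is also what `tf.image.resize` uses:

```python
import numpy as np

def src_coords_half_pixel(out_size, in_size):
    # Half-pixel convention: pixel centers sit at half-integer coordinates.
    # Used by tf.image.resize and by interpolate(..., align_corners=False/None).
    dst = np.arange(out_size, dtype=np.float64)
    return (dst + 0.5) * in_size / out_size - 0.5

def src_coords_align_corners(out_size, in_size):
    # Corner-aligned convention: corner pixels map exactly onto corners.
    # Used by interpolate(..., align_corners=True).
    dst = np.arange(out_size, dtype=np.float64)
    return dst * (in_size - 1) / (out_size - 1)

print(src_coords_half_pixel(4, 2))     # [-0.25  0.25  0.75  1.25]
print(src_coords_align_corners(4, 2))  # [0.         0.33333333 0.66666667 1.        ]
```

The grids differ at every output pixel except (sometimes) the endpoints, which is why mixing the two conventions produces small but systematic differences.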
I faced a similar problem. With `torch.nn.functional.interpolate`:

- for `align_corners=True`, the best option is `tf.compat.v1.image.resize`;
- for `align_corners=False`, `tf.image.resize` does the trick.

Here is a Colab notebook that details the solution I proposed.
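As a sanity check on why each PyTorch flag needs a different TF op, here is a minimal 1-D linear resize in NumPy (my own sketch, not the notebook's code) showing that the two conventions genuinely produce different values:

```python
import numpy as np

def linear_resize_1d(x, out_size, align_corners=False):
    """Toy 1-D linear resize illustrating the two coordinate conventions."""
    in_size = x.shape[0]
    dst = np.arange(out_size, dtype=np.float64)
    if align_corners:
        # Corner-aligned: endpoints map exactly onto endpoints.
        src = dst * (in_size - 1) / (out_size - 1)
    else:
        # Half-pixel centers, clamped at the edges.
        src = np.clip((dst + 0.5) * in_size / out_size - 0.5, 0, in_size - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, in_size - 1)
    w = src - lo
    return x[lo] * (1 - w) + x[hi] * w

x = np.array([0.0, 10.0])
print(linear_resize_1d(x, 4, align_corners=False))  # [ 0.   2.5  7.5 10. ]
print(linear_resize_1d(x, 4, align_corners=True))   # [ 0.          3.33333333  6.66666667 10.        ]
```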
@amyeroberts and @gante were against using `tf.compat.v1.image.resize` for obvious reasons (it is a deprecated v1 API). @amyeroberts did come up with a solution, which is documented in this comment. I hope this provides some value to this thread.
Thanks @hollance and @ariG23498 for chiming in.
When there are only one or two interpolation ops, the small differences are fine, but when interpolation is done several times (as mentioned earlier), these differences compound, creating mismatches in the final outputs.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Pinging this to keep it open - we've implemented TF versions of Torch ops like `AdaptivePool`, so this might be something to explore as well. I hope a performant solution can be implemented with native ops (with or without XLA compilation) without needing to write our own CUDA, though!
> Pinging this to keep it open - we've implemented TF versions of Torch ops like AdaptivePool, so this might be something to explore as well.

Could you point me to something relevant? Would love to see the new implementations.
Hi @sayakpaul, I'm extremely sorry! I thought we'd implemented it in data2vec2 already, but checking the code it seems like we still have the old version there. Here's a much, much more performant version. The new version will also allow XLA compilation, unlike the original sparse implementation.
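For readers unfamiliar with adaptive pooling, the binning rule behind `AdaptiveAvgPool` is easy to state. This NumPy sketch (my own illustration of PyTorch's binning formula, not the Hugging Face TF implementation referenced above) shows the idea:

```python
import numpy as np

def adaptive_avg_pool_1d(x, out_size):
    """Adaptive average pooling over the last axis.

    Bin i spans [floor(i * L / out_size), ceil((i + 1) * L / out_size)),
    matching PyTorch's AdaptiveAvgPool1d binning rule.
    """
    L = x.shape[-1]
    i = np.arange(out_size)
    starts = (i * L) // out_size
    ends = -((-(i + 1) * L) // out_size)  # ceil division via negated floor div
    return np.array([x[s:e].mean() for s, e in zip(starts, ends)])

x = np.arange(6.0)
print(adaptive_avg_pool_1d(x, 3))  # [0.5 2.5 4.5]
print(adaptive_avg_pool_1d(x, 4))  # [0.5 1.5 3.5 4.5]  (bins overlap when 6 % 4 != 0)
```

The tricky part for a TF port is that bin sizes vary when the input length isn't divisible by the output length, which is what made a naive sparse implementation slow and non-XLA-friendly.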
Oh yeah, I remember this one. I need to update the data2vec code with your latest implementation. Reminded me of that.
Thanks, Matt!
Although, checking out that notebook, it seems like PyTorch's `interpolate` and TF's `resize` are using the same algorithm, and the differences are mostly numerical; when I switch the dtype to `float64`, the max absolute difference is 1e-7.
I think this will make it extremely hard to bring the accuracies any closer - we would have to rewrite the internals of the algorithm so that they're not just mathematically equivalent, but they lose precision in the same places as well. I think we're stuck with the issue, unfortunately!
Fair enough. The tiny differences just add up when there are multiple such interpolations. Otherwise, it's not a big deal.
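The compounding effect is easy to reproduce without either framework. This small NumPy experiment (my own illustration, using a cheap neighbour-averaging op as a stand-in for a stack of interpolations) shows float32 drifting away from a float64 reference as the op is repeated:

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.random(1024)         # float64 reference signal
x64 = ref.copy()
x32 = ref.astype(np.float32)   # float32 copy of the same data

diff_start = np.abs(x64 - x32).max()
# Repeat a simple smoothing op, mimicking stacked interpolation layers;
# each float32 step injects fresh rounding error on the order of 1e-7.
for _ in range(50):
    x64 = 0.5 * (x64 + np.roll(x64, 1))
    x32 = (0.5 * (x32 + np.roll(x32, 1))).astype(np.float32)
diff_end = np.abs(x64 - x32).max()

print(diff_start, diff_end)
```

The per-op error is tiny, but after many repetitions the accumulated discrepancy can exceed the tolerances typically used in PT-vs-TF equivalence tests.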
Ping.
I don't think it is resolved yet.
I doubt it will be apples-to-apples resolved ever, considering https://github.com/huggingface/transformers/issues/18811#issuecomment-1262563941.
We need to be careful when comparing predictions of models (PT vs. TF) that contain stacks of interpolations, especially with regard to the tolerances we use.
System Info
Upsampling of intermediate feature maps is common in computer vision models. In PyTorch, it's usually implemented using `nn.functional.interpolate`; in TensorFlow, it's usually `tf.image.resize`. But there is an inconsistency between what these two methods yield (Colab Notebook). Sometimes the differences in their outputs are small enough to ignore (ViT, for example). But when interpolation is repeated several times in a model (MobileViT, for example), these small differences can add up and shake the final outputs of the model quite a bit. More details here.
@hollance wrote an amazing blog post discussing this issue.
Who can help?
@amyeroberts @gante @Rocketknight1
Information
Tasks
- `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
The Colab Notebook mentioned in the description.
Expected behavior
We should work on a TF utility that yields the same output as `nn.functional.interpolate`.
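As a starting point, a framework-free reference could pin down the expected numbers. Below is a minimal NumPy sketch of bilinear resizing with half-pixel centers (the `align_corners=False` convention that `interpolate` defaults to for bilinear mode); this is my own illustrative sketch, not a proposed final utility:

```python
import numpy as np

def resize_bilinear_half_pixel(img, out_h, out_w):
    """Bilinear resize of a 2-D array using half-pixel source coordinates,
    applied separably along each axis."""
    def axis_resize(x, out_size, axis):
        in_size = x.shape[axis]
        src = (np.arange(out_size) + 0.5) * in_size / out_size - 0.5
        src = np.clip(src, 0, in_size - 1)   # clamp at the borders
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, in_size - 1)
        w = src - lo
        a = np.take(x, lo, axis=axis)
        b = np.take(x, hi, axis=axis)
        shape = [1] * x.ndim
        shape[axis] = out_size
        w = w.reshape(shape)                 # broadcast weights along `axis`
        return a * (1 - w) + b * w
    return axis_resize(axis_resize(img, out_h, 0), out_w, 1)

img = np.array([[0.0, 10.0], [20.0, 30.0]])
print(resize_bilinear_half_pixel(img, 4, 4))
```

A TF version of the same arithmetic (e.g. with `tf.gather`) would at least remove the convention mismatch, though, as discussed above, float32 rounding differences between implementations would still remain.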