AssertionError: pt_pred not close to tf_pred

Environment:
Ubuntu 18.04
PyTorch 1.7.1
TensorFlow 2.4.1
torch_audioset commit at 2020.06.26: https://github.com/w-hc/torch_audioset/tree/42d38c175505a47660b07de39189f5483d89845d
yamnet commit at 2020.02.27: https://github.com/tensorflow/models/tree/83f56818af03cf70ec8e99d9c8c955ca98dead6e

Command and output:
CUDA_VISIBLE_DEVICES= python convert_yamnet.py
2021-03-26 17:33:08.960500: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-26 17:33:10.344642: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-26 17:33:10.345393: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-26 17:33:10.376690: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-03-26 17:33:10.376728: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: f89b1701d83f
2021-03-26 17:33:10.376735: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: f89b1701d83f
2021-03-26 17:33:10.376815: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.39.0
2021-03-26 17:33:10.376837: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.39.0
2021-03-26 17:33:10.376844: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.39.0
2021-03-26 17:33:10.377093: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-26 17:33:10.377559: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-26 17:33:10.380729: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-03-26 17:33:10.383577: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3499910000 Hz
WARNING:tensorflow:When passing input data as arrays, do not specify `steps_per_epoch`/`steps` argument. Please use `batch_size` instead.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
matching layer1.fused.conv.weight                           <--->     layer1/conv/layer1/conv/kernel:0
matching layer1.fused.bn.bias                               <--->     layer1/conv/bn/layer1/conv/bn/beta:0
matching layer1.fused.bn.running_mean                       <--->     layer1/conv/bn/layer1/conv/bn/moving_mean:0
matching layer1.fused.bn.running_var                        <--->     layer1/conv/bn/layer1/conv/bn/moving_variance:0
matching layer2.depthwise_conv.conv.weight                  <--->     layer2/depthwise_conv/layer2/depthwise_conv/depthwise_kernel:0
matching layer2.depthwise_conv.bn.bias                      <--->     layer2/depthwise_conv/bn/layer2/depthwise_conv/bn/beta:0
matching layer2.depthwise_conv.bn.running_mean              <--->     layer2/depthwise_conv/bn/layer2/depthwise_conv/bn/moving_mean:0
matching layer2.depthwise_conv.bn.running_var               <--->     layer2/depthwise_conv/bn/layer2/depthwise_conv/bn/moving_variance:0
matching layer2.pointwise_conv.conv.weight                  <--->     layer2/pointwise_conv/layer2/pointwise_conv/kernel:0
matching layer2.pointwise_conv.bn.bias                      <--->     layer2/pointwise_conv/bn/layer2/pointwise_conv/bn/beta:0
matching layer2.pointwise_conv.bn.running_mean              <--->     layer2/pointwise_conv/bn/layer2/pointwise_conv/bn/moving_mean:0
matching layer2.pointwise_conv.bn.running_var               <--->     layer2/pointwise_conv/bn/layer2/pointwise_conv/bn/moving_variance:0
matching layer3.depthwise_conv.conv.weight                  <--->     layer3/depthwise_conv/layer3/depthwise_conv/depthwise_kernel:0
matching layer3.depthwise_conv.bn.bias                      <--->     layer3/depthwise_conv/bn/layer3/depthwise_conv/bn/beta:0
matching layer3.depthwise_conv.bn.running_mean              <--->     layer3/depthwise_conv/bn/layer3/depthwise_conv/bn/moving_mean:0
matching layer3.depthwise_conv.bn.running_var               <--->     layer3/depthwise_conv/bn/layer3/depthwise_conv/bn/moving_variance:0
matching layer3.pointwise_conv.conv.weight                  <--->     layer3/pointwise_conv/layer3/pointwise_conv/kernel:0
matching layer3.pointwise_conv.bn.bias                      <--->     layer3/pointwise_conv/bn/layer3/pointwise_conv/bn/beta:0
matching layer3.pointwise_conv.bn.running_mean              <--->     layer3/pointwise_conv/bn/layer3/pointwise_conv/bn/moving_mean:0
matching layer3.pointwise_conv.bn.running_var               <--->     layer3/pointwise_conv/bn/layer3/pointwise_conv/bn/moving_variance:0
matching layer4.depthwise_conv.conv.weight                  <--->     layer4/depthwise_conv/layer4/depthwise_conv/depthwise_kernel:0
matching layer4.depthwise_conv.bn.bias                      <--->     layer4/depthwise_conv/bn/layer4/depthwise_conv/bn/beta:0
matching layer4.depthwise_conv.bn.running_mean              <--->     layer4/depthwise_conv/bn/layer4/depthwise_conv/bn/moving_mean:0
matching layer4.depthwise_conv.bn.running_var               <--->     layer4/depthwise_conv/bn/layer4/depthwise_conv/bn/moving_variance:0
matching layer4.pointwise_conv.conv.weight                  <--->     layer4/pointwise_conv/layer4/pointwise_conv/kernel:0
matching layer4.pointwise_conv.bn.bias                      <--->     layer4/pointwise_conv/bn/layer4/pointwise_conv/bn/beta:0
matching layer4.pointwise_conv.bn.running_mean              <--->     layer4/pointwise_conv/bn/layer4/pointwise_conv/bn/moving_mean:0
matching layer4.pointwise_conv.bn.running_var               <--->     layer4/pointwise_conv/bn/layer4/pointwise_conv/bn/moving_variance:0
matching layer5.depthwise_conv.conv.weight                  <--->     layer5/depthwise_conv/layer5/depthwise_conv/depthwise_kernel:0
matching layer5.depthwise_conv.bn.bias                      <--->     layer5/depthwise_conv/bn/layer5/depthwise_conv/bn/beta:0
matching layer5.depthwise_conv.bn.running_mean              <--->     layer5/depthwise_conv/bn/layer5/depthwise_conv/bn/moving_mean:0
matching layer5.depthwise_conv.bn.running_var               <--->     layer5/depthwise_conv/bn/layer5/depthwise_conv/bn/moving_variance:0
matching layer5.pointwise_conv.conv.weight                  <--->     layer5/pointwise_conv/layer5/pointwise_conv/kernel:0
matching layer5.pointwise_conv.bn.bias                      <--->     layer5/pointwise_conv/bn/layer5/pointwise_conv/bn/beta:0
matching layer5.pointwise_conv.bn.running_mean              <--->     layer5/pointwise_conv/bn/layer5/pointwise_conv/bn/moving_mean:0
matching layer5.pointwise_conv.bn.running_var               <--->     layer5/pointwise_conv/bn/layer5/pointwise_conv/bn/moving_variance:0
matching layer6.depthwise_conv.conv.weight                  <--->     layer6/depthwise_conv/layer6/depthwise_conv/depthwise_kernel:0
matching layer6.depthwise_conv.bn.bias                      <--->     layer6/depthwise_conv/bn/layer6/depthwise_conv/bn/beta:0
matching layer6.depthwise_conv.bn.running_mean              <--->     layer6/depthwise_conv/bn/layer6/depthwise_conv/bn/moving_mean:0
matching layer6.depthwise_conv.bn.running_var               <--->     layer6/depthwise_conv/bn/layer6/depthwise_conv/bn/moving_variance:0
matching layer6.pointwise_conv.conv.weight                  <--->     layer6/pointwise_conv/layer6/pointwise_conv/kernel:0
matching layer6.pointwise_conv.bn.bias                      <--->     layer6/pointwise_conv/bn/layer6/pointwise_conv/bn/beta:0
matching layer6.pointwise_conv.bn.running_mean              <--->     layer6/pointwise_conv/bn/layer6/pointwise_conv/bn/moving_mean:0
matching layer6.pointwise_conv.bn.running_var               <--->     layer6/pointwise_conv/bn/layer6/pointwise_conv/bn/moving_variance:0
matching layer7.depthwise_conv.conv.weight                  <--->     layer7/depthwise_conv/layer7/depthwise_conv/depthwise_kernel:0
matching layer7.depthwise_conv.bn.bias                      <--->     layer7/depthwise_conv/bn/layer7/depthwise_conv/bn/beta:0
matching layer7.depthwise_conv.bn.running_mean              <--->     layer7/depthwise_conv/bn/layer7/depthwise_conv/bn/moving_mean:0
matching layer7.depthwise_conv.bn.running_var               <--->     layer7/depthwise_conv/bn/layer7/depthwise_conv/bn/moving_variance:0
matching layer7.pointwise_conv.conv.weight                  <--->     layer7/pointwise_conv/layer7/pointwise_conv/kernel:0
matching layer7.pointwise_conv.bn.bias                      <--->     layer7/pointwise_conv/bn/layer7/pointwise_conv/bn/beta:0
matching layer7.pointwise_conv.bn.running_mean              <--->     layer7/pointwise_conv/bn/layer7/pointwise_conv/bn/moving_mean:0
matching layer7.pointwise_conv.bn.running_var               <--->     layer7/pointwise_conv/bn/layer7/pointwise_conv/bn/moving_variance:0
matching layer8.depthwise_conv.conv.weight                  <--->     layer8/depthwise_conv/layer8/depthwise_conv/depthwise_kernel:0
matching layer8.depthwise_conv.bn.bias                      <--->     layer8/depthwise_conv/bn/layer8/depthwise_conv/bn/beta:0
matching layer8.depthwise_conv.bn.running_mean              <--->     layer8/depthwise_conv/bn/layer8/depthwise_conv/bn/moving_mean:0
matching layer8.depthwise_conv.bn.running_var               <--->     layer8/depthwise_conv/bn/layer8/depthwise_conv/bn/moving_variance:0
matching layer8.pointwise_conv.conv.weight                  <--->     layer8/pointwise_conv/layer8/pointwise_conv/kernel:0
matching layer8.pointwise_conv.bn.bias                      <--->     layer8/pointwise_conv/bn/layer8/pointwise_conv/bn/beta:0
matching layer8.pointwise_conv.bn.running_mean              <--->     layer8/pointwise_conv/bn/layer8/pointwise_conv/bn/moving_mean:0
matching layer8.pointwise_conv.bn.running_var               <--->     layer8/pointwise_conv/bn/layer8/pointwise_conv/bn/moving_variance:0
matching layer9.depthwise_conv.conv.weight                  <--->     layer9/depthwise_conv/layer9/depthwise_conv/depthwise_kernel:0
matching layer9.depthwise_conv.bn.bias                      <--->     layer9/depthwise_conv/bn/layer9/depthwise_conv/bn/beta:0
matching layer9.depthwise_conv.bn.running_mean              <--->     layer9/depthwise_conv/bn/layer9/depthwise_conv/bn/moving_mean:0
matching layer9.depthwise_conv.bn.running_var               <--->     layer9/depthwise_conv/bn/layer9/depthwise_conv/bn/moving_variance:0
matching layer9.pointwise_conv.conv.weight                  <--->     layer9/pointwise_conv/layer9/pointwise_conv/kernel:0
matching layer9.pointwise_conv.bn.bias                      <--->     layer9/pointwise_conv/bn/layer9/pointwise_conv/bn/beta:0
matching layer9.pointwise_conv.bn.running_mean              <--->     layer9/pointwise_conv/bn/layer9/pointwise_conv/bn/moving_mean:0
matching layer9.pointwise_conv.bn.running_var               <--->     layer9/pointwise_conv/bn/layer9/pointwise_conv/bn/moving_variance:0
matching layer10.depthwise_conv.conv.weight                 <--->     layer10/depthwise_conv/layer10/depthwise_conv/depthwise_kernel:0
matching layer10.depthwise_conv.bn.bias                     <--->     layer10/depthwise_conv/bn/layer10/depthwise_conv/bn/beta:0
matching layer10.depthwise_conv.bn.running_mean             <--->     layer10/depthwise_conv/bn/layer10/depthwise_conv/bn/moving_mean:0
matching layer10.depthwise_conv.bn.running_var              <--->     layer10/depthwise_conv/bn/layer10/depthwise_conv/bn/moving_variance:0
matching layer10.pointwise_conv.conv.weight                 <--->     layer10/pointwise_conv/layer10/pointwise_conv/kernel:0
matching layer10.pointwise_conv.bn.bias                     <--->     layer10/pointwise_conv/bn/layer10/pointwise_conv/bn/beta:0
matching layer10.pointwise_conv.bn.running_mean             <--->     layer10/pointwise_conv/bn/layer10/pointwise_conv/bn/moving_mean:0
matching layer10.pointwise_conv.bn.running_var              <--->     layer10/pointwise_conv/bn/layer10/pointwise_conv/bn/moving_variance:0
matching layer11.depthwise_conv.conv.weight                 <--->     layer11/depthwise_conv/layer11/depthwise_conv/depthwise_kernel:0
matching layer11.depthwise_conv.bn.bias                     <--->     layer11/depthwise_conv/bn/layer11/depthwise_conv/bn/beta:0
matching layer11.depthwise_conv.bn.running_mean             <--->     layer11/depthwise_conv/bn/layer11/depthwise_conv/bn/moving_mean:0
matching layer11.depthwise_conv.bn.running_var              <--->     layer11/depthwise_conv/bn/layer11/depthwise_conv/bn/moving_variance:0
matching layer11.pointwise_conv.conv.weight                 <--->     layer11/pointwise_conv/layer11/pointwise_conv/kernel:0
matching layer11.pointwise_conv.bn.bias                     <--->     layer11/pointwise_conv/bn/layer11/pointwise_conv/bn/beta:0
matching layer11.pointwise_conv.bn.running_mean             <--->     layer11/pointwise_conv/bn/layer11/pointwise_conv/bn/moving_mean:0
matching layer11.pointwise_conv.bn.running_var              <--->     layer11/pointwise_conv/bn/layer11/pointwise_conv/bn/moving_variance:0
matching layer12.depthwise_conv.conv.weight                 <--->     layer12/depthwise_conv/layer12/depthwise_conv/depthwise_kernel:0
matching layer12.depthwise_conv.bn.bias                     <--->     layer12/depthwise_conv/bn/layer12/depthwise_conv/bn/beta:0
matching layer12.depthwise_conv.bn.running_mean             <--->     layer12/depthwise_conv/bn/layer12/depthwise_conv/bn/moving_mean:0
matching layer12.depthwise_conv.bn.running_var              <--->     layer12/depthwise_conv/bn/layer12/depthwise_conv/bn/moving_variance:0
matching layer12.pointwise_conv.conv.weight                 <--->     layer12/pointwise_conv/layer12/pointwise_conv/kernel:0
matching layer12.pointwise_conv.bn.bias                     <--->     layer12/pointwise_conv/bn/layer12/pointwise_conv/bn/beta:0
matching layer12.pointwise_conv.bn.running_mean             <--->     layer12/pointwise_conv/bn/layer12/pointwise_conv/bn/moving_mean:0
matching layer12.pointwise_conv.bn.running_var              <--->     layer12/pointwise_conv/bn/layer12/pointwise_conv/bn/moving_variance:0
matching layer13.depthwise_conv.conv.weight                 <--->     layer13/depthwise_conv/layer13/depthwise_conv/depthwise_kernel:0
matching layer13.depthwise_conv.bn.bias                     <--->     layer13/depthwise_conv/bn/layer13/depthwise_conv/bn/beta:0
matching layer13.depthwise_conv.bn.running_mean             <--->     layer13/depthwise_conv/bn/layer13/depthwise_conv/bn/moving_mean:0
matching layer13.depthwise_conv.bn.running_var              <--->     layer13/depthwise_conv/bn/layer13/depthwise_conv/bn/moving_variance:0
matching layer13.pointwise_conv.conv.weight                 <--->     layer13/pointwise_conv/layer13/pointwise_conv/kernel:0
matching layer13.pointwise_conv.bn.bias                     <--->     layer13/pointwise_conv/bn/layer13/pointwise_conv/bn/beta:0
matching layer13.pointwise_conv.bn.running_mean             <--->     layer13/pointwise_conv/bn/layer13/pointwise_conv/bn/moving_mean:0
matching layer13.pointwise_conv.bn.running_var              <--->     layer13/pointwise_conv/bn/layer13/pointwise_conv/bn/moving_variance:0
matching layer14.depthwise_conv.conv.weight                 <--->     layer14/depthwise_conv/layer14/depthwise_conv/depthwise_kernel:0
matching layer14.depthwise_conv.bn.bias                     <--->     layer14/depthwise_conv/bn/layer14/depthwise_conv/bn/beta:0
matching layer14.depthwise_conv.bn.running_mean             <--->     layer14/depthwise_conv/bn/layer14/depthwise_conv/bn/moving_mean:0
matching layer14.depthwise_conv.bn.running_var              <--->     layer14/depthwise_conv/bn/layer14/depthwise_conv/bn/moving_variance:0
matching layer14.pointwise_conv.conv.weight                 <--->     layer14/pointwise_conv/layer14/pointwise_conv/kernel:0
matching layer14.pointwise_conv.bn.bias                     <--->     layer14/pointwise_conv/bn/layer14/pointwise_conv/bn/beta:0
matching layer14.pointwise_conv.bn.running_mean             <--->     layer14/pointwise_conv/bn/layer14/pointwise_conv/bn/moving_mean:0
matching layer14.pointwise_conv.bn.running_var              <--->     layer14/pointwise_conv/bn/layer14/pointwise_conv/bn/moving_variance:0
matching classifier.weight                                  <--->     logits/logits/kernel:0
matching classifier.bias                                    <--->     logits/logits/bias:0
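The name matching above pairs every PyTorch parameter with a Keras variable. However, the two frameworks store kernels in different layouts, so a correct name match does not by itself guarantee identical weights. The NumPy sketch below shows the usual layout conversions. It assumes Keras kernels in their default channels-last form, and the helper names are hypothetical; the log does not show how convert_yamnet.py actually copies the weights. Also note that no `bn.weight` (gamma) appears in the list, so the PyTorch BatchNorm scale is presumably left at its default of ones.

```python
import numpy as np


def conv_kernel_tf_to_pt(kernel: np.ndarray) -> np.ndarray:
    """Keras Conv2D kernel (H, W, C_in, C_out) -> PyTorch Conv2d weight (C_out, C_in, H, W)."""
    return np.ascontiguousarray(kernel.transpose(3, 2, 0, 1))


def depthwise_kernel_tf_to_pt(kernel: np.ndarray) -> np.ndarray:
    """Keras DepthwiseConv2D kernel (H, W, C_in, M) -> PyTorch Conv2d weight
    (C_in * M, 1, H, W) for use with groups=C_in."""
    h, w, c_in, m = kernel.shape
    # Row-major reshape puts filter (c, q) at output channel c * M + q,
    # the same order TensorFlow's depthwise convolution produces.
    flat = kernel.reshape(h, w, c_in * m)
    return np.ascontiguousarray(flat.transpose(2, 0, 1)[:, None, :, :])


def dense_kernel_tf_to_pt(kernel: np.ndarray) -> np.ndarray:
    """Keras Dense kernel (in, out) -> PyTorch Linear weight (out, in)."""
    return np.ascontiguousarray(kernel.T)
```

For example, `conv_kernel_tf_to_pt` maps a `(3, 3, 32, 64)` Keras kernel to a `(64, 32, 3, 3)` PyTorch weight. A kernel copied without this transpose usually fails with a shape error. A square kernel with equal input and output channel counts is the exception: it would be accepted silently with scrambled values.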
(5, 521)
(5, 521)
-42.8931 20.909971
0.0077692876 0.08557794
Traceback (most recent call last):
  File "convert_yamnet.py", line 156, in <module>
    main()
  File "convert_yamnet.py", line 150, in main
    assert np.allclose(pt_pred, tf_pred, atol=1e-6)
AssertionError
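The bare `assert` on line 150 only reports that the arrays differ, not by how much or where. Below is a minimal sketch of a more informative check. It assumes `pt_pred` and `tf_pred` are NumPy arrays of the same shape (here `(5, 521)`), and `report_mismatch` is a hypothetical helper, not part of torch_audioset. It applies the same elementwise test as `np.allclose`, `|a - b| <= atol + rtol * |b|`, with NumPy's default `rtol=1e-05`. It also reports each side's value range, which helps expose a scale difference, for example one model returning raw logits and the other sigmoid scores.

```python
import numpy as np


def report_mismatch(pt_pred, tf_pred, atol=1e-6, rtol=1e-5):
    """Summarize how far two prediction arrays are from passing np.allclose."""
    pt = np.asarray(pt_pred, dtype=np.float64)
    tf = np.asarray(tf_pred, dtype=np.float64)
    if pt.shape != tf.shape:
        raise ValueError(f"shape mismatch: {pt.shape} vs {tf.shape}")
    diff = np.abs(pt - tf)
    # Same elementwise criterion np.allclose uses: |a - b| <= atol + rtol * |b|
    bad = diff > atol + rtol * np.abs(tf)
    worst = np.unravel_index(diff.argmax(), diff.shape)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "frac_mismatched": float(bad.mean()),
        "worst_index": tuple(int(i) for i in worst),
        "pt_range": (float(pt.min()), float(pt.max())),
        "tf_range": (float(tf.min()), float(tf.max())),
    }
```

Calling `print(report_mismatch(pt_pred, tf_pred))` just before the assertion shows whether the outputs are off by rounding noise or by a systematic amount.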
(Issue #3 in w-hc/torch_audioset)