picobyte / stable-diffusion-webui-wd14-tagger

Labeling extension for Automatic1111's Web UI
601 stars 74 forks

Tensorflow: JIT compilation failed #114

Open Vektor8298 opened 4 months ago

Vektor8298 commented 4 months ago

Hi, I reinstalled a1111 on my new PC, and the WD 1.4 tagger is no longer working with local tagger models. When I try to run one of them, for example deepdanbooru-v3-20211112-sgd-e28, I get this error:

Loaded wd vit tagger v3 model from SmilingWolf/wd-vit-tagger-v3
Scanning <DirEntry 'deepdanbooru-v3-20211112-sgd-e28'> as deepdanbooru project
Scanning <DirEntry 'deepdanbooru-v4-20200814-sgd-e30'> as deepdanbooru project
Loading deepdanbooru-v3-20211112-sgd-e28 from <DirEntry 'deepdanbooru-v3-20211112-sgd-e28'>
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1722036149.226920   95507 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[previous cuda_executor.cc:1015 message repeated 8 more times]
2024-07-26 20:22:29.304707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9682 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:08:00.0, compute capability: 8.6
Loaded deepdanbooru-v3-20211112-sgd-e28 model from <DirEntry 'deepdanbooru-v3-20211112-sgd-e28'>
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1722036155.068429   96511 gpu_backend_lib.cc:593] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-12.3
  /usr/local/cuda
  /mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
W0000 00:00:1722036155.083700   96507 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
[previous gpu_kernel_to_blob_pass.cc:190 message repeated 7 more times]
W0000 00:00:1722036155.242013   95507 gpu_backend_lib.cc:631] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
error: libdevice not found at ./libdevice.10.bc
2024-07-26 20:22:35.242188: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2024-07-26 20:22:35.242597: W tensorflow/core/framework/op_kernel.cc:1828] UNKNOWN: JIT compilation failed.
2024-07-26 20:22:35.242617: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
*** Error completing request
*** Arguments: (<PIL.Image.Image image mode=RGB size=850x634 at 0x775E7CA08970>, 'deepdanbooru-v3-20211112-sgd-e28', 'hair', '', '', '', '', '') {}
    Traceback (most recent call last):
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/modules/call_queue.py", line 74, in f
        res = list(func(*args, **kwargs))
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/modules/call_queue.py", line 53, in f
        res = func(*args, **kwargs)
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/modules/call_queue.py", line 37, in f
        res = func(*args, **kwargs)
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/extensions/stable-diffusion-webui-wd14-tagger/tagger/ui.py", line 113, in on_interrogate_image_submit
        interrogator.interrogate_image(image)
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/extensions/stable-diffusion-webui-wd14-tagger/tagger/interrogator.py", line 150, in interrogate_image
        data = ('', '', fi_key) + self.interrogate(image)
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/extensions/stable-diffusion-webui-wd14-tagger/tagger/interrogator.py", line 309, in interrogate
        image = ddd.load_image_for_evaluate(
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/deepdanbooru/data/__init__.py", line 26, in load_image_for_evaluate
        image = tf.image.resize(
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 5983, in raise_from_not_ok_status
        raise core._status_to_exception(e) from None  # pylint: disable=protected-access
    tensorflow.python.framework.errors_impl.UnknownError: {{function_node __wrapped__Round_device_/job:localhost/replica:0/task:0/device:GPU:0}} JIT compilation failed. [Op:Round] name: 

---
Traceback (most recent call last):
  File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1297, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/mnt/TS512GM/SD/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1272, in validate_outputs
    raise ValueError(
ValueError: An event handler (on_interrogate_image_submit) didn't receive enough output values (needed: 7, received: 3).
Wanted outputs:
    [state, html, html, label, label, label, html]
Received outputs:
    [None, "", "<div class='error'>UnknownError: {{function_node __wrapped__Round_device_/job:localhost/replica:0/task:0/device:GPU:0}} JIT compilation failed. [Op:Round] name: </div><div class='performance'><p class='time'>Time taken: <wbr><span class='measurement'>6.2 sec.</span></p><p class='vram'><abbr title='Active: peak amount of video memory used during generation (excluding cached data)'>A</abbr>: <span class='measurement'>0.24 GB</span>, <wbr><abbr title='Reserved: total amount of video memory allocated by the Torch library '>R</abbr>: <span class='measurement'>0.50 GB</span>, <wbr><abbr title='System: peak amount of video memory allocated by all running programs, out of total capacity'>Sys</abbr>: <span class='measurement'>2.7/11.667 GB</span> (23.2%)</p></div>"]
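For anyone hitting the same `libdevice not found at ./libdevice.10.bc` error: the log itself suggests the workaround, namely pointing XLA at a CUDA installation that contains `nvvm/libdevice/libdevice.10.bc` via `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda`, set before TensorFlow is imported. A minimal sketch of that idea (the helper names are mine, and the candidate paths are hypothetical examples, not a guaranteed fix for this install):

```python
import os
from pathlib import Path


def find_cuda_data_dir(candidates):
    """Return the first candidate directory that contains
    nvvm/libdevice/libdevice.10.bc, which is the layout XLA's
    --xla_gpu_cuda_data_dir flag expects, or None if none match."""
    for cand in candidates:
        if (Path(cand) / "nvvm" / "libdevice" / "libdevice.10.bc").is_file():
            return str(cand)
    return None


def export_xla_flags(cuda_dir):
    """Append --xla_gpu_cuda_data_dir to XLA_FLAGS in the environment.
    This only takes effect if it runs before `import tensorflow`."""
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    existing = os.environ.get("XLA_FLAGS", "")
    os.environ["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return os.environ["XLA_FLAGS"]


if __name__ == "__main__":
    # Example search list mirroring the directories from the log above.
    candidates = ["/usr/local/cuda-12.3", "/usr/local/cuda"]
    cuda_dir = find_cuda_data_dir(candidates)
    if cuda_dir:
        print(export_xla_flags(cuda_dir))
    else:
        print("no CUDA data dir with nvvm/libdevice found")
```

Equivalently, exporting `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda` in the shell that launches the webui should have the same effect, as long as the chosen directory actually contains `nvvm/libdevice/libdevice.10.bc`.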