Closed: ivsanro1 closed this issue 1 year ago
For some reason it does not fail anymore after the first error. With this workaround it works:
for dev in self.config.device_map.get_layers_devs():

    device_buffers = {}
    self.buffers.append(device_buffers)

    # Workaround: the first allocation on a device raises the error once;
    # after swallowing it, the allocations below succeed.
    try:
        torch.zeros((5, 5), dtype = torch.float16, device = dev)
    except:
        pass

    temp_state = torch.zeros((config.max_input_len, config.intermediate_size), dtype = torch.float16, device = dev)
    temp_mlp = torch.zeros((config.fused_mlp_thd * 2, config.intermediate_size), dtype = torch.float16, device = dev)
    temp_zeros_float = torch.zeros((1, 65536), dtype = torch.float32, device = dev)
    temp_dq = torch.zeros((1, max_dq_buffer_size), dtype = torch.float16, device = dev)
However, it still fails with the same error elsewhere in the code.
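If it helps, the try/except trick could be factored into a small helper and reused wherever the first allocation on a device fails. This is only a rough sketch based on the behaviour above; alloc_with_retry is a hypothetical name, not something from exllama:

import torch

def alloc_with_retry(shape, dtype = torch.float16, device = "cuda:0", retries = 1):
    # Hypothetical helper: the first allocation on a device seems to fail once,
    # so retry it a limited number of times before giving up.
    for attempt in range(retries + 1):
        try:
            return torch.zeros(shape, dtype = dtype, device = device)
        except Exception:
            if attempt == retries:
                raise
            torch.cuda.synchronize(device)

# e.g. temp_dq = alloc_with_retry((1, max_dq_buffer_size), device = dev)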
The compute capability of the T4 is 7.5, so I imagine it should work.
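For reference, this is an easy way to confirm what the card actually reports (just a small sketch; the T4 should show 7.5):

import torch

# Print name and compute capability for every visible CUDA device.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"cuda:{i}: {torch.cuda.get_device_name(i)} -> compute capability {major}.{minor}")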
Closing because it's not exllama-related.
Hello. I have run exllama fine on an NVIDIA L4, but now I'm trying to run the same thing on a Tesla T4 and I receive this error:
AFAIU, the T4 has proper fp16 support. I say this because of this note in the README:
I am developing on an RTX 4090 and an RTX 3090-Ti. 30-series and later NVIDIA GPUs should be well supported, but anything Pascal or older with poor FP16 support isn't going to perform well.
Do you know if exllama should work on a T4?
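In case it's useful, this minimal fp16 check runs on the card without going through exllama (a rough sketch, mirroring the kind of allocations that fail):

import torch

dev = "cuda:0"  # assuming the T4 is the first visible device

# Allocate half-precision tensors and run a matmul, roughly what the failing buffers do.
a = torch.zeros((5, 5), dtype = torch.float16, device = dev)
b = torch.randn((5, 5), dtype = torch.float16, device = dev)
c = a @ b
torch.cuda.synchronize(dev)
print(c.dtype, c.shape)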
My versions: