Open · miquelflorensa opened this issue 1 month ago
@miquelflorensa Thank you for pointing that out. I thought the code was able to handle such a case. Could you please share your model details? If you find a solution, please don't hesitate to create a PR to fix it.
@lhnguyen102 I have experienced the issue with an FNN with 785 inputs in CUDA (when I use 788 inputs, it works fine). I still need to determine exactly in which situations this error happens, or whether I did something wrong on my side. Once I find the cause, I'll update this issue.
Specific architecture:

```python
from pytagi.nn import Linear, ReLU, Sequential

FNN = Sequential(
    Linear(28 * 28 + 1, 6000),
    ReLU(),
    Linear(6000, 28 * 28),
)
```
For any batch size.
@miquelflorensa I recently optimized the CUDA kernels for a memory access pattern where it is preferable to have a vector size that is a multiple of PACK_SIZE, such as 4. This allows accessing 4 elements at once, leading to faster performance. Normally, there is a dispatch mechanism that switches back to the non-optimized kernels for cases such as yours. I'll double-check, because there might be a bug there.
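To illustrate the pattern, here is a minimal standalone sketch (not cuTAGI's actual kernels; the names `scale_packed`, `scale_scalar`, and `dispatch_scale` are made up for this example). The vectorized kernel loads one `float4` per thread, which requires 16-byte-aligned addresses; the dispatcher must fall back to the scalar kernel whenever the size is not a multiple of PACK_SIZE, otherwise the `reinterpret_cast` dereferences an unaligned pointer and CUDA aborts with a misaligned-address error:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int PACK_SIZE = 4;

// Vectorized kernel: each thread handles PACK_SIZE floats via one float4
// load/store. Valid only when the base offset i is a multiple of 4, so
// that (in + i) stays 16-byte aligned (cudaMalloc pointers are aligned).
__global__ void scale_packed(const float *in, float *out, int n) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * PACK_SIZE;
    if (i + PACK_SIZE <= n) {
        float4 v = *reinterpret_cast<const float4 *>(in + i);
        v.x *= 2.f; v.y *= 2.f; v.z *= 2.f; v.w *= 2.f;
        *reinterpret_cast<float4 *>(out + i) = v;
    }
}

// Scalar fallback: one float per thread, no alignment requirement.
__global__ void scale_scalar(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.f * in[i];
}

// Dispatch: only use the packed kernel when n is a multiple of PACK_SIZE.
// If this check is missed (or tests the wrong size), sizes such as
// 785 (= 28*28 + 1) end up in the float4 path and crash.
void dispatch_scale(const float *in, float *out, int n) {
    const int threads = 256;
    if (n % PACK_SIZE == 0) {
        int blocks = (n / PACK_SIZE + threads - 1) / threads;
        scale_packed<<<blocks, threads>>>(in, out, n);
    } else {
        int blocks = (n + threads - 1) / threads;
        scale_scalar<<<blocks, threads>>>(in, out, n);
    }
}

int main() {
    const int n = 785;  // 28*28 + 1, not a multiple of PACK_SIZE
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    dispatch_scale(in, out, n);
    cudaDeviceSynchronize();
    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```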
@lhnguyen102 I'm still not sure why it crashes in my case or under what exact circumstances. I tested it with 13 inputs for the Boston Housing dataset, and it worked fine, so it's possible I'm missing something, but I haven't identified it yet. Finding the source of the problem is not urgent, but I'll update the thread once I do.
@lhnguyen102 When using an input layer whose size is not a multiple of PACK_SIZE, the kernel crashes with `misaligned address`. I only experienced this issue while running on GPU. I guess some modifications need to be made in the `set_buffer_size` function. I will try to solve it myself, but I wanted to leave the issue here.
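Something along these lines might work. This is a minimal sketch only; I don't know the actual `set_buffer_size` signature, so the helper below is hypothetical. The idea is to round every buffer length up to the next multiple of PACK_SIZE (e.g. 785 -> 788, which matches the size that worked above), so that float4 accesses stay 16-byte aligned while the extra elements are just ignored padding:

```cuda
#include <cstddef>

constexpr size_t PACK_SIZE = 4;

// Round n up to the next multiple of PACK_SIZE, e.g. 785 -> 788.
inline size_t aligned_size(size_t n) {
    return (n + PACK_SIZE - 1) / PACK_SIZE * PACK_SIZE;
}

// Hypothetical buffer-sizing helper: allocate padded lengths so the
// vectorized kernels can safely use float4 loads on any layer size.
inline size_t set_buffer_size(size_t batch_size, size_t layer_size) {
    return batch_size * aligned_size(layer_size);
}
```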