Open ysiraichi opened 3 months ago
Seems like during `mark_step` we found an XLATensor with an empty data handle.
As far as I have investigated, the only fallback we are running differently is `aten::_local_scalar_dense`.
`_local_scalar_dense` should be run on CPU I guess? This op usually happens when we move the tensor to CPU for printing.
Right. But I wonder whether this issue sheds light on a problem in the CUDA OpenXLA fallback implementation, in the sense that, even if we run that op on CUDA, it should still work.
This is odd. I tried replacing the DLPack conversion with `tensor.to("cpu").to("cuda")` and `tensor.to("cpu").to("xla")`, and still got the same error.
Forcing CPU fallback on `_local_scalar_dense` did work, though.
@JackCaoG I have been debugging this for a while now, and here's what I found out:
- The `PjRtData` that was deleted is not the same one that the fallback input holds. It was created in a later `mark_step()` call.
- `PjRtData` instantiation: it is first instantiated by a `CreateDataPlaceholder` call, inside the `ExtractIRAndPrepareXlaData_` function, (as far as I understand) when `mark_step()` is called.
- `PjRtStreamExecutorBuffer` deletion: `Delete` calls `Release` after `RunPostOrder` finishes. That said, I believe that, at that point, the buffer is already deleted (i.e. `PjRtStreamExecutorBuffer::IsDeleted() == true`), because `PjRtStreamExecutorBuffer::ConfirmDonation` is called before it.

Basically, this is the timeline I am seeing:
...
CreateDataPlaceholder(tensor: 0x55a254171e70)
XLAData (ptr: 0x55a254142e60):
Data Device: CUDA:0
Data Shape: s64[1]
Data Handle: None
...
PjRtData::Assign: Handle changes from None to 0x7fecfc0710a0
>> Old: XLAData (0x55a254142e60):
Data Device: CUDA:0
Data Shape: s64[1]
Data Handle: None
>> New: XLAData (0x7fecfc677340):
Data Device: CUDA:0
Data Shape: s64[1]
Data Handle: 0x7fecfc0710a0
...
PjRtStreamExecutorBuffer::GetBufferWithHold(Usage): 0x7fecfc0710a0
...
PjRtStreamExecutorBuffer::GetBufferWithHold(Donation): 0x7fecfc0710a0
...
PjRtStreamExecutorBuffer::ConfirmDonation: 0x7fecfc0710a0
>> Resets the buffer, i.e. deletes it!
...
Could NOT get handle (0x55a254142e60): XLAData:
Data Device: CUDA:0
Data Shape: s64[1]
Data Handle: Deleted
PjRtStreamExecutorBuffer::Delete: 0x7fecfc0710a0
>> Delete is called, but buffer is already deleted, i.e. `PjRtStreamExecutorBuffer::device_buffer_ == nullptr`
...
Do you see anything strange? Any ideas on where to look?
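For readers following the timeline above, here is a minimal toy model of the sequence in plain Python. It is only a sketch: `ToyBuffer`, `ToyData`, and `replay_timeline` are hypothetical names that mirror the log entries, and nothing here is the actual PJRT or torch_xla implementation. It assumes, as the log suggests, that confirming a donation resets the buffer, so the data object that still points at it later reports its handle as deleted.

```python
# Toy model of the buffer lifecycle in the timeline above.
# NOT the PJRT implementation: all names are hypothetical and only mirror the log.


class ToyBuffer:
    """Stands in for a device buffer that can be donated to a computation."""

    def __init__(self, handle):
        self.handle = handle
        self.device_buffer = object()  # non-None while the memory is alive
        self.events = []

    def is_deleted(self):
        return self.device_buffer is None

    def get_buffer_with_hold(self, kind):
        # A hold can only be taken on a live buffer.
        if self.is_deleted():
            raise RuntimeError("buffer already deleted")
        self.events.append(f"hold:{kind}")

    def confirm_donation(self):
        # Assumption taken from the log: confirming the donation resets the buffer.
        self.events.append("confirm_donation")
        self.device_buffer = None

    def delete(self):
        # Returns True only if this call actually released the buffer.
        already_deleted = self.is_deleted()
        self.events.append("delete(noop)" if already_deleted else "delete")
        self.device_buffer = None
        return not already_deleted


class ToyData:
    """Stands in for the placeholder that is assigned a handle later."""

    def __init__(self):
        self.buffer = None  # corresponds to "Data Handle: None"

    def assign(self, buffer):
        self.buffer = buffer

    def handle_state(self):
        if self.buffer is None:
            return "None"
        if self.buffer.is_deleted():
            return "Deleted"
        return hex(self.buffer.handle)


def replay_timeline():
    data = ToyData()                      # CreateDataPlaceholder
    states = [data.handle_state()]
    buf = ToyBuffer(0x7FECFC0710A0)
    data.assign(buf)                      # PjRtData::Assign
    states.append(data.handle_state())
    buf.get_buffer_with_hold("Usage")
    buf.get_buffer_with_hold("Donation")
    buf.confirm_donation()                # buffer is reset here
    states.append(data.handle_state())    # "Could NOT get handle"
    freed = buf.delete()                  # later Delete finds nothing to free
    return states, freed, buf.events
```

In this model the handle goes from `None` to a live address to `Deleted` before `delete()` is ever called, which is the ordering the log shows.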
In an external discussion, we decided to work around this issue for now by forcing `aten::_local_scalar_dense` to be run on CPU. Since this isn't exactly fixed (i.e. it may actually be the symptom of a more complex hidden error), I won't close this issue.
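The idea behind the workaround can be sketched as a per-op routing decision. This is an illustration only: `FORCE_CPU_FALLBACK` and `pick_fallback_device` are hypothetical names, and the real torch_xla fallback selection may be structured quite differently.

```python
# Hypothetical routing sketch; these names are not part of torch_xla.

# Ops that should always use the CPU fallback, even when the CUDA
# OpenXLA fallback is otherwise enabled.
FORCE_CPU_FALLBACK = {"aten::_local_scalar_dense"}


def pick_fallback_device(op_name, default="cuda"):
    """Return the device on which a fallback op should run."""
    return "cpu" if op_name in FORCE_CPU_FALLBACK else default
```

An explicit allow-list like this keeps the workaround narrow: only the op known to misbehave is rerouted, and every other fallback keeps its default device.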
🐛 Bug
Running the upstreamed benchmarking scripts with the following command results in an unexpected error. It does work when using CPU OpenXLA fallback, though.
Environment
cc @miladm @JackCaoG