As a corollary, it would be great to be able to pass multi-dimensional arrays to Tensor construction :D I might give it a shot soon.
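For illustration, a hypothetical sketch of how multi-dimensional construction could be emulated today, by flattening and reshaping (this assumes a PyTorch-like `reshape` on storch tensors; it is not storch's actual multi-dim constructor):

```scala
import torch.Tensor

// Hypothetical workaround sketch: flatten a nested Scala array into one
// Seq, build a 1-D tensor, then reshape it back to the nested dimensions.
// Assumes a PyTorch-like `reshape`; not an actual storch multi-dim API.
val nested = Array(Array(1f, 2f, 3f), Array(4f, 5f, 6f))
val flat   = nested.flatten.toSeq
val matrix = Tensor(flat).reshape(nested.length, nested.head.length)
```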
Perhaps the creation of many Tensors (as a workaround for this multi-dimensional Array limitation) is uncovering some kind of race condition under the hood in javacpp-presets + libtorch.
Just some additional information: I have also had a lot of trouble with storch tensors containing incorrect values, non-deterministically. I.e. sometimes when I run the code it works correctly, giving output matching PyTorch, and other times it returns garbage. But I hadn't managed to create a nice, minimal, (semi-)reproducible example.
That's obviously quite bad. Thanks for looking into it @davoclavo, I'll try to investigate as well.
I've seen garbage a few times as well when playing with the API interactively in the REPL, but like @darrenjw I haven't been able to reliably reproduce it either.
I think this is also a good reason to improve the test coverage.
OK, I think I have a hunch: it could be that `from_blob` does not copy the buffer, so when the array/buffer is freed, we end up with garbage. If we clone the native tensor, it seems to work, at least in my little test:
```diff
 Tensor(
   torchNative
     .from_blob(
       pointer,
       Array(data.length.toLong),
       NativeConverters.tensorOptions(inputDType, layout, CPU, requiresGrad)
-    )
+    ).clone()
 )
```
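For context: libtorch's `from_blob` wraps existing memory without copying or taking ownership, so the resulting tensor is only valid while the backing buffer stays alive. A minimal sketch of the difference, reusing the names (`pointer`, `data`, the option arguments) from the snippet above; this is illustrative, not the exact storch internals:

```scala
// Sketch only, reusing `pointer`, `data`, etc. from the snippet above.
val opts = NativeConverters.tensorOptions(inputDType, layout, CPU, requiresGrad)

// Unsafe: `view` merely aliases `pointer`. Once the JVM/JavaCPP frees the
// backing buffer, reading from `view` yields garbage.
val view = torchNative.from_blob(pointer, Array(data.length.toLong), opts)

// Safe: clone() copies the data into memory owned by libtorch, so the
// tensor's lifetime is decoupled from the JVM-managed buffer.
val owned = view.clone()
```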
```scala
import torch.Tensor

// Repeatedly builds tensors from the same Seq and checks that the values
// survive the round trip. Before the clone() fix this failed intermittently.
object WeirdTest {
  val range = 1L.to(10_000L).toSeq
  val tensors = 1.to(100).map(_ => Tensor(range))

  def main(args: Array[String]): Unit = {
    println(tensors)
    tensors.zipWithIndex.foreach { (tensor, index) =>
      println(s"$index: $tensor")
      assert(tensor.toSeq == range)
    }
  }
}
```
@sbrunk @darrenjw Thanks for your input!
I tried adding `.clone()` and it seems to have fixed the problem! I also wrote a unit test that consistently replicates the issue and validates the fix. I can submit a PR soon with this change.
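A sketch of what such a regression test might look like (assuming a munit-style suite; the actual test in the PR may differ):

```scala
import torch.Tensor

// Hypothetical regression test (munit assumed): building many tensors from
// the same Seq used to corrupt some of them before the clone() fix.
class TensorFromSeqSuite extends munit.FunSuite {
  test("tensors created from a Seq retain their values") {
    val range = 1L.to(10_000L).toSeq
    for (i <- 1 to 100) {
      val tensor = Tensor(range)
      assertEquals(tensor.toSeq, range, s"corrupted tensor at iteration $i")
    }
  }
}
```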
Hello! I have been enjoying implementing a couple of different models using storch. However, after getting into more complex models I stumbled upon weird behavior where Tensors end up containing corrupted data. It seems to happen more often when I create a large number of Tensors from Scala values, but I am still not sure why or how. While I investigate, I wanted to share a simple scenario that replicates the bug:
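(The original code block appears to have been lost in formatting; the following is a reconstruction based on the WeirdTest snippet quoted above:)

```scala
import torch.Tensor

// Reconstruction of the missing repro: create 100 tensors, each from the
// same 10,000-element Long range, and print them to inspect the values.
object Repro {
  def main(args: Array[String]): Unit = {
    val range = 1L.to(10_000L).toSeq
    val tensors = 1.to(100).map(_ => Tensor(range))
    tensors.zipWithIndex.foreach { (tensor, index) =>
      println(s"$index: $tensor")
    }
  }
}
```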
NOTE: All tensors should contain the numbers from 1 to 10,000 as int64, but some end up with garbage (e.g. really large values).
If I reduce the range to 1,000 elements, the Tensors are generated correctly:
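That is, with the only change from the repro above being the range size (sketch):

```scala
// Only change from the repro above: a 1,000-element range instead of 10,000.
val range = 1L.to(1_000L).toSeq
```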
Extra information: