Closed · LaurentMazare closed this issue 1 year ago
Just adding that the same would be great for numpy arrays. Implementing this without adding a dependency should be possible, but we would probably have to copy the tensor's data 🤔
Arrow (the same in-memory format used by huggingface datasets) should be able to handle zero-copy array interoperability. See arrow <-> torch, arrow <-> numpy, and the arrow Tensor docs.
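The zero-copy vs copy distinction can be illustrated with numpy alone (a sketch for intuition only, not Arrow or candle code): two array objects can either share a single underlying buffer, which is what Arrow's interop formats enable, or each own an independent copy.

```python
import numpy as np

src = np.arange(6, dtype=np.float32)

view = src.reshape(2, 3)            # zero-copy: shares src's buffer
copy = np.array(src).reshape(2, 3)  # explicit copy: a new, independent buffer

assert np.shares_memory(src, view)
assert not np.shares_memory(src, copy)

# A write through the view is visible in src; the copy is unaffected.
view[0, 0] = 42.0
assert src[0] == 42.0
assert copy[0, 0] == 0.0
```

The same mechanics underlie Arrow's interop: what gets exchanged is a pointer to an existing buffer rather than the buffer's contents.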
I don't think arrow would help with sharing the data on the GPU, but anyway I feel we should just go ahead with making a copy to start with - this would already be better than the current way of doing the conversion.
Arrow has CUDA support, though I'm not sure how mature it is. Also, the docs on the C Array Interface discuss how NVIDIA's rapids data science library (GPU-accelerated dataframes and other scientific compute tools) uses Arrow under the hood.
Hi, I would like to help. As far as I know, I would have to use libtorch/tch-rs to create a torch tensor in Rust, copy the data from the candle.Tensor into it, wrap it in a PyObject, and return it to Python. But wouldn't that add a dependency on libtorch/tch-rs?
Would you mind helping me get started on this?
I think before converting candle tensors back to torch tensors we should first focus on converting torch tensors to candle tensors, or numpy arrays to candle tensors. These conversions should be possible without any additional dependencies. Converting candle tensors back to torch/numpy is probably a bit more difficult and will likely need dependencies (I guess you have to have them installed either way, as you have to import pytorch on the python side if you want to receive a torch tensor as a result from a function). Any help on this would be greatly appreciated :D
Edit: @macroexpansion On second thought, I don't think we have to add libtorch as a dependency, as we could use pyo3's Python interop to call pytorch's Python functions from the Rust side and create the torch tensors that way 🤔
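The Python side of that idea can be sketched as follows (the helper names here are hypothetical, not candle's API): torch is looked up lazily via importlib at call time, so it never becomes a hard dependency of the package itself.

```python
import importlib

def optional_import(name):
    """Import a module only if it is installed; return None otherwise.

    Hypothetical helper: torch is never a hard dependency, it is only
    looked up when the caller actually asks for a torch tensor.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

def to_torch(values):
    # Hypothetical conversion entry point, mirroring what pyo3 would
    # drive from the Rust side by calling into the torch module.
    torch = optional_import("torch")
    if torch is None:
        raise RuntimeError("converting to a torch tensor requires torch to be installed")
    return torch.tensor(values)
```

From Rust, pyo3 can perform the same lazy lookup by importing the `torch` module through the Python interpreter at runtime, so nothing links against libtorch at build time.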
+1 to not adding a libtorch or tch dependency. The idea here would rather be to use the python functions to get the data, as mentioned by @LLukas22 in the edit. It will make for additional copies, but we should just go for something very simple to start with, and if it ends up being used and becomes a bottleneck we will revisit (the second step would probably be some numpy integration, which is much more lightweight than libtorch and would just require a couple of memcpys to move the data around).
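The couple-of-memcpys idea can be sketched in pure numpy (an illustration under assumed simplifications, not candle's actual code path): copy the tensor's storage out as raw bytes, then wrap those bytes in a new array on the other side.

```python
import numpy as np

# Stand-in for the source tensor's storage.
src = np.arange(6, dtype=np.float32).reshape(2, 3)

# First copy: tensor storage -> contiguous raw bytes (the "memcpy").
raw = src.tobytes()

# Rebuild an array over the copied bytes; frombuffer adds no extra copy.
out = np.frombuffer(raw, dtype=np.float32).reshape(src.shape)

assert np.array_equal(src, out)
assert not np.shares_memory(src, out)  # out is backed by the copied bytes
```

This is the cheap fallback when true zero-copy sharing isn't available: one copy per direction, with dtype and shape carried alongside the bytes.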
Thanks, I'll start working on it.
Closing this now that it actually exists.
It would be neat to support converting pytorch tensors to/from candle tensors when using the pytorch api, e.g. do something like this:
Ideally there should be no dependency on torch when running candle.Tensor([1, 2, 3]) after the change. Tagging @LLukas22 in case you have some thoughts about this or want to take a stab at it.