huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Convert pytorch tensors to/from candle tensors in the python api #1087

Closed LaurentMazare closed 1 year ago

LaurentMazare commented 1 year ago

It would be neat to support converting pytorch tensors to/from candle tensors when using the python api, e.g. do something like this:

import torch
import candle

t = torch.randn((10, 8))
t2 = candle.Tensor(t)
print(t2)

t3 = t2.to_torch() # returns a torch tensor.

Ideally there should be no dependency on torch when running candle.Tensor([1, 2, 3]) after the change.

Tagging @LLukas22 in case you have some thoughts about this or wanted to take a stab at it.

LLukas22 commented 1 year ago

Just adding that the same would be great for numpy arrays. Implementing this without adding a dependency should be possible, but we probably have to copy the tensor's data 🤔
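A minimal sketch of what such a dependency-free, copy-based conversion could look like on the Python side. The helper name `to_nested_list` is hypothetical and not part of candle; it only relies on the `detach()`, `cpu()` and `tolist()` methods of torch tensors and the `tolist()` method of numpy arrays, and it recognises them by the module name of their type so that neither library is ever imported:

```python
from typing import Any, List


def to_nested_list(data: Any) -> List:
    """Hypothetical helper: copy tensor-like input into plain nested lists.

    Nothing here imports torch or numpy. Foreign tensors are recognised by
    the module name of their type, so plain Python lists never touch either
    library.
    """
    module = type(data).__module__.split(".")[0]
    if module == "torch":
        # torch.Tensor: detach from autograd, move to CPU, copy into lists.
        return data.detach().cpu().tolist()
    if module == "numpy":
        # numpy.ndarray.tolist() also copies the data into Python lists.
        return data.tolist()
    if isinstance(data, (list, tuple)):
        # Normalise nested tuples to lists; scalars pass through unchanged.
        return [
            to_nested_list(x) if isinstance(x, (list, tuple)) else x
            for x in data
        ]
    raise TypeError(f"unsupported input type: {type(data)!r}")
```

Assuming the constructor keeps accepting lists as in `candle.Tensor([1, 2, 3])`, the result could be passed on as `candle.Tensor(to_nested_list(t))`. This always copies the data, which matches the "copy first, optimise later" approach discussed in this thread.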

jeromeku commented 1 year ago

Arrow (same in-memory format as used by huggingface datasets) should be able to handle zero-copy array interoperability.

See arrow <-> torch, arrow <-> numpy, and arrow Tensor.

LaurentMazare commented 1 year ago

I don't think arrow would help with sharing the data on gpu, but anyway I feel that we should just go ahead with making a copy to start with. This would already be better than the current way of doing the conversion.

jeromeku commented 1 year ago

Arrow has CUDA support, though I'm not sure how mature it is. Also, the docs on the C Array Interface discuss how NVIDIA's RAPIDS data science library (GPU-accelerated dataframes and other scientific compute tools) uses Arrow under the hood.

macroexpansion commented 1 year ago

Hi, I would like to help. As far as I know, I have to use libtorch/tch-rs in order to create a torch tensor in Rust, copy data from candle.Tensor to it, wrap it in a PyObject and return it to Python. But wouldn't that add a dependency on libtorch/tch-rs? Would you mind helping me get started on this?

LLukas22 commented 1 year ago

I think that before thinking about converting candle tensors back to torch tensors, we should first focus on converting torch tensors to candle tensors, or numpy arrays to candle tensors. I also think that these conversions would be possible without any additional dependencies. Converting the candle tensors back to torch/numpy is probably a bit more difficult and will probably need dependencies (I guess you have to have them installed either way, as you have to import pytorch on the python side if you want to receive a torch tensor as a result from a function). Any help on this would be greatly appreciated :D

Edit: @macroexpansion On second thought, I don't think we have to add libtorch as a dependency, as we could use pyo3's python interop to call pytorch's python functions from the rust side and create the torch tensors that way 🤔

LaurentMazare commented 1 year ago

+1 to not adding a libtorch or tch dependency. The idea here would rather be to use the python functions to get the data, as mentioned by @LLukas22 in the edit. It will make for additional copies, but we should just go for something very simple to start with, and then if it starts being used and ends up being a bottleneck we will revisit (and probably the second step would be to have some numpy integration, which is much more lightweight than libtorch and will just require a couple of memcpy calls to move the data around).
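For the candle-to-torch direction, the same idea (go through PyTorch's Python functions rather than link against libtorch) can be illustrated in pure Python. This is only a sketch under stated assumptions: the function `to_torch` below is a hypothetical stand-in for what the Rust side would do through pyo3, and the `module_name` parameter exists purely so the sketch can be exercised without PyTorch installed. The only real PyTorch call it relies on is `torch.tensor(values)`:

```python
import importlib
from typing import Any, List


def to_torch(values: List, module_name: str = "torch") -> Any:
    """Hypothetical sketch: build a torch tensor from already-copied values.

    The import happens only when this function is called, so code paths that
    never ask for a torch tensor never need PyTorch to be installed.
    """
    try:
        torch = importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError("to_torch() requires PyTorch to be installed") from err
    # torch.tensor copies the nested Python lists into a new tensor.
    return torch.tensor(values)
```

The design choice is the one agreed on above: accept an extra copy in exchange for having no build-time or import-time dependency on libtorch, and revisit only if the copy turns out to be a bottleneck.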

macroexpansion commented 1 year ago

Thanks, I'll start working on it.

LaurentMazare commented 1 year ago

Closing this now that it actually exists.