shaltielshmid / TorchSharp.PyBridge

A library enabling easy transfer and handling of PyTorch models between .NET and Python environments
MIT License

Please make SafeTensors class public #15

Closed BalashovK closed 3 months ago

BalashovK commented 3 months ago

Or pickler/unpickler. Or both. Please! Python allows saving and loading tensors without a model. Imposing a hard requirement to have a model for tensor I/O in C# looks artificial.

My use-case: I use TorchSharp for image and 3D volume processing. I do NOT create a model most of the time. Even if I do, most of my data is in tensors outside of the model. I often need to transfer tensors between C# and Python code. Being able to save and load tensors in Python-compatible safetensors (and/or pickle) format would be great!

Thank you!

shaltielshmid commented 3 months ago

Sure! Can you give me an example of the python code and the ideal equivalent C# code?

BalashovK commented 3 months ago

Sure!

Python code which creates a safetensors file:

import torch
from safetensors.torch import save_file

arr = torch.tensor([1, 2, 3, 4, 5, 6])
arr_2d = arr.clone().reshape(2, 3)
tensors = {
    "t_1d": arr,
    "t_2d": arr_2d
}
save_file(tensors, "made_by_python.safetensors")

C# code which loads and re-saves safetensors:

using System;
using System.Collections.Generic;
using TorchSharp;
using static TorchSharp.torch;

namespace SafeTensorsIO
{
internal class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello, World!");

        string input_file = @"C:\SRC_Python_2\try_safetensors\made_by_python.safetensors";
        string output_file = @"C:\SRC_Python_2\try_safetensors\made_by_csharp.safetensors";

        Dictionary<string, torch.Tensor> tensors = TorchSharp.PyBridge.Safetensors.LoadStateDict(input_file);

        foreach ((string name, TorchSharp.torch.Tensor tensor) in tensors)
        {
            Console.WriteLine(name);
            Console.WriteLine(tensor);
            // also print content of the tensor
            long[] tensor_data = (long[])tensor.data<long>().ToArray();
            Console.WriteLine(string.Join(", ", tensor_data));
        }

        Dictionary<string,Tensor> output_tensors = new Dictionary<string, Tensor>();

        output_tensors["o_1d"] = torch.tensor(new long[] { 1, 2, 3, 4, 5, 6, 7, 8 });
        output_tensors["o_2d"] = torch.tensor(new long[,] { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } });

        TorchSharp.PyBridge.Safetensors.SaveStateDict(output_file, output_tensors);
    }
}

}

Python code which reads safetensors saved by C#:

import torch
from safetensors.torch import load_file

fn = "made_by_csharp.safetensors"
tensors = load_file(fn)
print(tensors)
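For a quick sanity check of a file written on the C# side, the header can also be inspected without torch or the safetensors package. The sketch below relies only on the published safetensors layout: an 8-byte little-endian unsigned header length, then a UTF-8 JSON header mapping each tensor name to its dtype, shape, and data offsets. The function names `read_safetensors_header` and `describe` are made up for this illustration and are not part of any library.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict (tensor entries only)."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing every tensor in the file.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def describe(path):
    """List (name, dtype, shape) for each tensor, sorted by name."""
    header = read_safetensors_header(path)
    return [(name, info["dtype"], info["shape"]) for name, info in sorted(header.items())]
```

Calling `describe("made_by_csharp.safetensors")` on the file from the C# example should list the `o_1d` and `o_2d` entries with their dtypes and shapes, which is enough to confirm that names and dimensions survived the round trip before loading the data with `load_file`.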

shaltielshmid commented 3 months ago

Sounds good! Will be in the next release.

shaltielshmid commented 3 months ago

Feature should be added in: https://github.com/shaltielshmid/TorchSharp.PyBridge/commit/6bc9b399a0651d41076e67e741ddd7640c95ef62

Published in Version 1.4.0, should be up in a few minutes

BalashovK commented 3 months ago

Thank you, I confirm that safetensors I/O works in the 1.4.0 NuGet package.

shaltielshmid commented 3 months ago

I'll add that I also exposed the Unpickler/Pickler classes, so you can use that as well