tonybaloney / CSnakes

https://tonybaloney.github.io/CSnakes/
MIT License

HuggingFace Transformers and Diffusers as nuget packages #296

Open AshD opened 1 month ago

AshD commented 1 month ago

CSnakes looks very impressive after seeing the Sneak Peek video https://youtu.be/U4-95gMT_UA

Is it possible to package HuggingFace Transformers and Diffusers as nuget packages that can be pulled into .NET projects, with wrappers to call them from C#?

This should take care of a lot of Gen AI use cases.

minesworld commented 3 weeks ago

I think that isn't within the scope of CSnakes itself, but rather of projects which use CSnakes.

Maybe those who use that can publish the sources for

Could be nicer, but depends how such code (in a nuget package) should be used. And: up to now there is only one virtual environment. That means:

as otherwise the nuget package might have some issues... And no nuget package provider wants to deal with that user feedback.

To be able to change that, CSnakes would have to split the PythonEnvironmentBuilder code into "usable" chunks, like VirtualEnvironment and pip classes that could be easily used without the builder context. For example, from a nuget package whose "main" class would be given the path where the user wants the Python stuff to be installed.

Multiple virtual environments could be created that way from C# and be used via a python wrapper which creates a subinterpreter for each one...

That solves one problem but adds a new one: Python objects cannot be passed between subinterpreters, and so the result of one nuget package cannot be used as the input of another nuget package without serialization/deserialization ...

tonybaloney commented 3 weeks ago

We might tackle this, see discussion in #270 so far.

AshD commented 3 weeks ago

Thanks for the replies.

I added Transformers and Diffusers to a .NET 9 Console project using CSnakes and, while it works, it needs a separate Python env for each of Transformers and Diffusers, as @minesworld suggested, to make sure they don't break each other's dependencies.

To explain my use case in more detail, Fusion Quill is a Windows WPF app that supports multiple AI providers, including local providers using Llama.cpp GGUF and Onnx models. I wanted to add support for the Transformers and Diffusers libraries to it, and CSnakes is a good way to integrate them. My idea is to put the code into separate .NET libraries that each have their own CSnakes Python env, so they don't mess each other up. Also, I am considering open sourcing these libraries so that other .NET devs can use them.

I also looked at integrating with the exl2 Python library (https://github.com/turboderp/exllamav2), since it is one of the fastest, but I think it requires async for parallel queries.
https://github.com/turboderp/exllamav2/blob/master/examples/inference_async.py

Thanks @tonybaloney for CSnakes. Experience with it has been great so far.

tonybaloney commented 3 weeks ago

@AshD did you use the nuget locator or another one? I think it's possible to have a standalone package without any system dependencies using the nuget locator but it'll only be distributable on Windows. For Linux or Mac, the user will have to install separately. I'm looking into UV, which has a system for pulling a Python binary via API.

Regarding async, I'm looking into that to see if it were possible to await a Python coroutine, but in the interim you'd need to write a sync function as a wrapper.
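The interim workaround mentioned here can be sketched in plain Python. This is a minimal illustration, not CSnakes-specific code: `generate_async` is a hypothetical stand-in for whatever coroutine the library exposes, and `generate` is the synchronous function the .NET side would call.

```python
import asyncio


async def generate_async(prompt: str) -> str:
    # Hypothetical stand-in for an async library call.
    await asyncio.sleep(0)
    return prompt.upper()


def generate(prompt: str) -> str:
    # Synchronous wrapper: creates an event loop, runs the coroutine
    # to completion, closes the loop, and returns the result.
    return asyncio.run(generate_async(prompt))
```

Note that `asyncio.run` raises a `RuntimeError` if an event loop is already running in the calling thread, so a wrapper like this is meant to be called from ordinary synchronous code only.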

FlatlinerDOA commented 3 weeks ago

@tonybaloney not to trivialise this but, wouldn't adding async "just" be a case of implementing INotifyCompletion or ICriticalNotifyCompletion?

tonybaloney commented 3 weeks ago

> @tonybaloney not to trivialise this but, wouldn't adding async "just" be a case of implementing INotifyCompletion or ICriticalNotifyCompletion?

I wish. The .NET side is simple enough, the problem is that the Python coroutines need an event loop and there isn't a C-API for that.

We could wrap the asyncio library event loop in Python but the internals of it aren't easy to access.
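The dependency on an event loop can be seen from pure Python. In this small sketch (function names are made up for illustration), stepping a coroutine by hand fails as soon as it awaits something that needs the loop, while the same coroutine completes under `asyncio.run`:

```python
import asyncio


async def needs_loop() -> str:
    # asyncio.sleep with a non-zero delay schedules a timer on the running loop.
    await asyncio.sleep(0.01)
    return "done"


def try_without_loop() -> str:
    coro = needs_loop()
    try:
        # Drive the coroutine manually, with no event loop running.
        coro.send(None)
    except RuntimeError as exc:
        return f"failed: {exc}"
    finally:
        coro.close()
    return "suspended"


def try_with_loop() -> str:
    # The event loop created by asyncio.run drives the coroutine to completion.
    return asyncio.run(needs_loop())
```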

AshD commented 3 weeks ago

> @AshD did you use the nuget locator or another one? I think it's possible to have a standalone package without any system dependencies using the nuget locator but it'll only be distributable on Windows. For Linux or Mac, the user will have to install separately. I'm looking into UV, which has a system for pulling a Python binary via API.

I am using nuget locator for now. Will try to move it into a separate assembly and test it in the next few days.

> Regarding async, I'm looking into that to see if it were possible to await a Python coroutine, but in the interim you'd need to write a sync function as a wrapper.

I need some guidance on how to stream results back to the .NET host when the async function is running inside a synchronous Python wrapper.
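One possible shape for this, as a sketch only: a synchronous generator that owns a private event loop and pulls one item at a time from an async generator. `fake_token_stream` is a hypothetical stand-in for the real async API, and how the resulting Python generator is consumed from C# depends on CSnakes and is not shown here.

```python
import asyncio
from typing import AsyncIterator, Iterator

_DONE = object()  # sentinel marking the end of the async stream


async def fake_token_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for an async streaming API.
    for word in prompt.split():
        await asyncio.sleep(0)
        yield word


async def _next_or_done(agen: AsyncIterator[str]):
    # Fetch the next item, or return the sentinel when the stream ends.
    try:
        return await agen.__anext__()
    except StopAsyncIteration:
        return _DONE


def stream_tokens(prompt: str) -> Iterator[str]:
    # Synchronous generator: each next() call advances the async generator
    # by exactly one item on a private event loop.
    loop = asyncio.new_event_loop()
    agen = fake_token_stream(prompt)
    try:
        while True:
            item = loop.run_until_complete(_next_or_done(agen))
            if item is _DONE:
                break
            yield item
    finally:
        # Runs on exhaustion or when the consumer stops early.
        loop.run_until_complete(agen.aclose())
        loop.close()
```

Each item is handed to the caller as soon as it is produced, so the host does not have to wait for the whole result before showing partial output.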

minesworld commented 3 weeks ago

> I wish. The .NET side is simple enough, the problem is that the Python coroutines need an event loop and there isn't a C-API for that.
>
> We could wrap the asyncio library event loop in Python but the internals of it aren't easy to access.

Isn't such a C-API only needed if both async IO "engines" would be mixed together? For those who might need that, it should be possible to use Pipes, sockets or whatever both sides can use from the operating system...

For the "rest of us", running C# async Tasks and the CPython async loop independent from each other should work and be sufficient.

Wrapping like https://github.com/tonybaloney/CSnakes/issues/307.

We would need "only" callbacks from CPython to C# ...
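The Python half of that idea could look roughly like the sketch below: an asyncio loop running in its own thread, with coroutines submitted from outside and a callback fired on completion. `BackgroundLoop` and its methods are hypothetical names for illustration; here the callback is an ordinary Python callable, whereas the proposal above would need it to reach into C#, and GIL handling under an embedding host is ignored.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Callable, Coroutine, Optional


class BackgroundLoop:
    """Runs an asyncio event loop on a dedicated thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def submit(
        self,
        coro: Coroutine[Any, Any, Any],
        on_done: Optional[Callable[[Any], None]] = None,
    ) -> Future:
        # Schedule the coroutine on the loop thread from any other thread.
        fut = asyncio.run_coroutine_threadsafe(coro, self._loop)
        if on_done is not None:
            # Assumes the coroutine succeeds; f.result() would raise otherwise.
            fut.add_done_callback(lambda f: on_done(f.result()))
        return fut

    def stop(self) -> None:
        # Ask the loop to stop, wait for its thread, then release resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def double(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2
```

A caller would do something like `bg = BackgroundLoop(); bg.submit(double(21), on_done=print)` and later `bg.stop()`; the caller's own async machinery never touches the Python loop directly.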

Besides that: having a class/module which would provide a virtual socket between C# and CPython would be nice to have.

AshD commented 3 weeks ago

I published a preliminary project for HF Diffusers and Transformers integration. There are two library projects for them, each with its own Python env. On their own they work fine! https://github.com/AshD/CSnakesIntegrations

I am having trouble getting two Python envs when using them together, because the Python environment is registered as a singleton. https://github.com/AshD/CSnakesIntegrations/blob/main/CSnakesIntegrations/Program.cs