huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0
15.9k stars, 963 forks

Model Wishlist #1177

Open LaurentMazare opened 1 year ago

LaurentMazare commented 1 year ago

This issue aims at keeping track of the models that would be interesting to get added to candle. Feel free to make a comment to mention a new model, or vote for a model already in the list.

Added recently:

phudtran commented 1 year ago

Decent images with 4 steps of inference

Apache licensed and has a fairly large community. Perhaps a minimal port as an example.

StyleTTS2, MIT licensed (possibly better and faster than Tortoise TTS)

radames commented 1 year ago

The new JinaBert embeddings model is small and has an Apache license.

ToluClassics commented 1 year ago

The new JinaBert embeddings model is small and has an Apache license.

This looks like one I'd love to help migrate; @LaurentMazare, can I create an issue and get started on this?

LaurentMazare commented 1 year ago

The new JinaBert embeddings model is small and has an Apache license.

This looks like one I'd love to help migrate; @LaurentMazare, can I create an issue and get started on this?

Ah sorry, I'm actually mostly done with it, see #1187 (though I still have to line up the embedding values properly, all the rest is in place) - I had a couple of hours on the train this afternoon :)

ToluClassics commented 1 year ago

The new JinaBert embeddings model is small and has an Apache license.

This looks like one I'd love to help migrate; @LaurentMazare, can I create an issue and get started on this?

Ah sorry, I'm actually mostly done with it, see #1187 (though I still have to line up the embedding values properly, all the rest is in place) - I had a couple of hours on the train this afternoon :)

Late to the party 😅, I'll keep an eye on this list then.

LaurentMazare commented 1 year ago

@radames the jina-bert bits have been merged and I checked on some examples that the generated embeddings line up properly with the python version, so it should be all good. I will just polish the example a bit to properly download the tokenizer and weight files from the hub if needed. @ToluClassics and besides the list, if there are some models that you feel interested in, adding one is certainly a great way to get started.
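
In the meantime, running it should look roughly like this (the example name and the --prompt flag are from memory, so double-check against the example once the polishing lands):

cargo run --example jina-bert --release -- --prompt "This is an example sentence"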

Liangdi commented 1 year ago

Could you support https://github.com/infinitered/nsfwjs or https://github.com/bhky/opennsfw2 for NSFW detection?

flutter-painter commented 1 year ago

Would it be possible to show how to use a marian translation model in candle?

There is already an example in :

The modeling_marian.py modeling file is already available in transformers :

Marian translation models are lighter than their counterparts, which makes them well-suited for serverless applications. Candle being lighter than rust-bert and relying less on tch-rs, I expect this would lighten and ease the whole build process.

LaurentMazare commented 1 year ago

@flutter-painter the marian-mt model should now be available in candle; we have an example that uses it to translate from French to English which you can find here.
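
Roughly, running it looks like this (flag names from memory, the example's README has the exact usage):

cargo run --example marian-mt --release -- --text "Demain, dès l'aube, à l'heure où blanchit la campagne, je partirai."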

flutter-painter commented 1 year ago

Thanks @LaurentMazare, I just tested it and it works. You are blazingly fast!

Liangdi commented 1 year ago

@flutter-painter the marian-mt model should now be available in candle; we have an example that uses it to translate from French to English which you can find here.

Thanks for the amazing work! Could you please tell me how to get tokenizer-marian-{lang}.json? I tried to get it with this Python code:

tokenizer = MarianTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./out")

but it does not work. I noticed that there is a file "convert_slow_tokenizer.py"; do I need to use a function from that file? How do I use it? Thanks very much!

LaurentMazare commented 1 year ago

Thanks for the amazing work! Could you please tell me how to get tokenizer-marian-{lang}.json? I tried to get it with this Python code:

tokenizer = MarianTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./out")

but it does not work. I noticed that there is a file "convert_slow_tokenizer.py"; do I need to use a function from that file? How do I use it? Thanks very much!

Ah, that part is indeed very hacky at the moment. You have to download the convert_slow_tokenizer.py script that you discovered, and running the following Python code from the same directory should produce the two tokenizer files.

from convert_slow_tokenizer import MarianConverter
from transformers import AutoTokenizer

# Load the slow (sentencepiece-based) tokenizer for the fr->en model.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en", use_fast=False)
# index=0 converts the source-language (French) tokenizer, index=1 the target-language (English) one.
fast_tokenizer = MarianConverter(tokenizer, index=0).converted()
fast_tokenizer.save("tokenizer-marian-base-fr.json")
fast_tokenizer = MarianConverter(tokenizer, index=1).converted()
fast_tokenizer.save("tokenizer-marian-base-en.json")
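
With the two json files produced, they can then be passed to the example, roughly along these lines (flag names from memory, check the example's --help for the exact spelling):

cargo run --example marian-mt --release -- --tokenizer tokenizer-marian-base-fr.json --tokenizer-dec tokenizer-marian-base-en.json --text "Je voudrais traduire cette phrase."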

Liangdi commented 1 year ago

ChatGLM3 on Hugging Face: https://huggingface.co/THUDM/chatglm3-6b

LaurentMazare commented 1 year ago

ChatGLM3 on Hugging Face: https://huggingface.co/THUDM/chatglm3-6b

That sounds like a great model to have, happy to prioritize it. Do you know if the tokenizer config can be easily converted to a working tokenizer.json or maybe there is already such a config somewhere? (similar to what was done for marian but hopefully less flaky)

EmilLindfors commented 1 year ago

How are we doing on TTS? Would be nice to have e.g. Bark.

Also, +1 on nougat

Liangdi commented 1 year ago

Another embedding model: https://huggingface.co/moka-ai/m3e-large

LaurentMazare commented 1 year ago

Another embedding model: https://huggingface.co/moka-ai/m3e-large

I think this one may just work out of the box as it uses a standard bert model which has already been added. You could try it out with the following:

cargo run --example bert -- --model-id moka-ai/m3e-large --revision refs/pr/5

Liangdi commented 1 year ago

Another embedding model: https://huggingface.co/moka-ai/m3e-large

I think this one may just work out of the box as it uses a standard bert model which has already been added. You could try it out with the following:

cargo run --example bert -- --model-id moka-ai/m3e-large --revision refs/pr/5

It works, thank you!

Liangdi commented 1 year ago

@LaurentMazare another LLM: https://github.com/01-ai/Yi. They converted the tokenizer.json and I tested it with Tokenizers (https://github.com/01-ai/Yi/issues/24#issuecomment-1801680600). Can candle support this model?

LaurentMazare commented 1 year ago

@LaurentMazare another LLM: https://github.com/01-ai/Yi. They converted the tokenizer.json and I tested it with Tokenizers (01-ai/Yi#24 (comment)). Can candle support this model?

Do you know if this is the same tokenizer as for ChatGLM3? If that's the case I would prefer pushing on this one first as I have a PR that should be mostly done with the implementation and only requires lining up the logits once we have a proper tokenizer config.

(edit: I misremembered the PR, it's not mostly complete and requires a bit of work implementing the forward passes but this should be pretty quick to do once we have a tokenizer to test out)

LaurentMazare commented 1 year ago

@LaurentMazare another LLM: https://github.com/01-ai/Yi. They converted the tokenizer.json and I tested it with Tokenizers (01-ai/Yi#24 (comment)). Can candle support this model?

Just merged support for the yi-6b and yi-34b variants in #1320. I haven't tested them much though, as my current computer is very slow even on the 6b - not sure how much of that is expected. It would certainly be great if you could give these a spin and let me know how it goes.
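
For reference, giving them a spin should be something like the following (flags from memory, the example's README has the details):

cargo run --example yi --release -- --prompt "Here is a poem about the sea:"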

Liangdi commented 1 year ago

@LaurentMazare another LLM: https://github.com/01-ai/Yi. They converted the tokenizer.json and I tested it with Tokenizers (01-ai/Yi#24 (comment)). Can candle support this model?

Just merged support for the yi-6b and yi-34b variants in #1320. I haven't tested them much though, as my current computer is very slow even on the 6b - not sure how much of that is expected. It would certainly be great if you could give these a spin and let me know how it goes.

Thank you very much, I'll do some testing, follow your PR, and try to convert other LLMs.

YeonwooSung commented 1 year ago

Just out of curiosity, is there any plan to support more embedding models such as DeBERTa or BAAI/bge-large-en-v1.5?

julien-blanchon commented 1 year ago

Hey 👋, is someone working on porting LCM to Candle?

LaurentMazare commented 1 year ago

Just out of curiosity, is there any plan to support more embedding models such as DeBERTa or BAAI/bge-large-en-v1.5?

As this seems to just be a bert variant, this could work directly with the current bert model provided by candle.

cargo run --example bert --release -- --model-id BAAI/bge-large-en-v1.5 --revision refs/pr/5

LaurentMazare commented 1 year ago

Hey 👋, is someone working on porting LCM to Candle?

Could you provide some links/details on what LCM is?

julien-blanchon commented 1 year ago

Hey 👋, is someone working on porting LCM to Candle?

Could you provide some links/details on what LCM is?

Yep, I opened a dedicated issue here

radames commented 11 months ago

EfficientSAM: https://github.com/xetdata/EfficientSAM

LaurentMazare commented 8 months ago

Re text-to-speech, we've just added an early version of metavoice; you can try it out via this example.
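
The invocation is roughly as follows (flags from memory, see the example's README for the details):

cargo run --example metavoice --release -- --prompt "This is a test of the metavoice text to speech model."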

LaurentMazare commented 8 months ago

Would anyone be interested in running moondream2 if it was added to candle? It looks like a small and efficient model that wouldn't be too hard to add.

julien-blanchon commented 8 months ago

Yes, clearly!!! Especially for WASM usage; I'm not sure it will fit in one single chunk for the WebGPU API.

radames commented 8 months ago

Yes, @LaurentMazare, Moondream will be very cool! Perhaps we could also run it quantized, and would it also work on WASM?

groovybits commented 8 months ago

The Dolphin Mixtral is the best model I have found for story generation and logical calculations + coding. It is much better than base Mixtral for me; it uses ChatML, if that matters. (It runs fast on Metal, 10+ tps, when quantized to Q5_K_M, and doesn't seem to lose the ability to code decently in most cases.) So I really love this model.

https://huggingface.co/TheBloke/dolphin-2.7-mixtral-8x7b-GGUF

Curious whether this is more of a config/setup issue vs. needing support in Candle?

For some reason the quantized Mixtrals fail in my Metal builds/runs of the example, while the main mixtral example overwhelms my 192 GB Mac M2 Ultra (surprised, I didn't think the model was that big).

It's the only Mistral variant I have run that doesn't repeat when given large histories and lots of random stuff (I run a 24/7 Twitch AI chat that can really exercise a model's history handling and general stability over time, throwing wildly varying input at it).

Liangdi commented 8 months ago

How about YoloWorld-EfficientSAM?

LaurentMazare commented 8 months ago

The Dolphin Mixtral is the best model I have found for story generation and logical calculations + coding. It is much better than base Mixtral for me; it uses ChatML, if that matters. (It runs fast on Metal, 10+ tps, when quantized to Q5_K_M, and doesn't seem to lose the ability to code decently in most cases.) So I really love this model.

https://huggingface.co/TheBloke/dolphin-2.7-mixtral-8x7b-GGUF

Curious whether this is more of a config/setup issue vs. needing support in Candle?

For some reason the quantized Mixtrals fail in my Metal builds/runs of the example.

If you can provide details on what is going wrong here, that would be helpful as it would be a great model to support, but sadly I only have 16GB of memory on my mac so I cannot really try it out and iron out the issues.

LaurentMazare commented 7 months ago

Moondream2 is now available, kudos to @santiagomed. Try it out using this example: https://github.com/huggingface/candle/blob/main/candle-examples/examples/moondream/README.md
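
The basic invocation is along these lines (the README above has the exact flags, the image path here is just a placeholder):

cargo run --example moondream --release -- --prompt "Describe this image." --image path/to/image.jpg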

santiagomed commented 7 months ago

Yes, @LaurentMazare, Moondream will be very cool! Perhaps we could also run it quantized, and would it also work on WASM?

I've been talking with Vik (the moondream maintainer) and he is working on quantizing Moondream. I'm planning to add the quantized version after it's out and would love to also take a shot at making a WASM version!

jorgeantonio21 commented 7 months ago

The 8B and 70B Llama3 versions have recently come out. Would love to see these available in candle!

https://ai.meta.com/blog/meta-llama-3/?utm_source=twitter&utm_medium=organic_social&utm_content=video&utm_campaign=llama3

EricLBuehler commented 7 months ago

@jorgeantonio21, we have this conversion in progress on mistral.rs.

LaurentMazare commented 7 months ago

It will be included in candle when I get access to the weights (which sadly are behind a form).

julien-c commented 7 months ago

@LaurentMazare validation should be super fast on HF https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

LaurentMazare commented 7 months ago

Thanks @julien-c, indeed I got approved almost immediately there. It turns out that the models are almost drop-in replacements for llama-2, so they could run with the old model code. I've made a few tweaks in #2085 though so that they're easier to try out. I haven't adapted the multi-GPU example yet but should do so soon so that the large version can be run.
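
For reference, the 8B variant can be tried with something like the following, --which v3 selecting the new weights (see also the full command with sampling options further down this thread):

cargo run --example llama --release -- --which v3 --prompt "The best thing about coding in rust is "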

jorgeantonio21 commented 7 months ago

Thank you @LaurentMazare, @EricLBuehler! I was able to run Llama3 with the previous code, but the results are a bit weird, tbh. Not sure if this is a problem with the model itself or with candle's previous llama code.

EricLBuehler commented 7 months ago

@jorgeantonio21, did you have issues with the mistral.rs implementation?

LaurentMazare commented 7 months ago

@jorgeantonio21 could you provide details and maybe file a specific issue? @EricLBuehler maybe we should focus on candle built-in models in this github thread and not on other projects :)

EricLBuehler commented 7 months ago

@LaurentMazare, absolutely, sorry!

jorgeantonio21 commented 7 months ago

@LaurentMazare, I ran a few examples, but the following was particularly striking to me:

Input: The best thing about coding in rust is

Output:

use std;

macro_rules! assert_eq {
    ($left:expr , $right: expr) => (assert!(std::cmp::EQ($left, $right)));
}

fn main() {
    let a = false;
    assert_eq!(a == true);
}

And it doesn't work. We are defining our macro inside the use of std and that can be seen as defining it outside its scope like so:

mod foo {
      fn abc() {}
}

Then anything we put inside foo itself will have access to anything

I used parameters:

"prompt": "The best thing about coding in rust is ",
"temperature": 0.8,
"random_seed": 42,
"repeat_penalty": 1.1,
"repeat_last_n": 128,
"max_tokens": 32,
"_top_k": 10,
"top_p": 1.0

LaurentMazare commented 7 months ago

Interesting, one thing that we don't do by default at the moment is including the bos token at the beginning of the prompt (which the official implementation does). If I try running the following, I get noticeably better results than without it.

$ target/release-with-debug/examples/llama --prompt '<|begin_of_text|> The best thing about coding in rust is ' --temperature 0.8 --seed 42 --repeat-penalty 1.1 --repeat-last-n 128 --which v3
loading the model weights from meta-llama/Meta-Llama-3-8B
starting the inference loop
<|begin_of_text|> The best thing about coding in rust is  the ability to create instances of structs on heap, which can be extremely useful when creating a large
 tree-like structure. Since we know that rust does not have garbage collection, or any other implementation for memory management, you often need to
 use ref counting and `std::rc::Rc` types to make sure everything happens under control.
...

Would you mind trying to add <|begin_of_text|> at the beginning of your prompt to see if the result gets better? If it doesn't help, I'll dig more into aligning the logits between the rust and pytorch versions.

jorgeantonio21 commented 7 months ago

@LaurentMazare, trying as you suggested with <|begin_of_text|>, I got an even worse result:

<|begin_of_text|> The best thing about coding in rust is the ability to create websites with SIMD for optimized 
fast speed. 

## Rust website
This a basic bootstrap template that uses rust.
The examples are written in rust, and use include! macro.

## Rust Code 

    // This is an example of using the include! macro (Fastest way for ruby developers)
        include!("example.Rust.html");

# How to set-up environment

Step 1:
Add this line in your Gemfile : ```gem 'sass-rails', '~> 5.0.6' ```

Step 2: Setup sass locally
Example:
**src/assets/sass/main.scss**
// src/assets/sass/style.scss

/* === MAIN STYLES =========================================================== */
body {
  font-family: sans-serif;
}

then run command ```sass --watch src/assets/sass/main.scss dst/styles.css ```

Step 3:
Setup rust and build it ( use cargo )

this should create some executable named example.Rust.html
(Code above will be compiled and create a static html file)

How to include your Example 

Copy this line in **_layout.erb**

<!DOCTYPE HTML>
<html lang="en">
<head>
        <meta charset="utf-8">
        <!-- Theme Name: Bootstrap -->
    <title><%= @page_title %></title>

   <%= stylesheet_link_tag "style" %>
</head>
<body>
<%= yield %>
</body>
</html> 

and the following code in the page you want. 

**You can change displayname and name of the function in include! macro with the example below**

# Create an array of parameters from an existing map (optional)
    <% params = { :example2 => 'example 1'}%>

    <!-- Call a function in a C library -->
    <% include!(params, "src/external/example.cpp", "function_example") %> 

__Reference: https://github.com/benbuckman/rust-webserver.git__

View:
https://raw.githubusercontent.com/benbuckman/rust-webserver/master/doc/examples.md

Additionally, I am getting these warnings:

2024-04-19T20:59:56.236300Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|begin_of_text|>' was expected to have ID '128000' but was given ID 'None'    
2024-04-19T20:59:56.236335Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|end_of_text|>' was expected to have ID '128001' but was given ID 'None'    
2024-04-19T20:59:56.236337Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|reserved_special_token_0|>' was expected to have ID '128002' but was given ID 'None'    
2024-04-19T20:59:56.236339Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|reserved_special_token_1|>' was expected to have ID '128003' but was given ID 'None'    
2024-04-19T20:59:56.236340Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|reserved_special_token_2|>' was expected to have ID '128004' but was given ID 'None'    
2024-04-19T20:59:56.237660Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|reserved_special_token_3|>' was expected to have ID '128005' but was given ID 'None'    
2024-04-19T20:59:56.237663Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|start_header_id|>' was expected to have ID '128006' but was given ID 'None'    
2024-04-19T20:59:56.237664Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|end_header_id|>' was expected to have ID '128007' but was given ID 'None'    
2024-04-19T20:59:56.237665Z  WARN tokenizers::tokenizer::serialization: Warning: Token '<|reserved_special_token_4|>' was expected to have ID '128008' but was given ID 'None'   
...

This might explain why I am getting such bad results.

LaurentMazare commented 7 months ago

Interesting, could you provide the command that you ran so that I can try to replicate things? Did you try running the command that I sent? That didn't produce any warnings and seemed to be working properly.