huggingface / candle

Minimalist ML framework for Rust

Model Wishlist #1177

Open LaurentMazare opened 1 year ago

LaurentMazare commented 1 year ago

This issue aims to keep track of the models that would be interesting to add to candle. Feel free to leave a comment mentioning a new model, or to vote for a model already in the list.

Added recently:

jorgeantonio21 commented 7 months ago

This is running through another internal library; the repo is not yet publicly available, but I can share more details. That said, the Llama integration logic basically follows the candle example. This is the code for fetching and loading Llama models:

    fn fetch(
        api_key: String,
        cache_dir: PathBuf,
        config: ModelConfig,
    ) -> Result<Self::LoadData, ModelError> {
        let device = device(config.device_id())?;
        let dtype = DType::from_str(&config.dtype())?;

        let api = ApiBuilder::new()
            .with_progress(true)
            .with_token(Some(api_key))
            .with_cache_dir(cache_dir)
            .build()?;

        let model_type = ModelType::from_str(&config.model_id())?;
        let repo_id = model_type.repo().to_string();
        let revision = model_type.default_revision().to_string();

        let repo = api.repo(Repo::with_revision(
            repo_id.clone(),
            RepoType::Model,
            revision,
        ));
        let config_file_path = repo.get("config.json")?;
        let tokenizer_file_path = repo.get("tokenizer.json")?;

        // TinyLlama ships a single safetensors file; larger checkpoints are
        // sharded and listed in model.safetensors.index.json.
        let model_weights_file_paths = if &repo_id == "TinyLlama/TinyLlama-1.1B-Chat-v1.0" {
            vec![repo.get("model.safetensors")?]
        } else {
            hub_load_safetensors(&repo, "model.safetensors.index.json")?
        };

        let mut file_paths = Vec::with_capacity(2 + model_weights_file_paths.len());
        file_paths.extend(vec![config_file_path, tokenizer_file_path]);
        file_paths.extend(model_weights_file_paths);

        Ok(Self::LoadData {
            device,
            dtype,
            file_paths,
            model_type: ModelType::from_str(&config.model_id())?,
            use_flash_attention: config.use_flash_attention(),
        })
    }

    fn load(load_data: Self::LoadData) -> Result<Self, ModelError> {
        info!("Loading Llama model ...");

        let start = Instant::now();

        let device = load_data.device;
        let dtype = load_data.dtype;
        let (model, tokenizer_filename, config) = {
            let config_filename = load_data.file_paths[0].clone();
            let config: LlamaConfig = serde_json::from_slice(&std::fs::read(config_filename)?)?;

            let tokenizer_filename = load_data.file_paths[1].clone();
            let config = config.into_config(load_data.use_flash_attention);

            // Safety: the weight files are memory-mapped and must not be
            // modified while in use.
            let vb = unsafe {
                VarBuilder::from_mmaped_safetensors(&load_data.file_paths[2..], dtype, &device)?
            };
            (model::Llama::load(vb, &config)?, tokenizer_filename, config)
        };
        let tokenizer = Tokenizer::from_file(tokenizer_filename)?;
        info!("Loaded Llama model in {:?}", start.elapsed());

        Ok(Self {
            device,
            model,
            model_type: load_data.model_type,
            tokenizer,
            config,
            dtype,
        })
    }

Loading the tokenizer seems to be what produces the warnings above.

EricLBuehler commented 7 months ago

I can reproduce the warning. It seems like the tokens are being added, but I'm not sure why the warning is raised. It comes from here: https://github.com/huggingface/tokenizers/blob/71c2a8d01a56cd7bd28148c309e210c47dac78e7/tokenizers/src/tokenizer/serialization.rs#L58

jorgeantonio21 commented 7 months ago

After inspection, I do see the tokens being generated correctly (following the tokenizer file specifications). So that doesn't explain the bad results produced by the model, shown above.

LaurentMazare commented 7 months ago

Could you try reproducing the issue with the command line example (see the sample I gave)? That would make it much easier to inspect what is going on.
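
For reference, the invocation is along these lines (exact flags may vary across candle versions):

cargo run --example llama --release -- --prompt "The best thing about coding in rust is "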

jorgeantonio21 commented 7 months ago

It seems that I was able to make it respond properly. Using the command line example you provided, I got something reasonable like:

starting the inference loop
<|begin_of_text|> The best thing about coding in rust is  the ability to create mini languages .

The Rust type system has a built-in macro feature. This is especially helpful when you want to define a custom language or domain-specific function to help your program perform certain tasks.
This article explores how to use this feature of the rust programming language to build an application that generates random numbers within defined boundaries.

Let's start by creating a new project:

rustc generate.rs && cargo run

 In this project, we'll define two functions `generate()` and `random`.

The `generate()` function accepts no arguments and calls the `random` function twice within itself. It then returns only one of these values using a simple boolean expression.
The first call to `random()` generates a number between 1 and 100 inclusive, while the second call generates a number between 20 and 80 inclusive. If both are true, it returns True; otherwise, it returns False:

fn main() {
    let mut rng = rand::thread_rng();

    println!("{:?}", generate());
}

// Define custom language
macro_rules! random {
    () => ({
        // Generate number between 1 - 100 inclusive
        let num1: i64 = rng.gen_range(1..101);

        // Generate number between 20 - 80 inclusive
        let num2: i64 = rng.gen_range(21..81);

        true if &num1 < &num2 else false
    });
}

You can test this example on [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=52f5c0e7459bdc5cb0dbad3a7fd35fc4).

LaurentMazare commented 7 months ago

Ok, let us know if you manage to replicate the issue here (or if you have some standalone repro that could be used for debugging, ideally as small as possible). Also, just to point out that llama-v3 70b is now supported as part of the llama_multiprocess example, though it requires enough GPUs to hold all the weights (a total GPU memory of ~160GB).
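
(Rough arithmetic: 70B parameters at 2 bytes each in f16/bf16 is ~140GB for the weights alone, before the KV cache and activations.)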

oovm commented 7 months ago

Can I request enhancements and extensions to existing models?

I hope candle-diffusion's pipeline can support LoRA and ControlNet.

EricLBuehler commented 7 months ago

@oovm, candle-lora provides support for training LoRA.

vody-am commented 6 months ago

I'd like to leave a comment that PaliGemma would be really fun to have. It's a pretty capable multimodal for the size.

I've been working on porting it to candle. I believe that doing so requires:

  1. porting the siglip vision encoder (which I think I mostly have done; it's similar to CLIP's with some small changes)
  2. modifying Gemma to accept input embeddings

LaurentMazare commented 6 months ago

I've been working on porting it to candle. I believe that doing so requires:

  1. porting the siglip vision encoder (which I think I mostly have done; it's similar to CLIP's with some small changes)
  2. modifying Gemma to accept input embeddings

    • correct me if I'm wrong here, but I believe this will be needed, similar to mixformer.rs.
    • I am following the Transformers implementation at: modeling_gemma.py

I haven't looked at the details of paligemma, but these two steps sound reasonable. Looking forward to having it supported; ideally you could run both the transformers Python implementation and the candle one on the same input and check that the results are reasonably in line.

vody-am commented 6 months ago

@LaurentMazare no rush on this, but if you can comment, I ran into a small roadbump that I'm not sure how to get around. https://github.com/huggingface/candle/discussions/2208 .

TL;DR: the PyTorch side uses masked_scatter, and I'm not sure how to express this with Candle operations.
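
In case it's useful, here is an untested sketch of one way masked_scatter could be emulated in 1D with stock candle ops (the helper name and the cumsum/gather trick are mine, not an existing candle API; it assumes a 0/1 u8 mask of the same length as xs, and that xs and src share a dtype):

use candle_core::{DType, Result, Tensor};

// out[i] = src[k] where k counts the masked positions up to i, if mask[i] != 0;
// out[i] = xs[i] otherwise.
fn masked_scatter_1d(xs: &Tensor, mask: &Tensor, src: &Tensor) -> Result<Tensor> {
    // The k-th masked position (1-based cumulative count) should read src[k - 1].
    let idx = mask.to_dtype(DType::F32)?.cumsum(0)?;
    let idx = (idx - 1f64)?
        .clamp(0f64, (src.dim(0)? - 1) as f64)?
        .to_dtype(DType::U32)?;
    // Gather candidate values everywhere, then keep the originals where mask is 0.
    let scattered = src.gather(&idx, 0)?;
    mask.where_cond(&scattered, xs)
}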

kczimm commented 5 months ago

Is there support for Alibaba-NLP/gte-base-en-v1.5? If not, any tips on how to start an implementation?

vody-am commented 5 months ago

@kczimm with the caveat that I have not yet personally succeeded: in order to port models to candle, you more or less need to open up the PyTorch version and follow along. The candle API has a lot of overlap with PyTorch, so many method calls map straight over to the Rust side (with small differences). To verify correctness, you run both and compare the captured outputs. There is a tutorial for Roberta at https://github.com/ToluClassics/candle-tutorial/ and generally you can follow the internals in https://github.com/huggingface/candle/tree/main/candle-transformers/src/models.

The one concrete tip I can give: if you print the PyTorch model, you'll often see the "tree" that represents the model. This gives you a good idea of what the Rust structure should look like, and you can copy it, reusing the Python variable names with VarBuilder to load the weights.
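
For instance (hypothetical sizes, made-up helper name), if the printed tree contains embeddings.word_embeddings: Embedding(30522, 768), the candle side mirrors the same dotted path via VarBuilder::pp:

use candle_core::Result;
use candle_nn::{Embedding, VarBuilder};

// Mirrors the PyTorch module path "embeddings.word_embeddings"; VarBuilder
// then resolves the tensor "embeddings.word_embeddings.weight" from the
// safetensors file.
fn load_word_embeddings(vb: VarBuilder) -> Result<Embedding> {
    candle_nn::embedding(30522, 768, vb.pp("embeddings").pp("word_embeddings"))
}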

E.g., when porting a model, I followed the safetensors index https://huggingface.co/google/paligemma-3b-pt-224/blob/main/model.safetensors.index.json to check that I was loading the weights correctly.

For your model specifically, it looks like they supply ONNX files? So perhaps https://github.com/huggingface/candle/tree/main/candle-onnx is relevant for you, with an example at https://github.com/huggingface/candle/tree/main/candle-examples/examples/onnx.

kczimm commented 5 months ago

@vody-am thanks for the help! As I look at this model, I see that the model_type is "new" and the architecture is "NewModel". When I print the model in Python, I get:

NewModel(
  (embeddings): NewEmbeddings(
    (word_embeddings): Embedding(30528, 768, padding_idx=0)
    (rotary_emb): NTKScalingRotaryEmbedding()
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): NewEncoder(
    (layer): ModuleList(
      (0-11): 12 x NewLayer(
        (attention): NewAttention(
          (qkv_proj): Linear(in_features=768, out_features=2304, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
          (o_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (mlp): NewGatedMLP(
          (up_gate_proj): Linear(in_features=768, out_features=6144, bias=False)
          (down_proj): Linear(in_features=3072, out_features=768, bias=True)
          (act_fn): GELUActivation()
          (hidden_dropout): Dropout(p=0.1, inplace=False)
        )
        (attn_ln): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (mlp_ln): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (hidden_dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
)

but searching the transformers repo for NewModel, NewEmbeddings, NewEncoder, etc. to find the PyTorch implementation doesn't yield anything interesting. Do you have advice on finding the PyTorch implementation? Do you know what this "New" business is? Thanks!

vody-am commented 5 months ago

@kczimm https://huggingface.co/Alibaba-NLP/new-impl/blob/main/modeling.py#L276 looks like it.

Going back to Candle, I think the "out of the box" parts that are reusable for you are:

Linear, Dropout, LayerNorm, etc.

You could start by following a structure like this:

use candle_nn as nn;

struct NewModel {
    embeddings: NewEmbeddings,
    encoder: NewEncoder,
}

struct NewEmbeddings {
    word_embeddings: nn::Embedding,
    // ...
}

struct NewEncoder {
    layer: Vec<NewLayer>,
}

struct NewLayer {
    attention: NewAttention, // qkv_proj / o_proj wrappers around nn::Linear
    mlp: NewGatedMLP,
    attn_ln: nn::LayerNorm,
    // ...
}

(a LOT of steps are omitted).

Hope that helps!

julien-blanchon commented 4 months ago

Moshi 👀

LaurentMazare commented 4 months ago

Moshi 👀

Certainly planned :) Meanwhile you can try the online version here and here for the US servers; it's all built with Rust & candle. The soon-to-be open-source version runs well on a MacBook Pro, all powered by candle under the hood! Also see the keynote talk if anyone wants more context!

EricLBuehler commented 4 months ago

Congratulations @LaurentMazare!

donkey-donkey commented 4 months ago

Sure would be awesome to use ControlNet with candle!

BradyBonnette commented 3 months ago

Going back to this: https://github.com/huggingface/candle/issues/1177#issuecomment-1810388950

I currently have a use case for attempting something with candle and Deberta. I took a very quick (and not too deep) look at the candle codebase, and I'm not entirely sure it'd work out of the box for Deberta.

I even tried a quick use of the example:

$ cargo run --example bert --release -- --model-id microsoft/deberta-v3-large  --revision main
<compilation output omitted>
Running on CPU, to run on GPU, build this example with `--features cuda`
Error: request error: https://huggingface.co/microsoft/deberta-v3-large/resolve/main/tokenizer.json: status code 404

Caused by:
    https://huggingface.co/microsoft/deberta-v3-large/resolve/main/tokenizer.json: status code 404

Which in and of itself probably isn't a huge deal, considering the repo doesn't have a tokenizer.json file. Deberta does use the SentencePiece tokenizer, and there is an spm.model file in that repo.

However, when I try to load the configuration config.json for the model, I get this:

Could not parse config json using serde: Error("missing field `pad_token_id`", line: 22, column: 1)

The config.json that was uploaded for Deberta v3 (at least for the large version) doesn't contain a pad_token_id. It appears that most of the Bert Config struct has what's expected for Deberta v3, with a few notable exceptions.

Is it possible that everything that's built in candle currently will only work with older versions of Deberta, and that v3 is something that'd have to be added?
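
For what it's worth, the parse error itself is just serde's required-field behavior; here is a minimal illustration with hypothetical trimmed-down structs (not candle's actual Config):

use serde::Deserialize;

// A required field fails deserialization whenever config.json omits it:
#[derive(Debug, Deserialize)]
struct BertLikeConfig {
    hidden_size: usize,
    pad_token_id: usize, // absent from deberta-v3-large's config.json
}

// A deberta-v3 port could default (or Option-wrap) such fields instead:
#[derive(Debug, Deserialize)]
struct DebertaV3Config {
    hidden_size: usize,
    #[serde(default)]
    pad_token_id: usize,
}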

super-fun-surf commented 3 months ago

For super high resolutions it would be great to have this Stable Cascade variant called UltraPixel: https://github.com/catcathh/UltraPixel

super-fun-surf commented 3 months ago

For diffusion image generation it would be really nice to have Self-Attention Guidance and Perturbed-Attention Guidance. That code exists in Python; is there a standard way to translate it over to candle?

sidharthrajaram commented 3 months ago

Support for Parler TTS (a HF project) would be great. Following up on the discussion on #2283 and https://github.com/huggingface/parler-tts/issues/77 .

timwalls commented 2 months ago

I'm having trouble (as in, it doesn't work ;-)) with the jinaai/jina-embeddings-v2-base-code model:

[2024-09-02T11:48:45Z ERROR olimpia::ai::embedding] Error creating BertModel: WithBacktrace { inner: CannotFindTensor { path: "encoder.layer.0.mlp.gated_layers.weight" }, backtrace: Backtrace [
  { fn: "candle_core::safetensors::MmapedSafetensors::get", file: ".../candle-core-0.6.0/src/safetensors.rs", line: 339 },
  { fn: "candle_nn::var_builder::VarBuilderArgs<B>::get_with_hints", file: ".../candle-nn-0.6.0/src/var_builder.rs", line: 181 },
  { fn: "candle_nn::linear::linear_no_bias", file: ".../candle-nn-0.6.0/src/linear.rs", line: 75 },
  { fn: "candle_transformers::models::jina_bert::BertGLUMLP::new", file: ".../candle-transformers-0.6.0/src/models/jina_bert.rs", line: 218 },
  { fn: "candle_transformers::models::jina_bert::BertLayer::new", file: ".../candle-transformers-0.6.0/src/models/jina_bert.rs", line: 257 },
  { fn: "candle_transformers::models::jina_bert::BertEncoder::new", file: ".../candle-transformers-0.6.0/src/models/jina_bert.rs", line: 315 },
  { fn: "candle_transformers::models::jina_bert::BertModel::new", file: ".../candle-transformers-0.6.0/src/models/jina_bert.rs", line: 352 },
  { fn: "olimpia::ai::embedding::EmbeddingModel::new::{{closure}}", file: "./src/ai/embedding.rs", line: 76 },
  ... (std, iterator, and tokio runtime frames elided) ] }

My assumption is that whatever was done here (https://github.com/huggingface/text-embeddings-inference/issues/306) also needs to be ported to Candle somehow. Before I try to wrap my head around exactly what that is and see if I can work out how to do it myself, can anyone confirm whether this is just operator error on my part and it ought to work anyway, or whether I'm about to embark on a great folly and should wait a bit?
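
In the meantime, a quick way to check which tensor names the checkpoint actually contains (untested sketch) is to dump them all and compare against what candle's jina_bert implementation asks for:

use candle_core::Device;

fn main() -> candle_core::Result<()> {
    // Lists every tensor name and shape in the safetensors file.
    let tensors = candle_core::safetensors::load("model.safetensors", &Device::Cpu)?;
    let mut entries: Vec<_> = tensors.iter().collect();
    entries.sort_by_key(|(name, _)| name.to_string());
    for (name, tensor) in entries {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}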

okpatil4u commented 2 months ago

When is moshi being ported to candle?

julien-blanchon commented 2 months ago

@okpatil4u It already is: https://github.com/kyutai-labs/moshi/tree/main/rust

louis030195 commented 2 months ago

Pixtral from the French!

https://huggingface.co/mistralai/Pixtral-12B-2409

LaurentMazare commented 1 month ago

I'd like to leave a comment that PaliGemma would be really fun to have. It's a pretty capable multimodal for the size.

This has taken quite some time, but paligemma should now be available, with an example of how to use it here.

LaurentMazare commented 1 month ago

pixtral from the frenchs!

Pixtral should be available now as part of #2521 (there is a bit of polishing to be done though).

zachcp commented 1 month ago

I'd like to get Protein-MPNN/LigandMPNN on Candle (gh; paper). The weights are only about 6.4 MB.

zachcp commented 1 month ago

I'd also like to get https://github.com/chandar-lab/AMPLIFY ported to Candle. It's reasonably sized relative to its performance, it's an open model, and the weights are already in safetensors format. I can port the example code to work with a Candle/WASM implementation.

louis030195 commented 1 month ago

Would love any modern diarization model or example.

super-fun-surf commented 1 month ago

LoRAs for Stable Diffusion and now FLUX. Looks like diffusers has support for the new FLUX LoRAs: https://github.com/huggingface/diffusers/pull/9295

super-fun-surf commented 1 month ago

With memory being such a bottleneck, it would be amazing to have a tiled VAE decoder for SD and FLUX. I think this is a popular one in Python: https://github.com/shiimizu/ComfyUI-TiledDiffusion

super-fun-surf commented 1 month ago

SD 3.5 Large. It looks similar to 3 Medium, but they pulled the CLIP files out of the T5...?

vrdn-23 commented 1 month ago

@BradyBonnette Did you ever happen to figure out how to get deberta-v2 working? Is there any plan to officially support a deberta-v2 model in candle? cc @LaurentMazare

BradyBonnette commented 1 month ago

@vrdn-23

Funnily enough, I have a long-running branch in a fork of candle that I've been meaning to finish up: a port of deberta-v2/v3 that works and has an example app like the other models. Out of the box in candle, deberta did not work like the other BERT models, so I ported it from the Python HuggingFace Transformers version. There's a lot involved, especially the disentangled attention.

If there is any interest in bringing this into candle proper via a pull request let me know, and I can clean it up and get it ready. I originally had an immediate use for it for some things I was working on, but that fell by the wayside. That being said, I don't mind getting it across the finish line if there are others interested in it.

vrdn-23 commented 1 month ago

If there is any interest in bringing this into candle proper via a pull request let me know, and I can clean it up and get it ready.

I think that would be a tremendous help tbh @BradyBonnette. There seems to be a lot of interest for the model to be supported in the TEI repo (see https://github.com/huggingface/text-embeddings-inference/issues/354) and I think getting it supported in candle would probably be the first step to making it available. Thanks again for the quick response!

BradyBonnette commented 1 month ago

@vrdn-23 If you're curious about what I've done so far, check out https://github.com/huggingface/candle/compare/main...BradyBonnette:candle:debertaV2 (it's a LOT of work-in-progress commits that'd get rebased/squashed if/when this gets into a pull request).

If you want to try it out, feel free to clone it and test it out.

You can run it with something like: cargo run --example debertav2 --features=cuda,cudnn -- --model-id=blaze999/Medical-NER --revision=main --sentence='63 year old woman with history of CAD presented to ER' (using https://huggingface.co/blaze999/Medical-NER as an example)

You can also use locally fine-tuned deberta-based models.

I don't have any documentation in there yet (that would be part of my cleanup if there's interest), and the only thing it can currently do is NER. I do have some uncommitted work that should handle text classification.

Note that I have two "deberta v2" branches. You'll want the one I linked above; the other one almost succumbed to a "technical issue" due to failing hardware on my end.

super-fun-surf commented 1 week ago

AuraFlow is a fully open-source (Apache-licensed) image diffusion model based on SD3.

https://blog.fal.ai/auraflow/ https://huggingface.co/docs/diffusers/main/en/api/pipelines/aura_flow https://huggingface.co/fal/AuraFlow-v0.3/tree/main

Looks like they are using only T5 (not CLIP L and G), specifically the EleutherAI/pile-t5-xl variant.