xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

[Feature request] Add Support for vicuna-13b-delta-v1.1 #96

Open DK013 opened 1 year ago

DK013 commented 1 year ago

*Support for vicuna-13b-delta-v1.1. NOTE: It's not listed in the Transformers supported-models list, but it does work with Transformers.*

Reason for request: With the upcoming WebGPU support in ONNX Runtime, I believe it will be really helpful to have LLM support for browser-based applications, and this repo is the best solution we have so far.

Additional context: I've been working on an AI assistant built with Electron and Cordova for desktop and mobile platforms respectively. I'm already using Transformers.js with Whisper for speech-to-text. I intend to switch to WebGPU with JSEP as soon as it's available, so I can leverage GPU compute capabilities to run larger models. I'm trying to build the project with as many open-source resources as possible, and having LLM support would be really nice instead of relying on the OpenAI APIs. That keeps the project cost-free for users, and user data privacy is another benefit. I'm really looking forward to seeing whether this is going to be possible, and I'm willing to contribute as much as I can, being a complete novice in the ML community.
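For reference, the speech-to-text part is just the standard Transformers.js pipeline setup; a minimal sketch, assuming a small Whisper checkpoint (the model name and audio URL below are placeholders, not my actual code):

```js
import { pipeline } from '@xenova/transformers';

// Speech-to-text with Whisper via the Transformers.js pipeline API.
// The checkpoint name is illustrative; any compatible Whisper export on the Hub works.
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');

// Input can be a URL to an audio file or a Float32Array of 16 kHz PCM samples.
const result = await transcriber('https://example.com/speech.wav');
console.log(result.text);
```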

Thanks in advance

xenova commented 1 year ago

> I believe it will be really helpful to have LLM support for browser-based applications, and this repo is the best solution we have so far.

Agreed! WebGPU for onnxruntime-web is almost here (see https://github.com/microsoft/onnxruntime/pull/14579), and Transformers.js will support it when ready! There will be a massive announcement when it does drop! As for now, it is just a matter of waiting 😅 ...

DK013 commented 1 year ago

> Agreed! WebGPU for onnxruntime-web is almost here (see microsoft/onnxruntime#14579), and Transformers.js will support it when ready! There will be a massive announcement when it does drop! As for now, it is just a matter of waiting 😅 ...

It's here!! They just merged the [js/web] WebGPU backend via JSEP #14579 into the main branch a few hours ago. No official release yet. Looks like @fs-eire opened another pull request for code cleanup and some small fixes. But we can build from the main branch and start coding 😄

fs-eire commented 1 year ago

It takes some time and effort to go from enabling building from source, to including it in the NPM package, to releasing it as an experimental feature, and then to the final release.

I will keep working on the stability, performance, and operator coverage of the WebGPU backend implementation in ort-web. This is going to be long-term work. Please feel free to "@" me or submit GitHub issues to onnxruntime for feedback.

xenova commented 1 year ago

@fs-eire I have been following the build instructions from here, and I think I've sorted out most issues regarding dependencies, versions, etc.

However, when running in the browser, I get the error "JS execution provider is not supported in this build". Understandably, the docs have not yet been properly updated, so would it be possible for you to provide the steps to build from source, including what build arguments I should use? Thanks!

DK013 commented 1 year ago

I've been looking at the updated files, and under `<ORT_ROOT>/tools/ci_build/build.py` I can see there's a `use_jsep` argument, but from what I can tell from the CI YAML configs it only applies within `build_wasm`. So I guess the build command will look something like this: `./build.bat --build_wasm --enable_wasm_simd --use_jsep --target onnxruntime_webassembly`.

However, if you've cloned the repo as instructed here, chances are you don't have the latest source and the `--use_jsep` argument will fail. Simply download the zip of the main branch and replace your local files with the latest versions. @xenova Give it a try.

[UPDATE]: Here's what worked for me... Build 4 times with these args:

Then copy the files as instructed in the documentation and build the npm package.

I'm attaching my successful npm package build here just in case you wanna be lazy. 😘 ort.zip

xenova commented 1 year ago

> However, if you've cloned the repo as instructed here, chances are you don't have the latest source and the `--use_jsep` argument will fail. Simply download the zip of the main branch and replace your local files with the latest versions.

That was it! The commit history says that change was made yesterday, so I was one day out of date, haha.

The package is being built now, and I will hopefully have more updates later today 🎉

> I'm attaching my successful npm package build here just in case you wanna be lazy. 😘 ort.zip

This will come in handy! Thanks!

xenova commented 1 year ago

So, I did all that, but it doesn't seem to run :/ Here's the error I get: https://github.com/microsoft/onnxruntime/issues/15719

@DK013 If you're able to, could you try running the model linked here with the following input:

```js
// Tensor is assumed here to be the onnxruntime-web Tensor class
// (e.g. `import { Tensor } from 'onnxruntime-web'`).
let input = {
    attention_mask: new Tensor(
        'int64',
        new BigInt64Array([1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n]),
        [1, 12]
    ),
    input_ids: new Tensor(
        'int64',
        new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
        [1, 12]
    )
};
```

or see here for a full demo to test with.
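For anyone else reproducing this, here is a minimal sketch of feeding that input to an onnxruntime-web session with the JSEP/WebGPU backend. The model path and execution-provider list are assumptions about the local test setup, not confirmed values from the demo above:

```js
import * as ort from 'onnxruntime-web';
const { Tensor } = ort;

// Assumed local path to the exported ONNX graph being tested.
const session = await ort.InferenceSession.create('./model.onnx', {
    // 'webgpu' is the JSEP execution provider; 'wasm' is the CPU fallback.
    executionProviders: ['webgpu', 'wasm'],
});

const input = {
    attention_mask: new Tensor('int64', new BigInt64Array(12).fill(1n), [1, 12]),
    input_ids: new Tensor(
        'int64',
        new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
        [1, 12]
    ),
};

const outputs = await session.run(input);
console.log(outputs);
```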

fs-eire commented 1 year ago

@DK013 please use the workaround as described in https://github.com/microsoft/onnxruntime/issues/15719#issuecomment-1526216605 . I am working on a solution to fix the issue.

DK013 commented 1 year ago

@xenova Since fs-eire is working on the issues we've encountered, and I'm running behind schedule on my own project, I'm implementing Transformers.js on the CPU for now in my code. Mainly I need Whisper (hopefully a bit more than the base model) to work right now, and a suitable LLM later on. So I'm going to go ahead and complete the basic code for testing now, and wait until you guys are done polishing WebGPU. If you need any input from me, just let me know. 😉
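By "on the CPU for now" I just mean the default WASM backend; a minimal sketch of the kind of setting involved (the thread count is a placeholder, not a recommendation):

```js
import { env } from '@xenova/transformers';

// Stay on the default WASM (CPU) backend for now; switch to WebGPU once it lands.
// numThreads is a placeholder value; tune it per device.
env.backends.onnx.wasm.numThreads = 4;
```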

fs-eire commented 1 year ago

@DK013 The PR mentioned above is merged, and another bugfix is in PR: https://github.com/microsoft/onnxruntime/pull/15819.

matthoffner commented 1 year ago

Catching up on this issue, does this mean there is conversational model support with the onnxruntime PRs? The README shows it as not supported yet. Thanks for any clarification!

xenova commented 1 year ago

> does this mean there is conversational model support with the onnxruntime PRs?

vicuna-13b-delta-v1.1 is categorized as a text-generation model (not a conversational model), and text-generation is supported by Transformers.js. The distinction (which mainly lies in how the models are used) is subtle, as both can be used for "conversations". For more information, see:
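Separately, as a concrete illustration of the text-generation task in Transformers.js (the checkpoint below is a small example model, not Vicuna itself):

```js
import { pipeline } from '@xenova/transformers';

// 'text-generation' is the task that Vicuna-style models fall under.
// The checkpoint name is illustrative; substitute any text-generation export on the Hub.
const generator = await pipeline('text-generation', 'Xenova/distilgpt2');

const output = await generator('Hello, how are you?', { max_new_tokens: 20 });
console.log(output[0].generated_text);
```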

DK013 commented 1 year ago

@xenova Heyo, I've been a bit busy with my own projects and running the business and all. What's the status of WebGPU? How are your tests going?