ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Is `mlx>=0.0.11` pushed to `pip` yet? #368

Closed: ibehnam closed this issue 9 months ago

ibehnam commented 9 months ago

When trying the gguf example, I get:

ERROR: Could not find a version that satisfies the requirement mlx>=0.0.11 (from versions: 0.0.4.dev20231210, 0.0.4, 0.0.5.dev20231217, 0.0.5, 0.0.6.dev20231224, 0.0.6.dev20231231, 0.0.6, 0.0.7.dev202417, 0.0.7, 0.0.9.dev2024114, 0.0.9, 0.0.10.dev20240121, 0.0.10)
ERROR: No matching distribution found for mlx>=0.0.11
awni commented 9 months ago

Yes, that's intentional. You have to build mlx from source to use that example until we release 0.0.11 (probably 1-2 days, if you can wait 😄).

ibehnam commented 9 months ago

@awni Okay, I built it from source and noticed that it is slower than llama.cpp:

MLX (gguf example):
Prompt: 9.820 tokens-per-sec
Generation: 7.829 tokens-per-sec

llama.cpp:
Prompt: 14.28 tokens-per-sec
Generation: 19.70 tokens-per-sec

(M1 Pro chip, Sonoma)
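
A minimal timing sketch with mlx-lm's load and generate helpers gives a rough tokens-per-sec figure like the ones above (the model repo is a placeholder; the thread's numbers came from the gguf example, and this sketch lumps prompt processing and generation into a single rate):

```python
# Rough timing sketch (assumes mlx-lm is installed; the model repo is a placeholder).
# Unlike the separate prompt/generation figures above, this reports one combined rate.
import time

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")  # placeholder repo

prompt = "Write a haiku about Apple silicon."
max_tokens = 128

start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.2f} tokens-per-sec")
```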

awni commented 9 months ago

Thanks for the benchmark.

Some comments:

ibehnam commented 9 months ago

Thanks @awni for clarifying. I like MLX and I hope the improvements you mentioned make it more attractive for devs. Maybe if MLX had something like llama.cpp/server, the warmup time wouldn't matter for users. llama.cpp is also working on Flash Attention, so it will get even faster, but I think MLX can make its own improvements, especially as a general ML framework for Apple silicon.

mzbac commented 9 months ago

@awni Not sure if mlx-lm wants to integrate server functionality, but I feel it could be useful for people who want a quick taste of mlx. I have an example of how to run an OpenAI-like API using mlx-lm; the implementation is straightforward. Maybe we could add some of these community examples to the readme so that people can try them out without having to download and build mlx-examples themselves.
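
A minimal sketch of that idea, using mlx-lm's load/generate helpers and Python's standard http.server (this is not mzbac's actual implementation; the model repo, port, and response shape are placeholders):

```python
# Sketch of an OpenAI-style /v1/completions endpoint backed by mlx-lm.
# Not mzbac's example: the model repo, port, and JSON shape are placeholders.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

from mlx_lm import load, generate

MODEL, TOKENIZER = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")  # placeholder repo

class CompletionsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # Generate a completion for the posted prompt.
        text = generate(
            MODEL,
            TOKENIZER,
            prompt=body.get("prompt", ""),
            max_tokens=int(body.get("max_tokens", 128)),
        )
        payload = json.dumps({"choices": [{"text": text}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # Keeps the model loaded between requests, avoiding per-call warmup.
    HTTPServer(("127.0.0.1", 8080), CompletionsHandler).serve_forever()
```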

awni commented 9 months ago

Not sure if mlx-lm wants to integrate server functionality, but I feel it could be useful for people who want a quick taste of mlx. I have an example of how to run an OpenAI-like API using mlx-lm

That's super cool! I'm not opposed to including it in mlx-lm. It could be a convenient way to show how to load a model persistently. What do you think, does it make sense in mlx-lm?

Maybe we could add some of these community examples to the readme so that people can try them out without having to download and build mlx-examples themselves.

Do you mean pointing to the community examples in the mlx-lm README?

mzbac commented 9 months ago

Yeah, personally, I think it makes sense. Since mlx-lm is just a package we want people to use to try out mlx and remove barriers to entry, providing a built-in API would help them run LLMs locally via mlx-lm and integrate it into their workflow. For example, I always run a llama.cpp server on my laptop with an Automator integration for quick grammar correction.

Do you mean pointing to the community examples in the mlx-lm README?

I mean that the cool projects you repost on Twitter could be included in the mlx-examples README, so people are aware of the tools the community has built on top of mlx and can try them out.

awni commented 9 months ago

Yeah, personally, I think it makes sense

Cool, I would be happy to include something like that. I'd like to keep it pretty lightweight if possible, though.

We could add a CLI like `python -m mlx_lm.server` which provides essentially the generate API via HTTP. Are you interested in working on that?

I mean that the cool projects you repost on Twitter could be included in the mlx-examples README,

💯 got it, I like that idea.
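
For illustration, calling such a local completions endpoint from Python could look like the following (the path, port, and fields match the hypothetical server sketch above, not a released `mlx_lm.server` CLI):

```python
# Hypothetical client for a local OpenAI-style completions endpoint.
# The endpoint path, port, and fields match the sketch above, not an official CLI.
import json
from urllib.request import Request, urlopen

payload = {"prompt": "Fix the grammar: he go to home yesterday.", "max_tokens": 64}
req = Request(
    "http://127.0.0.1:8080/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["text"])
```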

mzbac commented 9 months ago

We could add a CLI like `python -m mlx_lm.server` which provides essentially the generate API via HTTP. Are you interested in working on that?

Yeah, sure thing. I'm more than happy to work on that. :)