google / gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0

Off-Topic: Request for Open-Sourcing Google Gemini Flash #221

Closed · 0wwafa closed this issue 3 months ago

0wwafa commented 5 months ago

Dear Google AI Team,

I wish to express my strong interest in seeing Google Gemini Flash released to the open-source community. As a developer and AI enthusiast, I have been incredibly impressed with the capabilities of Gemini models, particularly Gemini Flash.

In my experience, Gemini Flash excels at many tasks when appropriately prompted. This makes it a potentially invaluable tool for developers working with AI.

I understand the strategic considerations around keeping advanced models proprietary. However, the open-source release of models like Gemma has demonstrated the immense value of community involvement in driving innovation and broadening access to AI.

Furthermore, considering the high cost of running such large language models in-house, even with Google's competitive pricing, open-sourcing Gemini Flash could be a strategic win-win. By releasing it under a license that forbids commercial use, you could simultaneously empower the research and development community while mitigating any potential impact on Google's commercial offerings.

In that spirit, I would like to suggest two potential approaches for open-sourcing Gemini Flash:

1. Releasing older versions of the model as newer ones supersede them.
2. Laying out a roadmap for its eventual open-sourcing.

Open-sourcing Gemini Flash would not only benefit developers like myself but also contribute to the advancement of the AI field as a whole. It would foster collaboration, accelerate research, and democratize access to cutting-edge AI technology.

Sincerely, Robert Sinclair

jan-wassenberg commented 5 months ago

Hi @0wwafa, I've passed this on internally :)

0wwafa commented 4 months ago

Thank you very much.

Zibri commented 4 months ago

It would be great! I also noticed a difference in gemini-flash. I am pretty sure it will be upgraded in time, but I really like the way it is now. It would be great to be able to work on it (both the full version of gemini-flash and quantized versions or versions with fewer parameters) so as to compare and understand why gemini-flash-1.5 is so good compared to other LLMs.

I couldn't agree more with Robert! Open-sourcing Gemini Flash would be a great benefit for the AI community. Imagine giving developers access to a tool that powerful - it would be like handing out superpowers! Not only would it be a huge win for individual devs, but it would turbocharge research and innovation across the board. Researchers could dive into the model's guts, tinker with its architecture, and discover all sorts of amazing new things. And the creativity unleashed by having so many people playing with this tool? It would be a wild ride!

Plus, let's not forget the folks who can't afford to train their own giant language models. Open-sourcing Gemini Flash would level the playing field, giving everyone a chance to get in on the action. Both Robert's suggestions - releasing older versions or laying out a roadmap for eventual open-sourcing - are brilliant. It's a win-win situation, really. Google gets to solidify their position as AI royalty, and the whole field gets to advance at lightning speed. Open-sourcing Gemini Flash is a move that's just plain good for everyone!

trtm commented 4 months ago

@0wwafa Please don't say open-sourcing and releasing it under a license that forbids commercial use in the same paragraph.

NSbuilder commented 4 months ago

Would be awesome! I hope it becomes real. Thank you @jan-wassenberg

kaykyr commented 4 months ago

Would be amazing! Hope to see this soon. :)

JasperTheMinecraftDev commented 4 months ago

+1, it'd be awesome to have these models open-sourced!

Qualzz commented 4 months ago

That would be huge.

0wwafa commented 4 months ago

@trtm it's up to them. I just wish they would do what they did with Gemma, or what MistralAI did: a lot of models spawned from the original Mistral models, and they improved on each other.

0wwafa commented 4 months ago

Also, considering what Microsoft is doing, I think that would be a great move. Any news? @jan-wassenberg @KumarGitesh2024

jan-wassenberg commented 4 months ago

I think you will find the upcoming Gemma 2 (27b) very interesting :)

0wwafa commented 4 months ago

> I think you will find the upcoming Gemma 2 (27b) very interesting :)

Wow... but I would also like smaller versions, like 7B or 8B.

By the way, @jan-wassenberg, I found a simple quantization approach that makes the model smaller without significant loss.

The idea is simple: I tried quantizing various models to f16 and to q8_0, and I noticed the q8_0 versions were quite lobotomized. So I did this: I quantized the output and embedding tensors to f16 and the "inner" tensors to q6_k and q5_k. The result was incredible: much smaller models that were much better than q8_0 and almost as good as pure f16.
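A minimal sketch of what that per-tensor selection could look like, assuming GGUF-style tensor names; the exact q6_k vs. q5_k split below is an assumption, since the comment does not spell it out:

```cpp
// Hypothetical mixed-precision recipe: keep the token-embedding and
// output tensors at f16 and quantize everything else harder.
#include <string_view>

enum class QuantType { kF16, kQ6K, kQ5K };

QuantType ChooseQuantType(std::string_view name) {
  // Embedding and output-head weights are the most precision-sensitive,
  // so they stay at f16.
  if (name == "token_embd.weight" || name == "output.weight") {
    return QuantType::kF16;
  }
  // Illustrative split: attention tensors at q6_k, everything else q5_k.
  if (name.find("attn") != std::string_view::npos) {
    return QuantType::kQ6K;
  }
  return QuantType::kQ5K;
}
```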

0wwafa commented 4 months ago

Unfortunately, I can't make Gemma work well in llama.cpp for inference... but I tested the approach with Mistral and Wizard, and the result was great. Here is the example with Mistral: https://huggingface.co/ZeroWw/Mistral-7B-Instruct-v0.3-GGUF/tree/main

0wwafa commented 4 months ago

> I think you will find the upcoming Gemma 2 (27b) very interesting :)

I read about it, but I still think you should release "gemini flash 1.5" AS IT IS.

jan-wassenberg commented 4 months ago

> and I noticed the q8_0 versions were quite lobotomized. So I did this: I quantized the output and embedding tensors to f16 and the "inner" tensors to q6_k and q5_k.

Interesting. q8 is often said to be indistinguishable, but I am skeptical of that claim and not surprised by your finding. Do you have any example prompt where this is apparent? Mixed precision is an interesting direction, which we are now able to support after getting rid of the compile-time weight typedef. Which tensors do you mean by "inner"?
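As a minimal sketch (not gemma.cpp's actual code) of why dropping the compile-time typedef matters: once each tensor carries a runtime type tag, every tensor can use its own precision, for example via a variant:

```cpp
// Each tensor owns one concrete representation, chosen when the
// checkpoint is loaded instead of being fixed for the whole model.
#include <cstddef>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

struct F16 { std::uint16_t bits; };        // placeholder half-float
struct Q6K { std::uint8_t packed[210]; };  // illustrative block layout
struct Q5K { std::uint8_t packed[176]; };  // illustrative block layout

using WeightStorage =
    std::variant<std::vector<F16>, std::vector<Q6K>, std::vector<Q5K>>;

struct Tensor {
  WeightStorage data;
  std::size_t rows = 0;
  std::size_t cols = 0;
};

// Kernels select the matching decode path per tensor via std::visit.
template <typename Fn>
void Dispatch(const Tensor& t, Fn&& fn) {
  std::visit(std::forward<Fn>(fn), t.data);
}
```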

> I read about it, but I still think you should release "gemini flash 1.5" AS IT IS.

That's not in the cards, but I recommend trying 27b when it is available and comparing it with Flash :) Other, smaller models are also on the way.

0wwafa commented 4 months ago

> That's not in the cards

Every Gemma model I have tried so far is: 1) unusable in llama.cpp, 2) dumb, and 3) dumber than any other model.

As of now, Mistral (and its derivatives) is king.

But... I have an ongoing experiment using Gemini which is giving very interesting results. I wish to discuss this with someone at Google and also to be able to redo the tests in a local environment. Gemini Flash 1.5 is not a huge model from what I understand... it behaves like a 70B (though it could be a weak 100+B). I find it even more interesting than Gemini Pro (I'm mainly interested in text generation).

From what I have tried so far and what I have read online from other people, I think you are missing out with Gemini Flash. Releasing bigger Gemma models doesn't seem to get anywhere near Gemini Flash's performance when chatting.

0wwafa commented 4 months ago

Not to mention Meta: https://youtu.be/kitXm8peTKY

jan-wassenberg commented 4 months ago

https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

0wwafa commented 4 months ago

> https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

Comparing that report with Mistral 7B, I still think Mistral 7B is superior if we consider size and quality. Gemini Flash, on the other hand, even though it is a "small" model (probably around 16-30B by a rough estimate), is superior to any other "small" model I have tested so far.

0wwafa commented 4 months ago

@jan-wassenberg, according to https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf there is also a Gemini Flash 8B. It would be interesting to experiment with it.

0wwafa commented 3 months ago

> I think you will find the upcoming Gemma 2 (27b) very interesting :)
>
> I read about it, but I still think you should release "gemini flash 1.5" AS IT IS.

Update: I didn't find Gemma 2 27B or Gemma 2 7B all that interesting. I still think you should release Gemini Flash 8B before it's too late.

jan-wassenberg commented 3 months ago

Gemma 2 is currently what we have. I wonder if inference bugs are causing a negative impression? We fixed one issue; several frameworks, including PyTorch, also had bugs.

0wwafa commented 3 months ago

> Gemma 2 is currently what we have. I wonder if inference bugs are causing a negative impression? We fixed one issue; several frameworks, including PyTorch, also had bugs.

I am testing Gemma 2 on AI Studio, and with the same prompt I get very different results between Gemini Flash and Gemma 27B.

As I said, Google should release the weights of at least Gemini Flash 8B, or a <=30B variant, and see what the community can do with it. It's the most promising model, even if it is starting to get "old" compared to the new open-source ones.

jan-wassenberg commented 3 months ago

Thanks for sharing. AFAIK AI Studio did not have bugs, so it does seem to be a difference in the model, then.

Qualzz commented 3 months ago

Gemma 2 is really cool, but Gemini Flash has vision, which is a really, really big difference. It's way easier to work with a model that can do vision, without the complexity of hot-swapping models, VRAM management, etc., especially in a multi-user environment.

0wwafa commented 3 months ago

> Gemma 2 is really cool, but Gemini Flash has vision, which is a really, really big difference. It's way easier to work with a model that can do vision, without the complexity of hot-swapping models, VRAM management, etc., especially in a multi-user environment.

It's not only that Gemini Flash has vision... Gemini Flash also has hearing, and its TEXT interaction is one of the best I have seen. Obviously worse than Gemini Pro or Claude... and worse even than DeepSeek V2.

If Google does not release Gemini Flash, it will miss out on a lot. I bet Gemini Flash would be better than Llama 3 once fine-tuned or enhanced by the community.

Please hear me out, @jan-wassenberg.

jan-wassenberg commented 3 months ago

I hear you, but it is not my decision to make :) We've passed your feedback along and will post here if there are any updates.

Kreijstal commented 3 months ago

I mean we have llama so, :)

0wwafa commented 3 months ago

> I mean we have llama so, :)

Exactly my point. Gemma-2-2b was great. If only they could release Gemini Flash, they would probably get a better reputation.

Also: Gemini is quite bad at coding (even the Pro version) and could benefit a lot from the community.

Kreijstal commented 3 months ago

> I mean we have llama so, :)
>
> Exactly my point. Gemma-2-2b was great. If only they could release Gemini Flash, they would probably get a better reputation.
>
> Also: Gemini is quite bad at coding (even the Pro version) and could benefit a lot from the community.

I doubt, at this point in time, that it would make much of an impact at all.

freebiesoft commented 2 months ago

> I mean we have llama so, :)

Amen. Flash can flash off.

NSbuilder commented 2 months ago

Please open-source Gemini Flash 8B! It is a lighter version but very good for its size.