maximilienroberti opened this issue 4 months ago
I encountered the same issue with `mistral-nemo`, but didn't have any problems when using `gemini-1.5-flash-001` with local JSON credentials. For the same project, I was able to run the default `mistral-nemo` example notebook in Colab Enterprise without hitting the quota error, but that was with my personal admin account credentials rather than the role account (using `google.auth.transport.requests` directly, not langchain).
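For anyone wanting to replicate that direct-auth approach, here is a minimal sketch that mints a token with `google.auth.transport.requests` and calls the Vertex AI REST endpoint directly; the project, region, key file, model, and prompt are all placeholders, not values from this thread:

```python
import requests
from google.oauth2 import service_account
import google.auth.transport.requests

# Placeholder values -- substitute your own project, region, and key file.
PROJECT = "my-project"
REGION = "us-central1"
KEY_FILE = "service-account.json"
MODEL = "gemini-1.5-flash-001"

# Load credentials from a local JSON key and mint an access token.
creds = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/google/models/{MODEL}:generateContent"
)
body = {"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {creds.token}"},
    json=body,
)
print(resp.status_code, resp.json())
```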
langchain 0.2.5
langchain-core 0.2.29
langchain-google-vertexai 1.0.8
langchain-openai 0.1.8
langchain-text-splitters 0.2.1
@lkuligin has this been resolved in a new release?
@gawbul @jjaeggli Any workaround for that?
@netanelm-upstream nothing from my side, as of yet, other than using a different model. Was hoping to get some response on this from langchain, though 🤔
@maximilienroberti did you get a workaround for this?
@lkuligin can this be reopened, please, as we are still experiencing this issue with no resolution that I can see. We are definitely not exceeding our quota limit.
I've looked into this a bit more, and it appears that in earlier versions of langchain the `text-unicorn` model was metered against the "Online prediction requests per base model per minute per region per base_model" quota; in more recent versions of langchain, however, it is metered against the "Generate content requests per minute per project per base model per minute per region per base_model" quota instead, which defaults to 0 for everything but the Gemini models, in our account at least. I'm not sure whether it is possible to request an increase for that particular quota in the region in question, though I have opened a support request with Google to check.
Is there any change in the code that would have caused this for this particular model? Things still seem to work fine for `text-bison-32k` and other PaLM 2 models.
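For illustration, the suspected difference can be sketched with the Vertex AI SDK directly. This is an inference from the quota names above, not confirmed langchain internals, and the project and location are placeholders: the legacy `predict` surface is metered under the online-prediction quota, while `generate_content` is metered under the generate-content quota.

```python
import vertexai
from vertexai.language_models import TextGenerationModel
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholders

# Legacy PaLM code path: predict() is metered under
# "Online prediction requests per base model".
palm = TextGenerationModel.from_pretrained("text-unicorn@001")
print(palm.predict("Say hello").text)

# Newer code path: generate_content() is metered under
# "Generate content requests per minute per project per base model",
# which can default to 0 for non-Gemini models and would then raise
# the quota error described in this thread.
gm = GenerativeModel("text-unicorn@001")
print(gm.generate_content("Say hello").text)
```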
Google got back to me and stated that the PaLM models are no longer supported, so they won't adjust any quota settings for us. It's strange that `text-bison-32k` still works for us, though. I see that `text-bison` is mentioned explicitly in the code and covered as part of the `GoogleModelFamily.PALM` class. Is this treated differently from the Gemini models? Would there be any future possibility of supporting `text-unicorn` in the code base as part of that too?
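For readers following along, here is a simplified sketch of the kind of model-family routing being discussed. It is an illustrative reconstruction, not the actual `langchain-google-vertexai` source; the prefix list and classifier method are hypothetical:

```python
from enum import Enum, auto


class GoogleModelFamily(Enum):
    """Simplified stand-in for langchain-google-vertexai's model-family enum."""

    GEMINI = auto()
    PALM = auto()

    @classmethod
    def from_model_name(cls, name: str) -> "GoogleModelFamily":
        # Hypothetical routing: names recognised as PaLM 2 models go down
        # the legacy predict path; everything else is treated as Gemini
        # and routed through generate_content.
        palm_prefixes = ("text-bison", "chat-bison", "text-unicorn")
        if name.startswith(palm_prefixes):
            return cls.PALM
        return cls.GEMINI


# If "text-unicorn" were missing from the PaLM prefix list, it would fall
# through to the Gemini path and be metered under the generate-content quota.
print(GoogleModelFamily.from_model_name("text-bison-32k"))    # PALM
print(GoogleModelFamily.from_model_name("text-unicorn@001"))  # PALM once added
```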
Could you try it out and see whether this fix solves your problem, please?
@lkuligin Thanks for the reply and for making an update to the code. I posted a comment in your PR, as there is a minor typo (should be `unicorn`).
@gawbul is your problem solved?
While running the following script:
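The exact script isn't reproduced here; a hypothetical minimal version, with assumed model name, project, and region, would look like this:

```python
from langchain_google_vertexai import VertexAI

# Hypothetical reproduction -- model, project, and location are assumptions.
llm = VertexAI(
    model_name="text-unicorn@001",
    project="my-project",
    location="us-central1",
)

print(llm.invoke("Tell me a short joke."))
```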
It prints the following:
Then returns the error:
Everything works fine for the bison and gemini models, and our request rate is far below the default 60 requests per minute. This issue started appearing with langchain-google-vertexai versions >= 1.0.4.