dlqqq opened this issue 1 week ago
After some discussion with @ellisonbg, it seems to make more sense to always default to using CRI in the "us" region area when it is available. This removes the need for users to specify the region area and avoids handling the edge cases of a model supporting CRI in some region areas but not others.
This change will allow models available through CRI to be used from any region. I'll update #1113 accordingly.
We received some valuable feedback from other stakeholders and concluded that we can't default to the "us" region area, as doing so may violate data residency requirements such as those imposed by the GDPR in the EU. Furthermore, a single global dropdown for the region area is a poor user experience: not all models support CRI, and models that do support CRI are not necessarily available in all CRI region areas.
Given that this effort will take longer than we originally estimated, and that v3 development shouldn't be delayed any longer, I will move this issue to the v3 milestone for future work.
As a short-term fix, we will recommend that users select the "Bedrock (custom/provisioned)" provider and enter the inference profile ID manually to use CRI. I will open a new issue for this.
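Roughly speaking, entering a CRI inference profile ID as the model ID is equivalent to the following `langchain_aws` call. This is only a sketch: the specific profile ID and region are assumptions for illustration, and the exact IDs available to an account should be taken from the Bedrock console.

```python
# Sketch of the short-term workaround: pass a CRI inference profile ID
# ("<region-area>.<model-id>") directly as the model ID.
# The ID and region below are assumptions for illustration only.
from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="us.meta.llama3-2-11b-instruct-v1:0",  # hypothetical CRI profile ID
    region_name="us-east-1",
)
print(llm.invoke("Hello!").content)
```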
Description
Cross-region inference (CRI) allows requests to be automatically routed across a predefined set of regions, which mitigates restrictions imposed by service quotas or peak usage times.
CRI is also required to use some models on Amazon Bedrock, notably Llama 3.2. A previous attempt at implementing Llama 3.2 support in Amazon Bedrock was stalled due to lack of existing support for CRI: #1014
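For concreteness, the only difference on the API side is the model ID: a CRI request passes an inference profile ID (the model ID prefixed with a region area) instead of a plain model ID. Below is a minimal sketch using the boto3 Converse API, where the specific profile ID and region are assumptions for illustration.

```python
# Minimal sketch: invoking a Bedrock model through a CRI inference profile ID.
# The profile ID and region below are assumptions for illustration only.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.meta.llama3-2-11b-instruct-v1:0",  # "<region-area>.<model-id>"
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```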
Proposed solution
Jupyter AI needs to provide some user interface for supporting CRI. Tentatively, our proposal is to allow users to specify a region area, one of `us`, `us-gov`, `eu`, or `apac`, and, when one is set, prepend it to the model ID to produce an inference profile ID of the form `<region-area>.<model-id>`. When passed to Bedrock APIs, this allows for CRI and allows for usage of Llama 3.2 models on Amazon Bedrock.
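As a rough sketch of the ID construction described above (the function and constant names here are hypothetical, not part of the current Jupyter AI codebase):

```python
# Hypothetical helper; names are illustrative only.
VALID_REGION_AREAS = {"us", "us-gov", "eu", "apac"}

def build_model_id(model_id: str, region_area: str | None = None) -> str:
    """Return the ID to pass to Bedrock: the plain model ID, or a CRI
    inference profile ID of the form "<region-area>.<model-id>"."""
    if region_area is None:
        return model_id
    if region_area not in VALID_REGION_AREAS:
        raise ValueError(f"Unknown region area: {region_area!r}")
    return f"{region_area}.{model_id}"

# Example: build_model_id("meta.llama3-2-11b-instruct-v1:0", "us")
# returns "us.meta.llama3-2-11b-instruct-v1:0".
```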