jwmatthews opened this issue 2 months ago
I'm seeing this consistently with Bedrock when updating a big file. In order for the source code diff to actually render appropriately in the IDE, I need the file in full. So I explicitly added to the prompt that I wanted the updated file in full, and it never has enough room in the response to give it to me.
@dymurray when you access via Bedrock, which model did you see issues with? I have used claude 3.5 sonnet and seen issues. To date we've done more testing with llama3 and mixtral, and not much with claude 3.5 sonnet.
I have 2 initial thoughts:

1. I think it's very likely our issue comes from not modifying the prompt sufficiently for Claude.
2. We can likely get more info on the context size by looking at the response metadata. I have been working with @devjpt23 and he shared the snippet below.
```python
ai_msg = llm.invoke(messages)
ai_msg.response_metadata['token_usage']['completion_tokens']
```
Example:

```
response_metadata={'token_usage': {'completion_tokens': 738, 'prompt_tokens': 1122, 'total_tokens': 1860, 'completion_time': 1.192732671, 'prompt_time': 0.056392911, 'queue_time': 0.0009406290000000053, 'total_time': 1.249125582}, 'model_name': 'mixtral-8x7b-32768', 'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop', 'logprobs': None}
```
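As a rough follow-on sketch (not something we have in the repo yet), the same metadata could be used to detect a truncated completion. The exact key names (`finish_reason` vs `stop_reason`, `token_usage` vs `usage`) vary by provider, so treat the lookups below as assumptions.

```python
# Sketch only: detect a likely-truncated completion from response metadata.
# Key names differ between providers, so these lookups are best-effort guesses.
ai_msg = llm.invoke(messages)
meta = ai_msg.response_metadata

usage = meta.get("token_usage", {})
print("prompt:", usage.get("prompt_tokens"), "completion:", usage.get("completion_tokens"))

# Providers typically report why generation stopped; anything other than a
# normal stop (e.g. "length" / "max_tokens") means the output budget ran out.
finish = meta.get("finish_reason") or meta.get("stop_reason")
if finish not in ("stop", "end_turn", None):
    print(f"Completion likely truncated: finish reason was {finish!r}")
```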
I could be mistaken, but I don't think there is any intelligence in how a response is returned when the token limit is hit. The model just returns whatever it finished generating before hitting the limit. In the case of a streaming response it will just stream until it hits it. It would make sense if this is what is happening.
@jmontleon I agree. I had assumed there was no intelligence and the model would stream and get cut off, yet when I saw this the model intentionally omitted code. It wasn't cut off; it made a choice to strip code out and give me a condensed output:
```java
    public ShoppingCartService() {
    }

    // Rest of the class remains unchanged

    private static ShippingServiceRemote lookupShippingServiceRemote() {
        try {
            final Hashtable<String, String> jndiProperties = new Hashtable<>();
            jndiProperties.put(Context.INITIAL_CONTEXT_FACTORY, "org.wildfly.naming.client.WildFlyInitialContextFactory");
            final Context context = new InitialContext(jndiProperties);
            return (ShippingServiceRemote) context.lookup("ejb:/ROOT/ShippingService!" + ShippingServiceRemote.class.getName());
        } catch (NamingException e) {
            throw new RuntimeException(e);
        }
    }
}
```
I've seen the above behavior, and also the response just stopping midstream and getting cut off.
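One idea for catching the first failure mode (just a sketch from me; the patterns and helper name are hypothetical): scan the response for the placeholder comments the model uses when it elides code, and retry or fall back when one is found.

```python
import re

# Hypothetical helper: flag responses where the model elided code with a
# placeholder comment instead of returning the updated file in full.
ELISION_PATTERNS = [
    r"//\s*Rest of the (class|file) remains unchanged",
    r"//\s*\.\.\.\s*(existing|remaining) code\s*\.\.\.",
    r"#\s*rest of .* unchanged",
]

def looks_elided(updated_file: str) -> bool:
    return any(re.search(p, updated_file, re.IGNORECASE) for p in ELISION_PATTERNS)

# Usage: if looks_elided(llm_output) is True, retry with a stronger
# "return the complete file, do not omit any code" instruction, or fall
# back to requesting a diff instead of the whole file.
```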
I have been using:

```toml
model_id = "meta.llama3-70b-instruct-v1:0"
```
We were able to find that modifying the config with the following increased the output length with Bedrock. I believe @dymurray finally had success with smaller files using this, although results for larger files were still cut off.
```toml
[models.args]
model_id = "meta.llama3-70b-instruct-v1:0"
model_kwargs.max_gen_len = 2048
```
Unfortunately this is the maximum `max_gen_len` for llama models on Bedrock: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html
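For reference, this is roughly what that limit looks like when a Meta llama model is invoked on Bedrock directly with boto3, per the AWS docs linked above. This is a minimal sketch, not our actual code path, and `prompt_text` is a placeholder.

```python
import json
import boto3

# Minimal sketch of a direct Bedrock invocation for a Meta llama model.
# max_gen_len is the hard ceiling on output tokens (2048 for llama on Bedrock),
# so a large rewritten file simply cannot fit, no matter how the prompt is worded.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": prompt_text,   # placeholder for the full rewrite prompt
    "max_gen_len": 2048,
    "temperature": 0.1,
})
resp = client.invoke_model(modelId="meta.llama3-70b-instruct-v1:0", body=body)
result = json.loads(resp["body"].read())
print(result.get("stop_reason"), result.get("generation_token_count"))
```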
Related to #391
I am seeing consistent and repeatable issues with several files in Coolstore when I run against claude 3.5 sonnet. It looks like the output stops suddenly midway through generating an update.
Config:
Error snippet:
Attempting to convert:
prompt:
llm_result (all failures, stops prematurely)
Note: on a subsequent retry it failed once more and then succeeded, but the contents of what it generated are incomplete/truncated.
1 more failure: https://gist.github.com/jwmatthews/7d7aac70a6b69291e2ff0ed2b467debb
Partial success but incomplete: https://gist.github.com/jwmatthews/0b366ffa4ff8fe2ed89638552e9972e9 It truncates the response and adds a comment:

```java
// Rest of the class remains unchanged
```
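A possible next step for the claude 3.5 sonnet case (my guess, not verified here): explicitly raise the Anthropic output cap, which is `max_tokens` rather than `max_gen_len`. The sketch below assumes the langchain-aws `ChatBedrock` wrapper and a public Bedrock model ID; the 8192 ceiling is an assumption worth checking against the Bedrock docs for the model version in use.

```python
from langchain_aws import ChatBedrock  # assumption: langchain-aws is the Bedrock provider in use

# Sketch only: mirror the llama3 max_gen_len workaround for Anthropic models,
# whose output cap is the "max_tokens" parameter instead. The 8192 value is an
# assumption; verify the current limit for the model version being used.
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_kwargs={"max_tokens": 8192},
)
ai_msg = llm.invoke(messages)
print(ai_msg.response_metadata.get("stop_reason"))
```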