jwmatthews opened this issue 2 months ago
I'm seeing this consistently with Bedrock when updating a big file. In order for the source code diff to actually render appropriately in the IDE, I need the file in full. So I explicitly added to the prompt that I wanted the updated file in full, and it never has enough room in the response to give it to me.
@dymurray when you access via Bedrock, which model did you see issues with? I have used claude 3.5 sonnet and seen issues. To date we've done more testing with llama3 and mixtral, and not much with claude 3.5 sonnet.
I have 2 initial thoughts:

1. I think it's very likely our issue comes from not modifying the prompt sufficiently for Claude.
2. We can likely get more info on the context size by looking at the response metadata. I have been working with @devjpt23 and he shared the snippet below.
```python
ai_msg = llm.invoke(messages)
ai_msg.response_metadata['token_usage']['completion_tokens']
```
Example:

```
response_metadata={'token_usage': {'completion_tokens': 738, 'prompt_tokens': 1122, 'total_tokens': 1860, 'completion_time': 1.192732671, 'prompt_time': 0.056392911, 'queue_time': 0.0009406290000000053, 'total_time': 1.249125582}, 'model_name': 'mixtral-8x7b-32768', 'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop', 'logprobs': None}
```
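As a rough follow-on sketch (not something we have in the repo yet), the same metadata could be used to detect a truncated completion. The exact key names (`finish_reason` vs `stop_reason`, `token_usage` vs `usage`) vary by provider, so treat the lookups below as assumptions.

```python
# Sketch only: detect a likely-truncated completion from response metadata.
# Key names differ between providers, so these lookups are best-effort guesses.
ai_msg = llm.invoke(messages)
meta = ai_msg.response_metadata

usage = meta.get("token_usage", {})
print("prompt:", usage.get("prompt_tokens"), "completion:", usage.get("completion_tokens"))

# Providers typically report why generation stopped; anything other than a
# normal stop (e.g. "length" / "max_tokens") means the output budget ran out.
finish = meta.get("finish_reason") or meta.get("stop_reason")
if finish not in ("stop", "end_turn", None):
    print(f"Completion likely truncated: finish reason was {finish!r}")
```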
I could be mistaken, but I don't think there is any intelligence in how a response is returned when the token limit is hit. The model just returns whatever it finished generating before hitting the limit. In the case of a streaming response it will just stream until it hits it. It would make sense if this is what is happening.
@jmontleon I agree. I had assumed there was no intelligence and the model would stream and get cut off, yet when I saw this the model intentionally omitted code. It wasn't cut off; it made a choice to strip code out and give me a condensed output:
```java
    public ShoppingCartService() {
    }

    // Rest of the class remains unchanged

    private static ShippingServiceRemote lookupShippingServiceRemote() {
        try {
            final Hashtable<String, String> jndiProperties = new Hashtable<>();
            jndiProperties.put(Context.INITIAL_CONTEXT_FACTORY, "org.wildfly.naming.client.WildFlyInitialContextFactory");
            final Context context = new InitialContext(jndiProperties);
            return (ShippingServiceRemote) context.lookup("ejb:/ROOT/ShippingService!" + ShippingServiceRemote.class.getName());
        } catch (NamingException e) {
            throw new RuntimeException(e);
        }
    }
}
```
I've seen the above behavior, and also the response just stopping midstream and getting cut off.
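One idea for catching the first failure mode (just a sketch from me; the patterns and helper name are hypothetical): scan the response for the placeholder comments the model uses when it elides code, and retry or fall back when one is found.

```python
import re

# Hypothetical helper: flag responses where the model elided code with a
# placeholder comment instead of returning the updated file in full.
ELISION_PATTERNS = [
    r"//\s*Rest of the (class|file) remains unchanged",
    r"//\s*\.\.\.\s*(existing|remaining) code\s*\.\.\.",
    r"#\s*rest of .* unchanged",
]

def looks_elided(updated_file: str) -> bool:
    return any(re.search(p, updated_file, re.IGNORECASE) for p in ELISION_PATTERNS)

# Usage: if looks_elided(llm_output) is True, retry with a stronger
# "return the complete file, do not omit any code" instruction, or fall
# back to requesting a diff instead of the whole file.
```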
I have been using:

```toml
model_id = "meta.llama3-70b-instruct-v1:0"
```
We were able to find that modifying the config with the following increased the output length with Bedrock. I believe @dymurray finally had success with smaller files using this, although results for larger files were still cut off.
```toml
[models.args]
model_id = "meta.llama3-70b-instruct-v1:0"
model_kwargs.max_gen_len = 2048
```
Unfortunately this is the maximum `max_gen_len` for llama models on Bedrock: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html
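For reference, this is roughly what that limit looks like when a Meta llama model is invoked on Bedrock directly with boto3, per the AWS docs linked above. This is a minimal sketch, not our actual code path, and `prompt_text` is a placeholder.

```python
import json
import boto3

# Minimal sketch of a direct Bedrock invocation for a Meta llama model.
# max_gen_len is the hard ceiling on output tokens (2048 for llama on Bedrock),
# so a large rewritten file simply cannot fit, no matter how the prompt is worded.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": prompt_text,   # placeholder for the full rewrite prompt
    "max_gen_len": 2048,
    "temperature": 0.1,
})
resp = client.invoke_model(modelId="meta.llama3-70b-instruct-v1:0", body=body)
result = json.loads(resp["body"].read())
print(result.get("stop_reason"), result.get("generation_token_count"))
```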
Related to #391
I am seeing consistent and repeatable issues with several files in Coolstore when I run against claude 3.5 sonnet. It looks like the output stops suddenly midway through generating an update.
Config:
Error snippet:
Attempting to convert:
prompt:
llm_result (all failures, stops prematurely)
Note: on a subsequent retry it failed once more and then succeeded, but the contents of what it generated are incomplete/truncated.
1 more failure: https://gist.github.com/jwmatthews/7d7aac70a6b69291e2ff0ed2b467debb
Partial success but incomplete: https://gist.github.com/jwmatthews/0b366ffa4ff8fe2ed89638552e9972e9 It truncates the response and adds a comment:

```java
// Rest of the class remains unchanged
```
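A possible next step for the claude 3.5 sonnet case (my guess, not verified here): explicitly raise the Anthropic output cap, which is `max_tokens` rather than `max_gen_len`. The sketch below assumes the langchain-aws `ChatBedrock` wrapper and a public Bedrock model ID; the 8192 ceiling is an assumption worth checking against the Bedrock docs for the model version in use.

```python
from langchain_aws import ChatBedrock  # assumption: langchain-aws is the Bedrock provider in use

# Sketch only: mirror the llama3 max_gen_len workaround for Anthropic models,
# whose output cap is the "max_tokens" parameter instead. The 8192 value is an
# assumption; verify the current limit for the model version being used.
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_kwargs={"max_tokens": 8192},
)
ai_msg = llm.invoke(messages)
print(ai_msg.response_metadata.get("stop_reason"))
```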