mrheinen / lophiid

A distributed honeypot for monitoring large scale web attacks
GNU General Public License v2.0

Split command strings before using the LLM #69

Closed mrheinen closed 4 days ago

mrheinen commented 4 days ago

User description

Relying on the LLM to produce output for complex command strings turned out to be unreliable. First, the LLM does not always properly split the command string on the `;` character into multiple individual commands. Second, it seems impossible to limit the size of the output per command, so a single command can consume all available tokens and leave no room for the output of the other commands. If commands are sent to the LLM individually, it is much easier to control the output size per command.

The downside is that we now require multiple calls to the LLM. A future change should make the LLM manager able to do this in parallel.
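
The per-command flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual responder code: `fakeComplete` and `respondPerCommand` are hypothetical stand-ins (the real code goes through the LLM manager), and for brevity it splits on every semicolon, whereas the PR adds a quote- and escape-aware splitter.

```go
package main

import (
	"fmt"
	"strings"
)

// fakeComplete is a hypothetical stand-in for the LLM manager's
// Complete call; it simply echoes the prompt it received.
func fakeComplete(prompt string) (string, error) {
	return "$ " + prompt + "\n", nil
}

// respondPerCommand sends each individual command to the LLM in its own
// request, so the output size can be capped per command instead of one
// command eating the entire token budget. For brevity this splits on
// every ';'; the PR's splitter also respects quotes and escapes.
func respondPerCommand(commandString string) (string, error) {
	var sb strings.Builder
	for _, raw := range strings.Split(commandString, ";") {
		cmd := strings.TrimSpace(raw)
		if cmd == "" {
			continue
		}
		out, err := fakeComplete(fmt.Sprintf("Pretend to execute: %s", cmd))
		if err != nil {
			return "", err
		}
		sb.WriteString(out)
	}
	return sb.String(), nil
}

func main() {
	out, _ := respondPerCommand("ls -la; id; uname -a")
	fmt.Print(out)
}
```

Note that each command costs one LLM round trip, which is why the description mentions parallelizing the manager as a follow-up.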


PR Type

Enhancement, Bug fix


Description


Changes walkthrough 📝

Relevant files:

Enhancement

cmd/llm/main.go: Add response time measurement
  • Added time measurement for LLM response
  • Modified output format to include elapsed time
  • +5/-1

pkg/backend/responder/llm_prompts.go: Refine LLM prompts for single command handling
  • Updated command injection prompt to handle single commands
  • Simplified source code execution prompt
  • Removed delimiter-based content separation
  • +4/-8

pkg/backend/responder/llm_responder.go: Implement command splitting and separate handling
  • Implemented command splitting for command injection
  • Added separate handling for command injection and source code execution
  • Removed delimiter-based prompt construction
  • +19/-8

pkg/util/command.go: Add command splitting utility function
  • Added new SplitCommandsOnSemi function to split commands on semicolons
  • Implemented logic to handle quoted strings and escaped characters
  • +63/-0

Tests

pkg/util/command_test.go: Add unit tests for command splitting function
  • Added unit tests for SplitCommandsOnSemi function
  • Covered various scenarios including simple commands, escaped characters, and quoted strings
  • +74/-0

Configuration changes

pkg/util/BUILD.bazel: Update Bazel build file for new command utilities
  • Added command.go to the go_library target
  • Added command_test.go to the go_test target
  • +2/-0
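
For reference, the quote- and escape-aware splitting logic that SplitCommandsOnSemi in pkg/util/command.go implements can be sketched along these lines. This is a reconstruction for illustration based on the diffs quoted in the suggestions below, not the verbatim source.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitCommandsOnSemi splits a shell command string on semicolons while
// respecting single/double quotes and backslash escapes, so that a
// string like `echo "a;b"` remains a single command.
func SplitCommandsOnSemi(commands string) []string {
	ret := []string{}
	stringStart := 0
	inQuote := false
	var inQuoteType byte
	for idx := 0; idx < len(commands); idx++ {
		chr := commands[idx]
		if chr == '\\' {
			// Skip the escaped character entirely.
			idx++
			continue
		}
		if chr == '\'' || chr == '"' {
			if !inQuote {
				inQuoteType = chr
				inQuote = true
			} else if chr == inQuoteType {
				inQuote = false
			}
			continue
		}
		if inQuote {
			// Semicolons inside quotes do not split commands.
			continue
		}
		if chr == ';' {
			if cmd := strings.TrimSpace(commands[stringStart:idx]); cmd != "" {
				ret = append(ret, cmd)
			}
			stringStart = idx + 1
		}
	}
	if cmd := strings.TrimSpace(commands[stringStart:]); cmd != "" {
		ret = append(ret, cmd)
	}
	return ret
}

func main() {
	fmt.Println(SplitCommandsOnSemi(`echo "a;b"; ls -la; echo done\;`))
}
```

Each resulting command is then sent to the LLM in its own request by the responder.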

    πŸ’‘ PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    codiumai-pr-agent-pro[bot] commented 4 days ago

    PR Reviewer Guide πŸ”

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 πŸ”΅πŸ”΅πŸ”΅βšͺβšͺ
    πŸ§ͺ PR contains tests
    πŸ”’ No security concerns identified
    ⚑ Recommended focus areas for review

Error Handling
The error handling for command injection responses could be improved. Currently, if an error occurs during the LLM request for one command, it returns immediately, potentially ignoring other valid commands.

Edge Case Handling
The SplitCommandsOnSemi function might not handle all edge cases correctly, such as nested quotes or complex escape sequences. Additional testing and refinement may be necessary.

Prompt Clarity
The updated command injection prompt might be too concise. Consider adding more context or examples to ensure the LLM understands the expected behavior, especially for edge cases.
    github-actions[bot] commented 4 days ago

    Failed to generate code suggestions for PR

    codiumai-pr-agent-pro[bot] commented 4 days ago

    PR Code Suggestions ✨

    Explore these optional code suggestions:

Category | Suggestion | Score
    Possible issue
    Enhance error handling in the command splitting function to handle edge cases and prevent potential panics ___ **Consider adding error handling for edge cases, such as unmatched quotes or
    incomplete escape sequences. This would make the function more robust and prevent
    potential panics.** [pkg/util/command.go [21-63]](https://github.com/mrheinen/lophiid/pull/69/files#diff-be366637201f3b8d1b28ac2ac240c207641fa1098290d533f0eb7077aabb79e9R21-R63) ```diff -func SplitCommandsOnSemi(commands string) []string { - +func SplitCommandsOnSemi(commands string) ([]string, error) { ret := []string{} stringStart := 0 inQuote := false var inQuoteType byte - for idx := 0; idx < len(commands); idx += 1 { + for idx := 0; idx < len(commands); idx++ { chr := commands[idx] if chr == '\\' { - idx += 1 + if idx == len(commands)-1 { + return nil, fmt.Errorf("incomplete escape sequence at end of input") + } + idx++ continue } if chr == '\'' || chr == '"' { if !inQuote { inQuoteType = chr inQuote = true } else if chr == inQuoteType { inQuote = false } continue } if inQuote { continue } if chr == ';' { cmd := strings.TrimSpace(commands[stringStart:idx]) if cmd != "" { ret = append(ret, cmd) } stringStart = idx + 1 } } + if inQuote { + return nil, fmt.Errorf("unmatched quote in input") + } + cmd := strings.TrimSpace(commands[stringStart:]) if cmd != "" { ret = append(ret, cmd) } - return ret + return ret, nil } ``` - [ ] **Apply this suggestion**
    Suggestion importance[1-10]: 7 Why: This suggestion significantly improves the robustness of the SplitCommandsOnSemi function by handling edge cases like unmatched quotes and incomplete escape sequences, which could prevent potential panics.
    7
Performance

Use a string builder for more efficient string concatenation when handling multiple commands

Consider using a string builder to concatenate the results more efficiently, especially if there might be many commands.

[pkg/backend/responder/llm_responder.go [37-51]](https://github.com/mrheinen/lophiid/pull/69/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR37-R51)

```diff
-res := ""
+var resBuilder strings.Builder
 var err error
 switch resType {
 case constants.ResponderTypeCommandInjection:
   basePrompt = commandInjectionPrompt
   for _, pInput := range util.SplitCommandsOnSemi(promptInput) {
     finalPrompt := fmt.Sprintf(basePrompt, pInput)
     tmpRes, err := l.llmManager.Complete(finalPrompt)
     if err != nil {
       slog.Error("could not complete LLM request", slog.String("error", err.Error()))
       return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
     }
-    res += tmpRes
+    resBuilder.WriteString(tmpRes)
   }
+res := resBuilder.String()
```

- [ ] Apply this suggestion

Suggestion importance [1-10]: 5

Why: Using a string builder can improve performance when concatenating multiple strings, especially for a large number of commands. However, the impact may be minimal for small numbers of commands.
Optimize memory allocation for better performance when splitting commands

Consider using a more efficient data structure, such as a slice with a pre-allocated capacity, to improve performance when dealing with a large number of commands.

[pkg/util/command.go [21-63]](https://github.com/mrheinen/lophiid/pull/69/files#diff-be366637201f3b8d1b28ac2ac240c207641fa1098290d533f0eb7077aabb79e9R21-R63)

```diff
 func SplitCommandsOnSemi(commands string) []string {
+  // Estimate initial capacity based on the assumption of an average command length
+  estimatedCmds := len(commands) / 10
+  if estimatedCmds < 1 {
+    estimatedCmds = 1
+  }
+  ret := make([]string, 0, estimatedCmds)
-  ret := []string{}
   stringStart := 0
   inQuote := false
   var inQuoteType byte
-  for idx := 0; idx < len(commands); idx += 1 {
+  for idx := 0; idx < len(commands); idx++ {
     chr := commands[idx]
     if chr == '\\' {
-      idx += 1
+      idx++
       continue
     }
     if chr == '\'' || chr == '"' {
       if !inQuote {
         inQuoteType = chr
         inQuote = true
       } else if chr == inQuoteType {
         inQuote = false
       }
       continue
     }
     if inQuote {
       continue
     }
     if chr == ';' {
-      cmd := strings.TrimSpace(commands[stringStart:idx])
-      if cmd != "" {
+      if cmd := strings.TrimSpace(commands[stringStart:idx]); cmd != "" {
         ret = append(ret, cmd)
       }
       stringStart = idx + 1
     }
   }
-  cmd := strings.TrimSpace(commands[stringStart:])
-  if cmd != "" {
+  if cmd := strings.TrimSpace(commands[stringStart:]); cmd != "" {
     ret = append(ret, cmd)
   }
   return ret
 }
```

- [ ] Apply this suggestion

Suggestion importance [1-10]: 4

Why: Pre-allocating capacity for the slice can improve performance slightly, especially for large inputs. However, the benefit may be minimal for typical use cases and adds some complexity to the code.
Best practice

Handle potential errors from the command splitting function to improve error handling and robustness

Consider handling potential errors from util.SplitCommandsOnSemi(promptInput). Although it's unlikely to fail, it's a good practice to check for errors, especially when dealing with user input.

[pkg/backend/responder/llm_responder.go [42-51]](https://github.com/mrheinen/lophiid/pull/69/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR42-R51)

```diff
-for _, pInput := range util.SplitCommandsOnSemi(promptInput) {
+commands, err := util.SplitCommandsOnSemi(promptInput)
+if err != nil {
+  slog.Error("could not split commands", slog.String("error", err.Error()))
+  return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
+}
+for _, pInput := range commands {
   finalPrompt := fmt.Sprintf(basePrompt, pInput)
   tmpRes, err := l.llmManager.Complete(finalPrompt)
   if err != nil {
     slog.Error("could not complete LLM request", slog.String("error", err.Error()))
     return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
   }
   res += tmpRes
 }
```

- [ ] Apply this suggestion

Suggestion importance [1-10]: 3

Why: While the suggestion improves error handling, the current implementation of SplitCommandsOnSemi doesn't return an error, making this change unnecessary at present. It's a minor improvement in robustness.

    πŸ’‘ Need additional feedback ? start a PR chat