mrheinen / lophiid

A distributed honeypot for monitoring large scale web attacks
GNU General Public License v2.0

Split command strings before using the LLM #69

Closed mrheinen closed 4 days ago

mrheinen commented 4 days ago

User description

Relying on the LLM to produce output for complex command strings turned out to be unreliable. First, the LLM does not always properly split the command string on the `;` character into multiple individual commands. Second, it seems impossible to limit the size of the output per command, so a single command can consume all available tokens and leave no room for the output of the other commands. If commands are sent to the LLM individually, it is much easier to control the output size per command.

The downside is that we now require multiple calls to the LLM. A future change should make the LLM manager able to do this in parallel.
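
The per-command flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual responder code: `fakeComplete` and `respondPerCommand` are hypothetical stand-ins (the real code goes through the LLM manager), and for brevity it splits on every semicolon, whereas the PR adds a quote- and escape-aware splitter.

```go
package main

import (
	"fmt"
	"strings"
)

// fakeComplete is a hypothetical stand-in for the LLM manager's
// Complete call; it simply echoes the prompt it received.
func fakeComplete(prompt string) (string, error) {
	return "$ " + prompt + "\n", nil
}

// respondPerCommand sends each individual command to the LLM in its own
// request, so the output size can be capped per command instead of one
// command eating the entire token budget. For brevity this splits on
// every ';'; the PR's splitter also respects quotes and escapes.
func respondPerCommand(commandString string) (string, error) {
	var sb strings.Builder
	for _, raw := range strings.Split(commandString, ";") {
		cmd := strings.TrimSpace(raw)
		if cmd == "" {
			continue
		}
		out, err := fakeComplete(fmt.Sprintf("Pretend to execute: %s", cmd))
		if err != nil {
			return "", err
		}
		sb.WriteString(out)
	}
	return sb.String(), nil
}

func main() {
	out, _ := respondPerCommand("ls -la; id; uname -a")
	fmt.Print(out)
}
```

Note that each command costs one LLM round trip, which is why the description mentions parallelizing the manager as a follow-up.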


PR Type

Enhancement, Bug fix


Description


Changes walkthrough 📝

Relevant files:

Enhancement

cmd/llm/main.go: Add response time measurement
  • Added time measurement for LLM response
  • Modified output format to include elapsed time
  • +5/-1

pkg/backend/responder/llm_prompts.go: Refine LLM prompts for single command handling
  • Updated command injection prompt to handle single commands
  • Simplified source code execution prompt
  • Removed delimiter-based content separation
  • +4/-8

pkg/backend/responder/llm_responder.go: Implement command splitting and separate handling
  • Implemented command splitting for command injection
  • Added separate handling for command injection and source code execution
  • Removed delimiter-based prompt construction
  • +19/-8

pkg/util/command.go: Add command splitting utility function
  • Added new SplitCommandsOnSemi function to split commands on semicolons
  • Implemented logic to handle quoted strings and escaped characters
  • +63/-0

Tests

pkg/util/command_test.go: Add unit tests for command splitting function
  • Added unit tests for SplitCommandsOnSemi function
  • Covered various scenarios including simple commands, escaped characters, and quoted strings
  • +74/-0

Configuration changes

pkg/util/BUILD.bazel: Update Bazel build file for new command utilities
  • Added command.go to the go_library target
  • Added command_test.go to the go_test target
  • +2/-0
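
For reference, the quote- and escape-aware splitting logic that SplitCommandsOnSemi in pkg/util/command.go implements can be sketched along these lines. This is a reconstruction for illustration based on the diffs quoted in the suggestions below, not the verbatim source.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitCommandsOnSemi splits a shell command string on semicolons while
// respecting single/double quotes and backslash escapes, so that a
// string like `echo "a;b"` remains a single command.
func SplitCommandsOnSemi(commands string) []string {
	ret := []string{}
	stringStart := 0
	inQuote := false
	var inQuoteType byte
	for idx := 0; idx < len(commands); idx++ {
		chr := commands[idx]
		if chr == '\\' {
			// Skip the escaped character entirely.
			idx++
			continue
		}
		if chr == '\'' || chr == '"' {
			if !inQuote {
				inQuoteType = chr
				inQuote = true
			} else if chr == inQuoteType {
				inQuote = false
			}
			continue
		}
		if inQuote {
			// Semicolons inside quotes do not split commands.
			continue
		}
		if chr == ';' {
			if cmd := strings.TrimSpace(commands[stringStart:idx]); cmd != "" {
				ret = append(ret, cmd)
			}
			stringStart = idx + 1
		}
	}
	if cmd := strings.TrimSpace(commands[stringStart:]); cmd != "" {
		ret = append(ret, cmd)
	}
	return ret
}

func main() {
	fmt.Println(SplitCommandsOnSemi(`echo "a;b"; ls -la; echo done\;`))
}
```

Each resulting command is then sent to the LLM in its own request by the responder.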

    πŸ’‘ PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    codiumai-pr-agent-pro[bot] commented 4 days ago

    PR Reviewer Guide πŸ”

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 πŸ”΅πŸ”΅πŸ”΅βšͺβšͺ
    πŸ§ͺ PR contains tests
    πŸ”’ No security concerns identified
    ⚑ Recommended focus areas for review

Error Handling
The error handling for command injection responses could be improved. Currently, if an error occurs during the LLM request for one command, it returns immediately, potentially ignoring other valid commands.

Edge Case Handling
The SplitCommandsOnSemi function might not handle all edge cases correctly, such as nested quotes or complex escape sequences. Additional testing and refinement may be necessary.

Prompt Clarity
The updated command injection prompt might be too concise. Consider adding more context or examples to ensure the LLM understands the expected behavior, especially for edge cases.
    github-actions[bot] commented 4 days ago

    Failed to generate code suggestions for PR

    codiumai-pr-agent-pro[bot] commented 4 days ago

    PR Code Suggestions ✨

    Explore these optional code suggestions:

Category | Suggestion | Score
    Possible issue
    Enhance error handling in the command splitting function to handle edge cases and prevent potential panics ___ **Consider adding error handling for edge cases, such as unmatched quotes or
    incomplete escape sequences. This would make the function more robust and prevent
    potential panics.** [pkg/util/command.go [21-63]](https://github.com/mrheinen/lophiid/pull/69/files#diff-be366637201f3b8d1b28ac2ac240c207641fa1098290d533f0eb7077aabb79e9R21-R63) ```diff -func SplitCommandsOnSemi(commands string) []string { - +func SplitCommandsOnSemi(commands string) ([]string, error) { ret := []string{} stringStart := 0 inQuote := false var inQuoteType byte - for idx := 0; idx < len(commands); idx += 1 { + for idx := 0; idx < len(commands); idx++ { chr := commands[idx] if chr == '\\' { - idx += 1 + if idx == len(commands)-1 { + return nil, fmt.Errorf("incomplete escape sequence at end of input") + } + idx++ continue } if chr == '\'' || chr == '"' { if !inQuote { inQuoteType = chr inQuote = true } else if chr == inQuoteType { inQuote = false } continue } if inQuote { continue } if chr == ';' { cmd := strings.TrimSpace(commands[stringStart:idx]) if cmd != "" { ret = append(ret, cmd) } stringStart = idx + 1 } } + if inQuote { + return nil, fmt.Errorf("unmatched quote in input") + } + cmd := strings.TrimSpace(commands[stringStart:]) if cmd != "" { ret = append(ret, cmd) } - return ret + return ret, nil } ``` - [ ] **Apply this suggestion**
    Suggestion importance[1-10]: 7 Why: This suggestion significantly improves the robustness of the SplitCommandsOnSemi function by handling edge cases like unmatched quotes and incomplete escape sequences, which could prevent potential panics.
    7
Performance

Use a string builder for more efficient string concatenation when handling multiple commands

Consider using a string builder to concatenate the results more efficiently, especially if there might be many commands.

[pkg/backend/responder/llm_responder.go [37-51]](https://github.com/mrheinen/lophiid/pull/69/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR37-R51)

```diff
-res := ""
+var resBuilder strings.Builder
 var err error
 switch resType {
 case constants.ResponderTypeCommandInjection:
   basePrompt = commandInjectionPrompt
   for _, pInput := range util.SplitCommandsOnSemi(promptInput) {
     finalPrompt := fmt.Sprintf(basePrompt, pInput)
     tmpRes, err := l.llmManager.Complete(finalPrompt)
     if err != nil {
       slog.Error("could not complete LLM request", slog.String("error", err.Error()))
       return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
     }
-    res += tmpRes
+    resBuilder.WriteString(tmpRes)
   }
+res := resBuilder.String()
```

- [ ] Apply this suggestion

Suggestion importance [1-10]: 5

Why: Using a string builder can improve performance when concatenating multiple strings, especially for a large number of commands. However, the impact may be minimal for small numbers of commands.
Optimize memory allocation for better performance when splitting commands

Consider using a more efficient data structure, such as a slice with a pre-allocated capacity, to improve performance when dealing with a large number of commands.

[pkg/util/command.go [21-63]](https://github.com/mrheinen/lophiid/pull/69/files#diff-be366637201f3b8d1b28ac2ac240c207641fa1098290d533f0eb7077aabb79e9R21-R63)

```diff
 func SplitCommandsOnSemi(commands string) []string {
+  // Estimate initial capacity based on the assumption of an average command length
+  estimatedCmds := len(commands) / 10
+  if estimatedCmds < 1 {
+    estimatedCmds = 1
+  }
+  ret := make([]string, 0, estimatedCmds)
-  ret := []string{}
   stringStart := 0
   inQuote := false
   var inQuoteType byte
-  for idx := 0; idx < len(commands); idx += 1 {
+  for idx := 0; idx < len(commands); idx++ {
     chr := commands[idx]
     if chr == '\\' {
-      idx += 1
+      idx++
       continue
     }
     if chr == '\'' || chr == '"' {
       if !inQuote {
         inQuoteType = chr
         inQuote = true
       } else if chr == inQuoteType {
         inQuote = false
       }
       continue
     }
     if inQuote {
       continue
     }
     if chr == ';' {
-      cmd := strings.TrimSpace(commands[stringStart:idx])
-      if cmd != "" {
+      if cmd := strings.TrimSpace(commands[stringStart:idx]); cmd != "" {
         ret = append(ret, cmd)
       }
       stringStart = idx + 1
     }
   }
-  cmd := strings.TrimSpace(commands[stringStart:])
-  if cmd != "" {
+  if cmd := strings.TrimSpace(commands[stringStart:]); cmd != "" {
     ret = append(ret, cmd)
   }
   return ret
 }
```

- [ ] Apply this suggestion

Suggestion importance [1-10]: 4

Why: Pre-allocating capacity for the slice can improve performance slightly, especially for large inputs. However, the benefit may be minimal for typical use cases and adds some complexity to the code.
Best practice

Handle potential errors from the command splitting function to improve error handling and robustness

Consider handling potential errors from util.SplitCommandsOnSemi(promptInput). Although it's unlikely to fail, it's a good practice to check for errors, especially when dealing with user input.

[pkg/backend/responder/llm_responder.go [42-51]](https://github.com/mrheinen/lophiid/pull/69/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR42-R51)

```diff
-for _, pInput := range util.SplitCommandsOnSemi(promptInput) {
+commands, err := util.SplitCommandsOnSemi(promptInput)
+if err != nil {
+  slog.Error("could not split commands", slog.String("error", err.Error()))
+  return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
+}
+for _, pInput := range commands {
   finalPrompt := fmt.Sprintf(basePrompt, pInput)
   tmpRes, err := l.llmManager.Complete(finalPrompt)
   if err != nil {
     slog.Error("could not complete LLM request", slog.String("error", err.Error()))
     return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
   }
   res += tmpRes
 }
```

- [ ] Apply this suggestion

Suggestion importance [1-10]: 3

Why: While the suggestion improves error handling, the current implementation of SplitCommandsOnSemi doesn't return an error, making this change unnecessary at present. It's a minor improvement in robustness.

    πŸ’‘ Need additional feedback ? start a PR chat