mrheinen / lophiid

A distributed honeypot for monitoring large scale web attacks
GNU General Public License v2.0

Allow concurrent LLM lookups #70

Closed mrheinen closed 2 days ago

mrheinen commented 2 days ago

PR Type

Enhancement


Description


Changes walkthrough

Relevant files

Enhancement (5 files)

backend_main.go
Add concurrent requests parameter to LLMManager

cmd/backend/backend_main.go
  • Updated the NewLLMManager call to include the new parameter cfg.Responder.LLMConcurrentRequests
  +1/-1
main.go
Add concurrent requests to LLMManager in main

cmd/llm/main.go
  • Modified the NewLLMManager call to include a new parameter for concurrent requests (set to 5)
  +1/-1
llm_prompts.go
Refine command injection prompt

pkg/backend/responder/llm_prompts.go
  • Updated the command injection prompt to be more specific about command-line commands
  • Removed instructions about the $? substring and echo behavior
  +2/-5
llm_responder.go
Implement concurrent LLM lookups for command injection

pkg/backend/responder/llm_responder.go
  • Implemented concurrent LLM lookups for command injection responses
  • Added error handling for cases with no commands
  • Used the CompleteMultiple method from LLMManager for parallel processing
  +22/-8
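Before fanning lookups out, the responder splits the attacker-supplied payload into individual commands. The repo's actual helper is `util.SplitCommandsOnSemi`; the sketch below is a hypothetical, stdlib-only approximation of such a splitter for illustration, not the project's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommandsOnSemi is a hypothetical stand-in for the repo's
// util.SplitCommandsOnSemi helper: it splits a shell payload on
// semicolons, trims whitespace, and drops empty fragments.
func splitCommandsOnSemi(input string) []string {
	var commands []string
	for _, part := range strings.Split(input, ";") {
		if trimmed := strings.TrimSpace(part); trimmed != "" {
			commands = append(commands, trimmed)
		}
	}
	return commands
}

func main() {
	fmt.Println(splitCommandsOnSemi("id; uname -a;  ; whoami"))
	// prints: [id uname -a whoami]
}
```

Each resulting command can then be turned into its own prompt and completed independently, which is what makes the concurrent lookup worthwhile.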
llm_manager.go
Implement concurrent LLM completions in LLMManager

pkg/llm/llm_manager.go
  • Added a multiplePoolSize field to the LLMManager struct
  • Implemented the CompleteMultiple method for parallel prompt completions
  • Updated NewLLMManager to include a pool size parameter
  +30/-2
Configuration changes (4 files)

config.go
Add LLMConcurrentRequests to Responder config

pkg/backend/config.go
  • Added the new field LLMConcurrentRequests to the Responder struct
  • Set the default value for LLMConcurrentRequests to 5
  +7/-6
backend-config.yaml
Add concurrent requests config option

config/backend-config.yaml
  • Added the new configuration option llm_concurrent_requests with a default value of 5
  +2/-0
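A hedged sketch of what the added lines in config/backend-config.yaml likely look like. Only the option name `llm_concurrent_requests` and its default of 5 come from this PR; the enclosing `responder` key is an assumption based on the Responder struct named above.

```yaml
responder:
  # Maximum number of LLM completions to run in parallel (added in this PR).
  llm_concurrent_requests: 5
```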
BUILD.bazel
Update Bazel build for responder tests

pkg/backend/responder/BUILD.bazel
  • Added new dependencies for responder tests
  +6/-1

BUILD.bazel
Update Bazel build for LLM package

pkg/llm/BUILD.bazel
  • Added the new dependency @com_github_sourcegraph_conc//pool
  +1/-0
Tests (2 files)

llm_responder_test.go
Add tests for command injection responder

pkg/backend/responder/llm_responder_test.go
  • Added new test cases for the command injection responder
  • Implemented tests for single and multiple command completions
  +55/-0

llm_manager_test.go
Add tests for concurrent LLM completions

pkg/llm/llm_manager_test.go
  • Updated existing tests to include the new pool size parameter
  • Added a new test case for the CompleteMultiple method
  +29/-2
Dependencies (3 files)

deps.bzl
Add concurrent library dependency

deps.bzl
  • Added the new dependency com_github_sourcegraph_conc
  • Updated the version of org_uber_go_multierr
  +9/-2

go.mod
Add concurrent library to go.mod

go.mod
  • Added the new dependency github.com/sourcegraph/conc v0.3.0
  +1/-0

go.sum
Update go.sum with new dependency

go.sum
  • Added checksums for the new dependency github.com/sourcegraph/conc v0.3.0
  +2/-0

πŸ’‘ PR-Agent usage: Comment `/help "your question"` on any pull request to receive relevant information

    codiumai-pr-agent-pro[bot] commented 2 days ago

    PR Reviewer Guide πŸ”

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 πŸ”΅πŸ”΅πŸ”΅πŸ”΅βšͺ
    πŸ§ͺ PR contains tests
    πŸ”’ Security concerns

    Command Injection:
    The changes to the command injection prompt in 'pkg/backend/responder/llm_prompts.go' might make it less restrictive. This could potentially allow for command injection if not properly sanitized or validated. Careful review of the prompt and its usage is necessary to ensure it doesn't introduce security vulnerabilities.
    ⚑ Recommended focus areas for review

Potential Race Condition
The concurrent execution of LLM requests might lead to race conditions when writing to the shared 'res' variable. Consider using a thread-safe approach for combining results.

Error Handling
The CompleteMultiple function returns early on the first error encountered. Consider collecting all errors and returning them along with successful results.

Prompt Security
The updated command injection prompt might be less restrictive. Ensure it doesn't allow for potential security vulnerabilities through command execution.
    github-actions[bot] commented 2 days ago

    Failed to generate code suggestions for PR

    codiumai-pr-agent-pro[bot] commented 2 days ago

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Enhancement
**Add a timeout to the parallel completion process**

Consider adding a timeout or context with deadline to the CompleteMultiple call to ensure it doesn't run indefinitely if one of the completions takes too long.

[pkg/backend/responder/llm_responder.go [49-70]](https://github.com/mrheinen/lophiid/pull/70/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR49-R70)

```diff
 func (l *LLMManager) CompleteMultiple(prompts []string) (map[string]string, error) {
+	ctx, cancel := context.WithTimeout(context.Background(), l.completionTimeout)
+	defer cancel()
+
 	result := map[string]string{}
-	p := pool.New().WithErrors().WithMaxGoroutines(l.multiplePoolSize)
+	p := pool.New().WithErrors().WithMaxGoroutines(l.multiplePoolSize).WithContext(ctx)
 	for _, prompt := range prompts {
 		p.Go(func() error {
 			localPrompt := prompt
 			ret, err := l.Complete(localPrompt)
 			if err != nil {
 				return err
 			}
 			result[localPrompt] = ret
 			return nil
 		})
 	}
 	err := p.Wait()
 	return result, err
 }
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8

Why: Adding a timeout to the CompleteMultiple function is crucial for preventing potential deadlocks or long-running operations. This significantly improves the reliability and responsiveness of the system.
**Improve error handling for cases with no commands**

Consider handling the case where no commands are found more gracefully. Instead of returning early with a fallback string, you could log a warning and continue processing with an empty result.

[pkg/backend/responder/llm_responder.go [43-47]](https://github.com/mrheinen/lophiid/pull/70/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR43-R47)

```diff
 commands := util.SplitCommandsOnSemi(promptInput)
 if len(commands) == 0 {
-	slog.Debug("no commands found", slog.String("input", promptInput))
-	return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), nil
+	slog.Warn("no commands found, continuing with empty result", slog.String("input", promptInput))
+	commands = []string{""}
 }
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 6

Why: The suggestion improves error handling by continuing execution with an empty command instead of returning early. This enhances robustness and provides more flexibility in handling edge cases.
    Possible issue
βœ… **Use a thread-safe map for concurrent result storage**

Consider using a sync.Map instead of a regular map for the result in CompleteMultiple to avoid potential race conditions when writing results concurrently.

[pkg/llm/llm_manager.go [49-70]](https://github.com/mrheinen/lophiid/pull/70/files#diff-079422eef64450f8a3d97a65b202d2a89a3850d6dd9282c9a8f930bd3731c26aR49-R70)

```diff
 func (l *LLMManager) CompleteMultiple(prompts []string) (map[string]string, error) {
-	result := map[string]string{}
+	var result sync.Map
 	p := pool.New().WithErrors().WithMaxGoroutines(l.multiplePoolSize)
 	for _, prompt := range prompts {
 		p.Go(func() error {
 			localPrompt := prompt
 			ret, err := l.Complete(localPrompt)
 			if err != nil {
 				return err
 			}
-			result[localPrompt] = ret
+			result.Store(localPrompt, ret)
 			return nil
 		})
 	}
 	err := p.Wait()
-	return result, err
+	finalResult := make(map[string]string)
+	result.Range(func(key, value interface{}) bool {
+		finalResult[key.(string)] = value.(string)
+		return true
+	})
+
+	return finalResult, err
 }
```

`[Suggestion has been applied]`

Suggestion importance[1-10]: 7

Why: Using a sync.Map instead of a regular map eliminates potential race conditions in concurrent operations. This enhances the thread safety and reliability of the code, especially in a multi-threaded environment.
**Improve error handling for empty or missing LLM responses**

Consider adding error handling for the case where resMap[prompt] is empty or invalid. This could help prevent potential issues if the LLM returns unexpected results.

[pkg/backend/responder/llm_responder.go [60-65]](https://github.com/mrheinen/lophiid/pull/70/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR60-R65)

```diff
 for _, prompt := range promptInputs {
 	val, ok := resMap[prompt]
-	if ok {
+	if ok && val != "" {
 		res += val
+	} else {
+		slog.Warn("Empty or missing response for prompt", slog.String("prompt", prompt))
 	}
 }
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 5

Why: The suggestion adds a check for empty responses and logs a warning, which improves error handling and debugging capabilities. While useful, it's a minor enhancement compared to the other suggestions.

    πŸ’‘ Need additional feedback ? start a PR chat

    codiumai-pr-agent-pro[bot] commented 2 days ago

    CI Failure Feedback 🧐

    **Action:** build
    **Failed stage:** [Build](https://github.com/mrheinen/lophiid/actions/runs/11581028370/job/32240847919) [❌]
    **Failure summary:** The action failed due to a compilation error in the Go code:
  • The error occurred in the file pkg/llm/llm_manager.go at line 50, column 16.
  • The specific error is "undefined: sync", indicating that the sync package is being used but not
    imported.
  • This compilation error prevented the successful building of the pkg/llm/llm_test.internal.a package.
  • The error occurred during the GoCompilePkg step of the build process.
  • Relevant error logs:

```
1: ##[group]Operating System
2: Ubuntu
...
574: [145 / 338] GoCompilePkg external/com_github_jackc_pgx_v5/pgtype/pgtype.a; 3s linux-sandbox ... (4 actions, 3 running)
575: [157 / 338] GoCompilePkg external/com_github_jackc_pgx_v5/pgtype/pgtype.a; 4s linux-sandbox ... (4 actions, 3 running)
576: [186 / 338] [Prepa] GoCompilePkg pkg/database/database_test~testmain.a ... (4 actions, 3 running)
577: [212 / 338] GoCompilePkg pkg/database/database.a; 0s linux-sandbox ... (4 actions, 3 running)
578: [226 / 338] GoCompilePkg external/org_golang_google_protobuf/internal/impl/impl.a; 0s linux-sandbox ... (4 actions running)
579: [234 / 338] GoCompilePkg external/org_golang_x_text/collate/collate.a; 1s linux-sandbox ... (4 actions running)
580: [257 / 338] GoCompilePkg external/org_golang_x_text/collate/collate.a; 2s linux-sandbox ... (4 actions running)
581: [277 / 338] GoCompilePkg external/org_golang_google_grpc/internal/transport/transport.a; 0s linux-sandbox ... (4 actions, 3 running)
582: ERROR: /home/runner/work/lophiid/lophiid/pkg/llm/BUILD.bazel:21:8: GoCompilePkg pkg/llm/llm_test.internal.a failed: (Exit 1): builder failed: error executing GoCompilePkg command (from target //pkg/llm:llm_test) bazel-out/k8-opt-exec-ST-d57f47055a04/bin/external/go_sdk/builder_reset/builder compilepkg -sdk external/go_sdk -installsuffix linux_amd64 -src pkg/llm/llm.go -src pkg/llm/llm_manager.go -src ... (remaining 37 arguments skipped)
583: Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
584: ##[error]pkg/llm/llm_manager.go:50:16: undefined: sync
585: compilepkg: error running subcommand external/go_sdk/pkg/tool/linux_amd64/compile: exit status 2
586: Use --verbose_failures to see the command lines of failed build steps.
587: INFO: Elapsed time: 65.047s, Critical Path: 45.59s
588: INFO: 285 processes: 59 internal, 226 linux-sandbox.
589: ERROR: Build did NOT complete successfully
590: ##[error]Process completed with exit code 1.
```
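The `undefined: sync` error indicates the applied sync.Map suggestion referenced the sync package without importing it, so the fix is simply adding `"sync"` to the import block. A minimal self-contained sketch of the sync.Map pattern with the import in place (the prompts and `upperAll` helper here are illustrative, not the repo's code):

```go
package main

import (
	"fmt"
	"strings"
	"sync" // the import the failing build was missing
)

// upperAll fans prompts out to goroutines that write into a sync.Map
// (safe for concurrent writes without extra locking), then converts back
// to a plain map the way the applied suggestion's Range loop does.
func upperAll(prompts []string) map[string]string {
	var result sync.Map
	var wg sync.WaitGroup
	for _, prompt := range prompts {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			result.Store(p, strings.ToUpper(p)) // stand-in for an LLM completion
		}(prompt)
	}
	wg.Wait()
	final := make(map[string]string)
	result.Range(func(key, value interface{}) bool {
		final[key.(string)] = value.(string)
		return true
	})
	return final
}

func main() {
	fmt.Println(upperAll([]string{"id", "whoami"})["id"])
	// prints: ID
}
```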

✨ CI feedback usage guide:

The CI feedback tool (`/checks`) automatically triggers when a PR has a failed check. The tool analyzes the failed checks and provides several feedbacks:

- Failed stage
- Failed test name
- Failure summary
- Relevant error logs

In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

```
/checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
```

where `{repo_name}` is the name of the repository, `{run_number}` is the run number of the failed check, and `{job_number}` is the job number of the failed check.

#### Configuration options

- `enable_auto_checks_feedback` - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
- `excluded_checks_list` - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
- `enable_help_text` - if set to true, the tool will provide a help message with the feedback. Default is true.
- `persistent_comment` - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
- `final_update_message` - if `persistent_comment` is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

See more information about the `checks` tool in the [docs](https://pr-agent-docs.codium.ai/tools/ci_feedback/).