mrheinen / lophiid

A distributed honeypot for monitoring large scale web attacks
GNU General Public License v2.0

AI integration #53

Closed: mrheinen closed this pull request 3 weeks ago

mrheinen commented 3 weeks ago

User description

Add support for LLM integration in both rules and content scripts.


PR Type

Enhancement


Description


Changes walkthrough 📝

Relevant files

Configuration changes (2 files)

api_server.go: Update API server initialization
cmd/api/api_server.go
  • Updated NewGojaJavascriptRunner call to include a nil responder parameter
  +1/-1

config.go: Add LLM responder configuration
pkg/backend/config.go
  • Added Responder configuration struct with LLM-related settings
  +8/-0

Enhancement (7 files)

backend_main.go: Integrate LLM responder in backend
cmd/backend/backend_main.go
  • Added LLM responder initialization and configuration
  • Updated NewGojaJavascriptRunner and NewBackendServer calls to include the LLM responder
  +17/-2

main.go: Add LLM command-line utility
cmd/llm/main.go
  • Added new file for an LLM command-line utility
  • Implements functionality to test the LLM responder with different configurations
  +58/-0

backend.go: Integrate LLM responder in backend server
pkg/backend/backend.go
  • Added LLM responder to the BackendServer struct
  • Updated NewBackendServer to include the LLM responder
  • Implemented the getResponderData method for LLM integration
  +50/-2

extractors.go: Remove StringsFromRequest and related functions
pkg/backend/extractors/extractors.go
  • Removed StringsFromRequest and related functions
  • Removed unused imports
  +0/-148

llm_prompts.go: Add LLM prompt templates
pkg/backend/responder/llm_prompts.go
  • Added new file with LLM prompt templates for command injection and source code execution
  +36/-0

llm_responder.go: Implement LLM responder
pkg/backend/responder/llm_responder.go
  • Implemented the LLMResponder struct and methods for LLM integration
  +45/-0

responder.go: Add Responder interface and constants
pkg/backend/responder/responder.go
  • Added the Responder interface and a FakeResponder for testing
  • Defined constants for LLM replacement tags (see the sketch after this table)
  +19/-0

Error handling (1 file)

server.go: Add responder regex validation
pkg/api/server.go
  • Added validation for the responder regex in HandleUpsertSingleContentRule
  +9/-0

Miscellaneous (5 files)

base64_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/base64_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

nc_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/nc_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-2

tcp_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/tcp_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

unicode_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/unicode_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

url_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/url_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

codiumai-pr-agent-pro[bot] commented 3 weeks ago

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The LLM integration could potentially expose sensitive information if the prompts or responses contain any confidential data. The caching mechanism (pkg/llm/llm_manager.go) stores prompts and responses, which could lead to data leakage if not properly secured. Additionally, the use of external AI models (even if local) introduces a new attack surface that needs to be carefully managed to prevent potential exploits or data breaches.

⚡ Recommended focus areas for review

Potential Security Risk
The LLM integration might expose sensitive information or allow command execution if not properly secured. Careful review of input sanitization and output handling is needed.

Error Handling
The error handling in the Complete function might be insufficient. It returns an empty string on error, which could lead to silent failures (see the sketch after this list).

Input Validation
The Respond function doesn't validate the resType input. This could lead to unexpected behavior if an invalid type is provided.
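On the Complete error-handling point, the general shape of the fix is to propagate the failure rather than return an empty string with a nil error. The sketch below uses stand-in types (llmClient, fakeClient) rather than the real pkg/llm signatures, purely to illustrate the pattern.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// llmClient is a stand-in for whatever client type pkg/llm actually uses.
type llmClient interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// complete propagates failures to the caller instead of returning "" silently,
// so callers can distinguish "empty answer" from "call failed".
func complete(ctx context.Context, c llmClient, prompt string) (string, error) {
	resp, err := c.Complete(ctx, prompt)
	if err != nil {
		return "", fmt.Errorf("completing prompt: %w", err)
	}
	if resp == "" {
		return "", errors.New("LLM returned an empty completion")
	}
	return resp, nil
}

// fakeClient returns a fixed reply/error pair for demonstration.
type fakeClient struct {
	reply string
	err   error
}

func (f fakeClient) Complete(_ context.Context, _ string) (string, error) {
	return f.reply, f.err
}

func main() {
	// The empty completion surfaces as an error instead of a silent "".
	if _, err := complete(context.Background(), fakeClient{}, "hello"); err != nil {
		fmt.Println("error:", err)
	}
}
```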
github-actions[bot] commented 3 weeks ago

Failed to generate code suggestions for PR

codiumai-pr-agent-pro[bot] commented 3 weeks ago

PR Code Suggestions ✨

Explore these optional code suggestions:

Security

Use environment variables for sensitive information

Consider using environment variables or a secure secret management system for the API key instead of hardcoding it in the configuration file. This improves security by keeping sensitive information out of the codebase.

[config/backend-config.yaml [91]](https://github.com/mrheinen/lophiid/pull/53/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R91-R91)

```diff
-api_key: AAAAAAAAAAAAAAAAAA
+api_key: ${LLM_API_KEY}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 9
Why: This suggestion significantly improves security by preventing hardcoded API keys in the configuration file, which is a critical best practice for handling sensitive information.
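As a rough illustration of the config side of this suggestion, one way to honor a ${LLM_API_KEY} placeholder is to expand environment variables after loading the YAML. The helper below is hypothetical and not part of lophiid; it only shows the os.ExpandEnv pattern.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey expands environment variable references such as
// ${LLM_API_KEY} in a configured value; a plain literal passes through
// unchanged. (Illustrative helper, not lophiid's config loader.)
func resolveAPIKey(configured string) (string, error) {
	expanded := os.ExpandEnv(configured)
	if expanded == "" {
		return "", fmt.Errorf("api_key is empty after environment expansion")
	}
	return expanded, nil
}

func main() {
	key, err := resolveAPIKey("${LLM_API_KEY}")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved key length:", len(key))
}
```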
✅ Add a security warning for the AI integration feature

Consider adding a warning about the potential security implications of using AI-generated responses in a honeypot system. This can help users understand the risks and make informed decisions about enabling this feature.

[README.md [124]](https://github.com/mrheinen/lophiid/pull/53/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R124-R124)

```diff
-NOTE: this is a very experimental feature.
+NOTE: This is a very experimental feature. Be aware that using AI-generated responses in a honeypot system may have security implications. Use with caution and ensure you understand the risks before enabling this feature in a production environment.
```

`[Suggestion has been applied]`

Suggestion importance[1-10]: 8
Why: This suggestion improves user awareness of potential security risks associated with the experimental AI feature, which is crucial for responsible use of the system.
    Possible issue
Add error handling for nil llmResponder to prevent potential nil pointer dereferences

Consider adding error handling for the case when s.llmResponder is nil in the getResponderData method. This would prevent potential nil pointer dereferences and make the code more robust.

[pkg/backend/backend.go [611]](https://github.com/mrheinen/lophiid/pull/53/files#diff-c65bcfe9bb457434c3e69ba3f0576d7669935f350d24e2c2c58b05b4f9c510b2R611-R611)

```diff
-if err == nil && reg != nil && s.llmResponder != nil {
+if err == nil && reg != nil {
+	if s.llmResponder == nil {
+		slog.Error("llmResponder is nil")
+		return strings.Replace(string(content.Data), responder.LLMReplacementTag, responder.LLMReplacementFallbackString, 1)
+	}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8
Why: This suggestion addresses a potential runtime error by adding a nil check for llmResponder. It improves the robustness of the code and prevents possible crashes.
Add a nil check for the responder parameter to prevent potential issues

Consider adding a check for a nil responder in the NewGojaJavascriptRunner function. This would prevent potential issues if a nil responder is passed to the function.

[pkg/javascript/goja.go [59]](https://github.com/mrheinen/lophiid/pull/53/files#diff-93ebbcb2bc5cc6ca7ef6d25c1ce739d50fc1194101ed2b4469c1941da4111828R59-R59)

```diff
 func NewGojaJavascriptRunner(dbClient database.DatabaseClient, allowedCommands []string, commandTimeout time.Duration, responder responder.Responder, metrics *GojaMetrics) *GojaJavascriptRunner {
+	if responder == nil {
+		slog.Error("nil responder passed to NewGojaJavascriptRunner")
+		return nil
+	}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion adds a defensive check to prevent potential nil pointer dereferences. It improves the robustness of the code and provides better error handling.
    Best practice
Add a timeout to the context for the API call to prevent potential hangs

Consider adding a timeout to the context passed to the CreateChatCompletion method in the Complete function. This would ensure that the API call doesn't hang indefinitely if there's a network issue or if the OpenAI service is unresponsive.

[pkg/llm/llm.go [126-128]](https://github.com/mrheinen/lophiid/pull/53/files#diff-e005b1818731f04d8ccacb709416325ca76fe22743ff53ef198845c428640cc6R126-R128)

```diff
+ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+defer cancel()
 resp, err := l.client.CreateChatCompletion(
 	ctx,
 	openai.ChatCompletionRequest{
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8
Why: Adding a timeout to the API call context is a best practice that prevents the application from hanging indefinitely. This improves the overall reliability of the system.
    Reliability
Implement a retry mechanism for LLM API calls to improve resilience

Consider adding a retry mechanism with exponential backoff for the l.client.Complete() call. This would improve resilience against temporary network issues or API rate limits.

[pkg/llm/llm_manager.go [54-58]](https://github.com/mrheinen/lophiid/pull/53/files#diff-079422eef64450f8a3d97a65b202d2a89a3850d6dd9282c9a8f930bd3731c26aR54-R58)

```diff
-retStr, err := l.client.Complete(ctx, prompt)
-
+var retStr string
+var err error
+for attempts := 0; attempts < 3; attempts++ {
+	retStr, err = l.client.Complete(ctx, prompt)
+	if err == nil {
+		break
+	}
+	slog.Warn("Error completing prompt, retrying", slog.String("prompt", prompt), slog.String("error", err.Error()), slog.Int("attempt", attempts+1))
+	// Back off exponentially: 1s, 2s, 4s.
+	time.Sleep(time.Duration(1<<attempts) * time.Second)
+}
 if err != nil {
-	slog.Error("Error completing prompt", slog.String("prompt", prompt), slog.String("error", err.Error()))
+	slog.Error("Error completing prompt after retries", slog.String("prompt", prompt), slog.String("error", err.Error()))
 	return "", fmt.Errorf("error completing prompt: %w", err)
 }
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8
Why: Adding a retry mechanism with exponential backoff significantly improves the reliability of LLM API calls. This is particularly valuable for handling temporary network issues or API rate limits, enhancing overall system resilience.
    Maintainability
Refactor repeated BackendServer creation into a helper function to improve test maintainability

Consider using a helper function to create the BackendServer instance in each test case. This would reduce code duplication and make the tests more maintainable. You could create a function like createTestBackendServer(fakeRes *responder.FakeResponder) *BackendServer that encapsulates the common setup logic.

[pkg/backend/backend_test.go [201-202]](https://github.com/mrheinen/lophiid/pull/53/files#diff-841290eccbd676c43597489825cc8f1628359283c634321e496e76e31f6fed9cR201-R202)

```diff
 fakeRes := &responder.FakeResponder{}
-b := NewBackendServer(fdbc, bMetrics, &fakeJrunner, alertManager, &vt.FakeVTManager{}, &whoisManager, &queryRunner, &fakeLimiter, &fIpMgr, fakeRes, GetDefaultBackendConfig())
+b := createTestBackendServer(fakeRes)
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion improves code maintainability by reducing duplication across multiple test cases. It makes the tests more readable and easier to update in the future.
Refactor LLM configuration into a dedicated struct for improved maintainability

Consider using a configuration struct to encapsulate the LLM-related settings instead of passing them individually to NewLLMManager. This would make the code more maintainable and easier to extend in the future.

[cmd/backend/backend_main.go [187-191]](https://github.com/mrheinen/lophiid/pull/53/files#diff-aea9d0ad7326cd6083f662c484917c42ab9ff66183cbe981873c671cedc76b7cR187-R191)

```diff
-llmClient := llm.NewOpenAILLMClient(cfg.Responder.ApiKey, cfg.Responder.ApiLocation, "")
-pCache := util.NewStringMapCache[string]("LLM prompt cache", cfg.Responder.CacheExpirationTime)
-llmMetrics := llm.CreateLLMMetrics(metricsRegistry)
+llmConfig := llm.Config{
+	ApiKey:              cfg.Responder.ApiKey,
+	ApiLocation:         cfg.Responder.ApiLocation,
+	CacheExpirationTime: cfg.Responder.CacheExpirationTime,
+	CompletionTimeout:   cfg.Responder.LLMCompletionTimeout,
+}
+llmManager := llm.NewLLMManager(llmConfig, metricsRegistry)
-llmManager := llm.NewLLMManager(llmClient, pCache, llmMetrics, cfg.Responder.LLMCompletionTimeout)
-
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion improves code organization and maintainability by encapsulating LLM-related settings into a single configuration struct. It makes the code more modular and easier to extend in the future.
    Input validation
Add input validation for the responder type to improve function robustness

Consider adding input validation for the resType parameter in the Respond method. This would prevent potential issues with unsupported responder types and improve the overall robustness of the function.

[pkg/backend/responder/llm_responder.go [19-23]](https://github.com/mrheinen/lophiid/pull/53/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR19-R23)

```diff
 func (l *LLMResponder) Respond(resType string, promptInput string, template string) (string, error) {
+	supportedTypes := map[string]bool{
+		constants.ResponderTypeCommandInjection:    true,
+		constants.ResponderTypeSourceCodeExecution: true,
+	}
+	if !supportedTypes[resType] {
+		return "", fmt.Errorf("unsupported responder type: %s", resType)
+	}
+
 	// First make sure there actually is a replacement tag. If the tag is missing
 	// then we will append one to the end of the template.
 	if !strings.Contains(template, LLMReplacementTag) {
 		template = fmt.Sprintf("%s\n%s", template, LLMReplacementTag)
 	}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion adds important input validation for the responder type, preventing potential issues with unsupported types. It improves the function's robustness and helps catch errors early, enhancing overall code quality.
    Enhancement
Add a configuration option to limit the size of LLM responses

Consider adding a max_tokens or max_response_length configuration option to limit the size of the LLM responses. This can help prevent excessive resource usage and ensure consistent response times (a sketch of how such a limit could be passed to the OpenAI client follows below).

[config/backend-config.yaml [94-95]](https://github.com/mrheinen/lophiid/pull/53/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R94-R95)

```diff
 # How long a completion is allowed to take.
 llm_completion_timeout: 60s
+# Maximum number of tokens in the LLM response
+max_response_tokens: 1000
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion enhances resource management and performance by allowing control over LLM response sizes, which is important for maintaining system stability and consistency.
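If such an option were added, it would presumably be threaded into the go-openai request. The sketch below shows that wiring with the sashabaranov/go-openai client; the maxResponseTokens constant stands in for a hypothetical config value and is not an existing lophiid setting.

```go
package main

import (
	"context"
	"fmt"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

// maxResponseTokens stands in for a hypothetical max_response_tokens config value.
const maxResponseTokens = 1000

func main() {
	client := openai.NewClient("YOUR_API_KEY")

	// Bound the call as well, mirroring llm_completion_timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model:     openai.GPT3Dot5Turbo,
		MaxTokens: maxResponseTokens, // caps the completion length server-side
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "example prompt"},
		},
	})
	if err != nil {
		fmt.Println("completion failed:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```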
Add a logging library dependency for improved error handling and debugging

Consider adding a dependency on a logging library in the go_library rule. This will allow for better error handling and debugging capabilities in the LLM integration code.

[pkg/llm/BUILD.bazel [12-17]](https://github.com/mrheinen/lophiid/pull/53/files#diff-756699b74c76793b70cc9c9b4c8735377aefafc2a1a433b4fed40c4f0caa2dc1R12-R17)

```diff
 deps = [
     "//pkg/metrics",
     "//pkg/util",
     "@com_github_prometheus_client_golang//prometheus",
     "@com_github_sashabaranov_go_openai//:go-openai",
+    "@org_uber_go_zap//:zap",
 ],
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 6
Why: This suggestion enhances the development and maintenance process by improving error handling and debugging capabilities, which is valuable for complex features like LLM integration.
    Error handling
Add a check for empty responses from the LLM to improve error handling

Consider handling the case where llmResponder.Respond() returns an empty result. An empty response is currently not detected, which could lead to unexpected behavior or silent failures.

[cmd/llm/main.go [51-55]](https://github.com/mrheinen/lophiid/pull/53/files#diff-733501c64048a9c28435343cfa138c8892120de440b9febe8d0d24d320a5690bR51-R55)

```diff
 res, err := llmResponder.Respond(*responderType, *query, responder.LLMReplacementTag)
 if err != nil {
 	fmt.Printf("Error: %v\n", err)
 	return
 }
+if res == "" {
+	fmt.Println("Warning: Empty response received from LLM")
+}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 6
Why: This suggestion enhances error handling by checking for empty responses from the LLM. It adds an extra layer of robustness to the code, helping to identify potential issues with LLM responses.

💡 Need additional feedback? Start a PR chat