mrheinen / lophiid

A distributed honeypot for monitoring large scale web attacks
GNU General Public License v2.0

Add initial version of LLM integration #52

Closed · mrheinen closed this 3 weeks ago

mrheinen commented 3 weeks ago

PR Type

Enhancement, Documentation


Description

This PR introduces AI/LLM integration to Lophiid, enhancing its ability to generate more realistic and dynamic responses to attacks. The key changes are detailed in the walkthrough below.

This enhancement aims to improve Lophiid's ability to interact with attackers by providing more convincing and context-aware responses, particularly for command injection and source code execution scenarios.


Changes walkthrough 📝

Relevant files

Enhancement (23 files)
api_server.go: Update JavaScript runner initialization
cmd/api/api_server.go
  • Added a new parameter nil to the NewGojaJavascriptRunner function call
  • +1/-1

backend_main.go: Integrate LLM responder in backend
cmd/backend/backend_main.go
  • Added LLM responder configuration and initialization
  • Updated backend server initialization to include LLM responder
  • +17/-2

main.go: Add LLM testing utility
cmd/llm/main.go
  • Added a new command-line tool for testing LLM responders
  • Implements flag parsing and LLM responder initialization
  • +58/-0
server.go: Add responder regex validation
pkg/api/server.go
  • Added validation for responder regex in content rule updates (see the sketch below)
  • +9/-0
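A minimal sketch of what such a validation might look like, using Go's standard regexp package; the helper name and surrounding shape are illustrative assumptions, not the PR's exact code:

```go
package api

import (
	"fmt"
	"regexp"
)

// validateResponderRegex is a hypothetical helper: reject content rule
// updates whose responder regex does not compile, so a bad pattern fails
// at save time rather than later at probe-handling time.
func validateResponderRegex(responderRegex string) error {
	// An empty regex means the responder feature is not used for this rule.
	if responderRegex == "" {
		return nil
	}
	if _, err := regexp.Compile(responderRegex); err != nil {
		return fmt.Errorf("invalid responder regex %q: %w", responderRegex, err)
	}
	return nil
}
```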
backend.go: Integrate LLM responder in backend server
pkg/backend/backend.go
  • Added LLM responder to BackendServer struct
  • Implemented getResponderData method for LLM integration (see the sketch below)
  • Updated HandleProbe method to use LLM responder
  • +50/-2
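The walkthrough implies a flow roughly like the following. This is a simplified sketch with stand-in types; the Responder signature is inferred from the review comments further down, and the nil guard reflects the reviewer's nil-dereference concern:

```go
package backend

import "regexp"

// Responder stands in for the interface added in pkg/backend/responder;
// the exact signature is an assumption based on this thread.
type Responder interface {
	Respond(resType string, promptInput string, template string) (string, error)
}

// getResponderData sketches the described flow: extract the attacker input
// with the rule's responder regex and let the LLM responder fill in the
// template. Every name except getResponderData is a guess.
func getResponderData(r Responder, resType, responderRegex, request, template string) string {
	// Guard against a nil responder so rules still work with LLM support disabled.
	if r == nil || resType == "" || responderRegex == "" {
		return template
	}
	re, err := regexp.Compile(responderRegex)
	if err != nil {
		return template
	}
	match := re.FindStringSubmatch(request)
	if len(match) < 2 {
		return template // the regex captured no attacker input
	}
	res, err := r.Respond(resType, match[1], template)
	if err != nil {
		return template
	}
	return res
}
```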
base64_extractor.go: Update import for decoding utilities
pkg/backend/extractors/base64_extractor.go
  • Updated import path for decoding utilities
  • +2/-1

extractors.go: Move decoding functions to separate package
pkg/backend/extractors/extractors.go
  • Removed decoding functions, moving them to a separate package
  • +0/-148

nc_extractor.go: Update import for decoding utilities
pkg/backend/extractors/nc_extractor.go
  • Updated import path for decoding utilities
  • +2/-2

tcp_extractor.go: Update import for decoding utilities
pkg/backend/extractors/tcp_extractor.go
  • Updated import path for decoding utilities
  • +2/-1

unicode_extractor.go: Update import for decoding utilities
pkg/backend/extractors/unicode_extractor.go
  • Updated import path for decoding utilities
  • +2/-1

url_extractor.go: Update import for decoding utilities
pkg/backend/extractors/url_extractor.go
  • Updated import path for decoding utilities
  • +2/-1

llm_prompts.go: Add LLM prompts for responders
pkg/backend/responder/llm_prompts.go
  • Added predefined prompts for command injection and source code execution
  • +36/-0
llm_responder.go: Implement LLM responder
pkg/backend/responder/llm_responder.go
  • Implemented LLMResponder struct and methods for LLM integration (see the sketch below)
  • +45/-0
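A sketch of how LLMResponder plausibly works, inferred from the prompts file above and the replacement-tag code quoted in the suggestions below. The LLMCompleter interface, the constant values, and the prompt text are all stand-ins, not the PR's actual code:

```go
package responder

import (
	"fmt"
	"strings"
)

// LLMCompleter stands in for the LLM manager dependency.
type LLMCompleter interface {
	Complete(prompt string) (string, error)
}

const (
	commandInjectionPrompt       = "Respond as a Linux shell would to: %s" // illustrative only
	LLMReplacementTag            = "%%%LOPHIID_PAYLOAD_RESPONSE%%%"        // guessed value
	LLMReplacementFallbackString = ""                                      // guessed value
)

type LLMResponder struct {
	llm LLMCompleter
}

// Respond builds a prompt for the given responder type, asks the LLM for a
// completion, and splices the result into the template at the replacement tag.
func (l *LLMResponder) Respond(resType string, promptInput string, template string) (string, error) {
	var basePrompt string
	switch resType {
	case "COMMAND_INJECTION": // the real code uses a shared constant
		basePrompt = commandInjectionPrompt
	default:
		return template, fmt.Errorf("invalid responder type: %s", resType)
	}
	completion, err := l.llm.Complete(fmt.Sprintf(basePrompt, promptInput))
	if err != nil {
		// Fall back to a static replacement so the probe still gets a reply.
		return strings.Replace(template, LLMReplacementTag, LLMReplacementFallbackString, 1), err
	}
	return strings.Replace(template, LLMReplacementTag, completion, 1), nil
}
```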
responder.go: Define responder interface and constants
pkg/backend/responder/responder.go
  • Defined Responder interface and constants for LLM integration (see the sketch below)
  • Added FakeResponder for testing purposes
  • +19/-0
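A plausible shape for the interface and its test double; the method signature is an assumption based on the review comments in this thread:

```go
package responder

// Responder is the interface the backend and the JavaScript runner program
// against.
type Responder interface {
	Respond(resType string, promptInput string, template string) (string, error)
}

// FakeResponder returns canned values, keeping unit tests deterministic and
// free of network calls. Field names are illustrative.
type FakeResponder struct {
	StringToReturn string
	ErrorToReturn  error
}

func (f *FakeResponder) Respond(resType string, promptInput string, template string) (string, error) {
	return f.StringToReturn, f.ErrorToReturn
}
```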
database.go: Update ContentRule struct for LLM responder
pkg/database/database.go
  • Added new fields to ContentRule struct for LLM responder configuration
  • +4/-1

encoding.go: Add URI and HTML decoding methods
pkg/javascript/encoding.go
  • Added URI and HTML decoding methods to the Encoding struct (see the sketch below)
  • +19/-0
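The two decode helpers map naturally onto Go's standard library. A sketch, with method names guessed from the uri.decode and html.decode script bindings documented in SCRIPTING.md:

```go
package javascript

import (
	"html"
	"net/url"
)

// Encoding groups the decode helpers exposed to scripts.
type Encoding struct{}

// DecodeURI reverses percent-encoding ("%2F" -> "/"); on malformed input it
// returns the original string rather than an error, keeping scripts simple.
func (e *Encoding) DecodeURI(s string) string {
	out, err := url.QueryUnescape(s)
	if err != nil {
		return s
	}
	return out
}

// DecodeHTML reverses HTML entity encoding ("&lt;" -> "<").
func (e *Encoding) DecodeHTML(s string) string {
	return html.UnescapeString(s)
}
```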
goja.go: Integrate LLM responder in JavaScript runner
pkg/javascript/goja.go
  • Added Responder to Util struct
  • Updated GojaJavascriptRunner to include LLM responder
  • +15/-8

wrappers.go: Add responder wrapper for JavaScript
pkg/javascript/wrappers.go
  • Added ResponderWrapper struct and Respond method for LLM integration (see the sketch below)
  • +20/-0
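A guessed sketch of the wrapper, assuming the runner wants a plain string back in JavaScript and errors logged on the Go side rather than raised into scripts:

```go
package javascript

import "log/slog"

// Responder stands in for the pkg/backend/responder interface.
type Responder interface {
	Respond(resType string, promptInput string, template string) (string, error)
}

// ResponderWrapper adapts the responder for the Goja runtime: scripts always
// receive a string, and failures fall back to the unmodified template.
type ResponderWrapper struct {
	responder Responder
}

func (w *ResponderWrapper) Respond(resType string, promptInput string, template string) string {
	res, err := w.responder.Respond(resType, promptInput, template)
	if err != nil {
		slog.Error("responder failed", slog.String("error", err.Error()))
		return template
	}
	return res
}
```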
shared_constants.go: Add responder and decoder type constants
pkg/util/constants/shared_constants.go
  • Added constants for responder types and decoder types (see the sketch below)
  • +14/-1
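The constant names below appear in code quoted elsewhere in this thread; the string values are illustrative guesses, not the PR's actual values:

```go
package constants

const (
	// Responder types selectable per content rule.
	ResponderTypeNone                = "NONE"
	ResponderTypeCommandInjection    = "COMMAND_INJECTION"
	ResponderTypeSourceCodeExecution = "SOURCE_CODE_EXECUTION"

	// Decoders applied to the regex-extracted attacker input.
	ResponderDecoderTypeNone = "NONE"
	ResponderDecoderTypeUri  = "URI"
	ResponderDecoderTypeHtml = "HTML"
)
```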
decoding.go: Move and enhance decoding utilities
pkg/util/decoding/decoding.go
  • Moved decoding functions from extractors package
  • Implemented StringsFromRequest, DecodeURL, and other decoding utilities (see the sketch below)
  • +166/-0
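DecodeURLOrEmptyString is called from the backend code quoted in the suggestions below; this reconstruction of its behavior is an assumption. In particular, the boolean is guessed to select between query semantics (where '+' means space) and path semantics:

```go
package decoding

import "net/url"

// DecodeURLOrEmptyString decodes a URL-encoded string and, as the name
// implies, signals failure with an empty string instead of an error.
func DecodeURLOrEmptyString(s string, plusToSpace bool) string {
	var out string
	var err error
	if plusToSpace {
		out, err = url.QueryUnescape(s) // '+' decodes to a space
	} else {
		out, err = url.PathUnescape(s) // '+' is left as-is
	}
	if err != nil {
		return ""
	}
	return out
}
```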
general.go: Add random string generation utility
pkg/util/general.go
  • Added GenerateRandomString function for creating random strings (see the reconstruction below)
  • +11/-0
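Reconstructed from the unchanged lines of the crypto/rand suggestion later in this thread, the helper is roughly:

```go
package util

import "math/rand"

const printableChars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

// GenerateRandomString returns a random string of the given length drawn
// from printableChars. math/rand is fine for filler content; see the
// crypto/rand suggestion below if the output must be unpredictable.
func GenerateRandomString(length int) string {
	result := make([]byte, length)
	for i := 0; i < length; i++ {
		result[i] = printableChars[rand.Intn(len(printableChars))]
	}
	return string(result)
}
```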
RuleForm.vue: Update rule form with responder fields
ui/src/components/container/RuleForm.vue
  • Added new form fields for responder, responder regex, and responder decoder
  • +38/-0

database.sql: Update database schema for responder support
config/database.sql
  • Added new ENUM types for responder and decoder types
  • Updated content_rule table with new responder-related columns
  • +6/-0

Configuration changes (11 files)
config.go: Add LLM responder configuration
pkg/backend/config.go
  • Added Responder configuration struct with LLM-related settings
  • +8/-0

Config.js: Add responder configuration options
ui/src/Config.js
  • Added ruleResponderTypes and ruleResponderDecoders arrays
  • +2/-0

BUILD.bazel: Update Bazel build for backend
cmd/backend/BUILD.bazel
  • Added new dependencies for responder and LLM packages
  • +2/-0

BUILD.bazel: Add Bazel build for LLM tool
cmd/llm/BUILD.bazel
  • Added new Bazel build file for LLM command-line tool
  • +20/-0

backend-config.yaml: Add LLM responder configuration
config/backend-config.yaml
  • Added responder configuration section with LLM-related settings
  • +12/-0

BUILD.bazel: Update Bazel build for backend package
pkg/backend/BUILD.bazel
  • Added new dependencies for responder and decoding packages
  • +3/-0

BUILD.bazel: Update Bazel build for extractors
pkg/backend/extractors/BUILD.bazel
  • Updated dependencies and test files
  • +1/-1

BUILD.bazel: Add Bazel build for responder package
pkg/backend/responder/BUILD.bazel
  • Added new Bazel build file for responder package
  • +17/-0

BUILD.bazel: Update Bazel build for JavaScript package
pkg/javascript/BUILD.bazel
  • Added new dependencies for responder and decoding packages
  • +3/-0

BUILD.bazel: Add Bazel build for LLM package
pkg/llm/BUILD.bazel
  • Added new Bazel build file for LLM package
  • +28/-0

BUILD.bazel: Add Bazel build for decoding package
pkg/util/decoding/BUILD.bazel
  • Added new Bazel build file for decoding package
  • +15/-0

Miscellaneous (1 file)

metrics.go: Remove comment
pkg/backend/metrics.go
  • Removed unnecessary comment
  • +0/-1
Tests (2 files)

goja_test.go: Update tests for LLM responder integration
pkg/javascript/goja_test.go
  • Updated tests to include LLM responder in JavaScript runner initialization
  • +13/-6

decoding_test.go: Update decoding tests
pkg/util/decoding/decoding_test.go
  • Updated package name and imports for decoding tests
  • +3/-4

Documentation (4 files)

AI.md: Add AI integration documentation
AI.md
  • Added new documentation file explaining AI integration in Lophiid
  • Covers introduction, enabling AI, the responders concept, and implementation details
  • +78/-0

README.md: Update README with AI integration information
README.md
  • Added section on AI / LLM integration
  • Briefly explains the new feature and links to AI.md for more details
  • +16/-0

SCRIPTING.md: Update scripting documentation with new features
SCRIPTING.md
  • Added documentation for new encoding methods: uri.decode and html.decode
  • Added documentation for LLM responder access in scripts
  • +45/-0

SEARCH_KEYWORDS.md: Add new search keywords for responder fields
SEARCH_KEYWORDS.md
  • Added new search keywords for responder, responder_decoder, and responder_regex
  • +3/-0

Dependencies (3 files)

deps.bzl: Add go-openai dependency
deps.bzl
  • Added go-openai dependency
  • +7/-0

go.mod: Add go-openai dependency
go.mod
  • Added go-openai dependency
  • +1/-0

go.sum: Update dependency checksums
go.sum
  • Added go-openai dependency checksums
  • +2/-0

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

github-actions[bot] commented 3 weeks ago

Failed to generate code suggestions for PR

codiumai-pr-agent-pro[bot] commented 3 weeks ago

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪

🧪 No relevant tests

🔒 Security concerns

Sensitive information exposure:
The backend-config.yaml file contains an API key (AAAAAAAAAAAAAAAAAA) which, although likely a placeholder, demonstrates that API keys are stored in plaintext in configuration files. This practice could lead to accidental exposure of sensitive credentials if not properly managed.

⚡ Recommended focus areas for review

Possible Bug
The getResponderData method doesn't handle the case where s.llmResponder is nil, which could lead to a nil pointer dereference.

Error Handling
The Complete method doesn't handle context cancellation errors explicitly, which could lead to unclear error messages.

Security Concern
The Respond method doesn't sanitize or validate the promptInput, which could potentially lead to injection attacks if not handled properly elsewhere. A possible mitigation is sketched below.
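For the promptInput concern, one possible mitigation (not part of the PR) is to cap and clean attacker-controlled text before it is interpolated into a prompt; the helper name and limits here are hypothetical:

```go
package responder

import (
	"strings"
	"unicode"
)

// sanitizePromptInput truncates overly long input and drops control
// characters (except newline and tab) so attacker data cannot smuggle
// odd bytes into the LLM prompt.
func sanitizePromptInput(s string, maxLen int) string {
	if len(s) > maxLen {
		s = s[:maxLen]
	}
	return strings.Map(func(r rune) rune {
		if unicode.IsControl(r) && r != '\n' && r != '\t' {
			return -1 // a negative return drops the character
		}
		return r
	}, s)
}
```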
codiumai-pr-agent-pro[bot] commented 3 weeks ago

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Add error handling for the case of no choices in the API response

Consider adding error handling for the case where the API response doesn't contain any choices. This will prevent a potential panic if the API returns an unexpected response format.

[pkg/llm/llm.go [139-143]](https://github.com/mrheinen/lophiid/pull/52/files#diff-e005b1818731f04d8ccacb709416325ca76fe22743ff53ef198845c428640cc6R139-R143)

```diff
 if err != nil {
 	return "", fmt.Errorf("ChatCompletion error: %v", err)
 }
+if len(resp.Choices) == 0 {
+	return "", fmt.Errorf("no choices returned from API")
+}
+
 return resp.Choices[0].Message.Content, nil
```

Suggestion importance[1-10]: 9
Why: This suggestion addresses a potential panic scenario, significantly improving the robustness and reliability of the API interaction.
Security

Use cryptographically secure random number generation for improved security

Consider using a cryptographically secure random number generator instead of math/rand for generating random strings. This will improve the security of the generated strings, which may be important depending on their usage.

[pkg/util/general.go [60-68]](https://github.com/mrheinen/lophiid/pull/52/files#diff-0664544a48f58f335cd90e8a9bd6e8e5a60d8291eef87c17903fc32de04025eeR60-R68)

```diff
+import "crypto/rand"
+
 const printableChars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

 func GenerateRandomString(length int) string {
 	result := make([]byte, length)
 	for i := 0; i < length; i++ {
-		result[i] = printableChars[rand.Intn(len(printableChars))]
+		n, err := rand.Int(rand.Reader, big.NewInt(int64(len(printableChars))))
+		if err != nil {
+			panic(err)
+		}
+		result[i] = printableChars[n.Int64()]
 	}
 	return string(result)
 }
```

Suggestion importance[1-10]: 9
Why: Replacing math/rand with crypto/rand for generating random strings is a crucial security improvement. This change significantly enhances the unpredictability of generated strings, which is essential for many security-related applications.
Use environment variables for sensitive information

Consider using environment variables or a secure secret management system for the API key instead of hardcoding it in the configuration file. This improves security by keeping sensitive information out of the codebase.

[config/backend-config.yaml [91]](https://github.com/mrheinen/lophiid/pull/52/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R91-R91)

```diff
-api_key: AAAAAAAAAAAAAAAAAA
+api_key: ${LLM_API_KEY}
```

Suggestion importance[1-10]: 9
Why: This suggestion significantly improves security by preventing hardcoded API keys in the configuration file. It's a crucial best practice for handling sensitive information.
Enhancement

Add context parameter to the Complete method for better request cancellation and timeout handling

Consider adding a context parameter to the Complete method to allow for request cancellation and timeout handling at the caller's discretion. This will provide more flexibility in managing long-running LLM requests.

[pkg/llm/llm_manager.go [44-51]](https://github.com/mrheinen/lophiid/pull/52/files#diff-079422eef64450f8a3d97a65b202d2a89a3850d6dd9282c9a8f930bd3731c26aR44-R51)

```diff
-func (l *LLMManager) Complete(prompt string) (string, error) {
+func (l *LLMManager) Complete(ctx context.Context, prompt string) (string, error) {
 	entry, err := l.pCache.Get(prompt)
 	if err == nil {
 		return *entry, nil
 	}

-	ctx, cancel := context.WithTimeout(context.Background(), l.completionTimeout)
+	ctx, cancel := context.WithTimeout(ctx, l.completionTimeout)
 	defer cancel()
```

Suggestion importance[1-10]: 8
Why: This suggestion significantly improves the flexibility and control over LLM requests, allowing for better resource management and responsiveness. It's a valuable enhancement for handling long-running operations.
Add a fallback return in the default case of the switch statement for unknown decoder types

Consider returning a fallback response in the default case of the ResponderDecoder switch. This ensures that an unknown decoder type still yields a usable response instead of silently falling through.

[pkg/backend/backend.go [620-632]](https://github.com/mrheinen/lophiid/pull/52/files#diff-c65bcfe9bb457434c3e69ba3f0576d7669935f350d24e2c2c58b05b4f9c510b2R620-R632)

```diff
 switch rule.ResponderDecoder {
 case constants.ResponderDecoderTypeNone:
 	final_match = match[1]
 case constants.ResponderDecoderTypeUri:
 	final_match = decoding.DecodeURLOrEmptyString(match[1], true)
 	if final_match == "" {
 		slog.Error("could not decode URI", slog.String("match", match[1]))
 	}
 case constants.ResponderDecoderTypeHtml:
 	final_match = decoding.DecodeHTML(match[1])
 default:
 	slog.Error("unknown responder decoder", slog.String("decoder", rule.ResponderDecoder))
+	return strings.Replace(string(content.Data), responder.LLMReplacementTag, responder.LLMReplacementFallbackString, 1)
 }
```

Suggestion importance[1-10]: 7
Why: This suggestion improves error handling by providing a fallback response for unknown decoder types, enhancing the robustness of the code.
Add a configuration option for maximum token limit

Consider adding a configuration option for the maximum token limit or response length for the LLM. This can help control resource usage and prevent excessively long responses.

[config/backend-config.yaml [94-95]](https://github.com/mrheinen/lophiid/pull/52/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R94-R95)

```diff
 # How long a completion is allowed to take.
 llm_completion_timeout: 60s
+# Maximum number of tokens in the LLM response.
+llm_max_tokens: 1000
```

Suggestion importance[1-10]: 7
Why: This suggestion enhances resource management and control over LLM responses. It's a valuable addition for optimizing performance and preventing potential issues with excessively long responses.
Improve command-line argument parsing and help message generation

Consider using a more robust flag parsing library like pflag or cobra for better command-line argument handling and help message generation. This will make it easier to add new flags in the future and provide a better user experience.

[cmd/llm/main.go [39-43]](https://github.com/mrheinen/lophiid/pull/52/files#diff-733501c64048a9c28435343cfa138c8892120de440b9febe8d0d24d320a5690bR39-R43)

```diff
-flag.Parse()
+pflag.Parse()

 if *apiKey == "" || *query == "" {
-	fmt.Printf("Usage: %s -api-key -p [-t ]\n", os.Args[0])
-	return
+	pflag.Usage()
+	os.Exit(1)
 }
```

Suggestion importance[1-10]: 6
Why: The suggestion to use a more robust flag parsing library like pflag or cobra is valid and can improve the command-line interface. However, the current implementation is functional for a simple tool, so the impact is moderate.
Add a configuration option for the LLM model name

Add a configuration option for the LLM model name or version. This allows for easier switching between different models or versions without changing the code.

[config/backend-config.yaml [88-90]](https://github.com/mrheinen/lophiid/pull/52/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R88-R90)

```diff
 # API location. Note that the implementation was tested with Gemma 2 27b and
 # is not guaranteed to work with other versions or other LLMs.
 api_location: http://localhost:8000/v1
+# LLM model name or version
+llm_model: "gemma-2-27b"
```

Suggestion importance[1-10]: 6
Why: This suggestion improves flexibility by allowing easy switching between different LLM models or versions. It's a useful enhancement for configuration management and future-proofing the system.
Best practice

Add a timeout to the context for the API call to prevent potential hanging

Consider adding a timeout to the context passed to the CreateChatCompletion method. This will help prevent the API call from hanging indefinitely if there's a network issue or if the OpenAI service is slow to respond.

[pkg/llm/llm.go [126-137]](https://github.com/mrheinen/lophiid/pull/52/files#diff-e005b1818731f04d8ccacb709416325ca76fe22743ff53ef198845c428640cc6R126-R137)

```diff
+ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+defer cancel()
 resp, err := l.client.CreateChatCompletion(
 	ctx,
 	openai.ChatCompletionRequest{
 		Model: l.model,
 		Messages: []openai.ChatCompletionMessage{
 			{
 				Role:    openai.ChatMessageRoleUser,
 				Content: fmt.Sprintf(l.promptTemplate, prompt),
 			},
 		},
 	},
 )
```

Suggestion importance[1-10]: 8
Why: Adding a timeout is a crucial improvement for API calls, preventing potential indefinite waits and improving the overall reliability of the system.
Add a warning about risks and ethical considerations

Add a warning about the potential risks and ethical considerations of using AI-generated responses in a honeypot system. This helps users understand the implications and use the feature responsibly.

[README.md [124]](https://github.com/mrheinen/lophiid/pull/52/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R124-R124)

```diff
-NOTE: this is a very experimental feature.
+NOTE: This is a very experimental feature. Be aware of potential risks and ethical considerations when using AI-generated responses in a honeypot system. Use responsibly and in compliance with applicable laws and regulations.
```

Suggestion importance[1-10]: 8
Why: This suggestion is important as it raises awareness about potential risks and ethical implications of using AI-generated responses in a honeypot system, which is crucial for responsible use of the feature.
Error handling

Improve error handling for invalid responder types

Consider adding error handling for invalid resType values earlier in the function. This will prevent unnecessary processing and provide clearer error messages.

[pkg/backend/responder/llm_responder.go [26-35]](https://github.com/mrheinen/lophiid/pull/52/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR26-R35)

```diff
 var basePrompt string
 switch resType {
 case constants.ResponderTypeCommandInjection:
 	basePrompt = commandInjectionPrompt
 case constants.ResponderTypeSourceCodeExecution:
 	basePrompt = sourceCodeExecutionPrompt
-default:
 	return "", fmt.Errorf("invalid responder type: %s", resType)
 }

+if basePrompt == "" {
+	return "", fmt.Errorf("no prompt defined for responder type: %s", resType)
+}
```

Suggestion importance[1-10]: 7
Why: The suggestion enhances error handling by checking for empty prompts, which can prevent potential issues later in the execution. This improvement adds robustness to the code and can help with debugging.

💡 Need additional feedback? Start a PR chat