mrheinen / lophiid

A distributed honeypot for monitoring large scale web attacks
GNU General Public License v2.0

AI integration #53

Closed: mrheinen closed this pull request 3 weeks ago

mrheinen commented 3 weeks ago

User description

Add support for LLM integration in both rules and content scripts.


PR Type

Enhancement


Description


Changes walkthrough 📝

Relevant files

Configuration changes (2 files)

api_server.go: Update API server initialization
cmd/api/api_server.go
  • Updated NewGojaJavascriptRunner call to include a nil responder parameter
  +1/-1

config.go: Add LLM responder configuration
pkg/backend/config.go
  • Added Responder configuration struct with LLM-related settings
  +8/-0

Enhancement (7 files)

backend_main.go: Integrate LLM responder in backend
cmd/backend/backend_main.go
  • Added LLM responder initialization and configuration
  • Updated NewGojaJavascriptRunner and NewBackendServer calls to include the LLM responder
  +17/-2

main.go: Add LLM command-line utility
cmd/llm/main.go
  • Added new file for an LLM command-line utility
  • Implements functionality to test the LLM responder with different configurations
  +58/-0

backend.go: Integrate LLM responder in backend server
pkg/backend/backend.go
  • Added LLM responder to the BackendServer struct
  • Updated NewBackendServer to include the LLM responder
  • Implemented the getResponderData method for LLM integration
  +50/-2

extractors.go: Remove StringsFromRequest and related functions
pkg/backend/extractors/extractors.go
  • Removed StringsFromRequest and related functions
  • Removed unused imports
  +0/-148

llm_prompts.go: Add LLM prompt templates
pkg/backend/responder/llm_prompts.go
  • Added new file with LLM prompt templates for command injection and source code execution
  +36/-0

llm_responder.go: Implement LLM responder
pkg/backend/responder/llm_responder.go
  • Implemented the LLMResponder struct and methods for LLM integration
  +45/-0

responder.go: Add Responder interface and constants
pkg/backend/responder/responder.go
  • Added the Responder interface and a FakeResponder for testing
  • Defined constants for LLM replacement tags (see the sketch after this table)
  +19/-0

Error handling (1 file)

server.go: Add responder regex validation
pkg/api/server.go
  • Added validation for the responder regex in HandleUpsertSingleContentRule
  +9/-0

Miscellaneous (5 files)

base64_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/base64_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

nc_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/nc_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-2

tcp_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/tcp_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

unicode_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/unicode_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

url_extractor.go: Update import for StringsFromRequest
pkg/backend/extractors/url_extractor.go
  • Updated the import path for the StringsFromRequest function
  +2/-1

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

codiumai-pr-agent-pro[bot] commented 3 weeks ago

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The LLM integration could potentially expose sensitive information if the prompts or responses contain any confidential data. The caching mechanism (pkg/llm/llm_manager.go) stores prompts and responses, which could lead to data leakage if not properly secured. Additionally, the use of external AI models (even if local) introduces a new attack surface that needs to be carefully managed to prevent potential exploits or data breaches.

⚡ Recommended focus areas for review

Potential Security Risk
The LLM integration might expose sensitive information or allow command execution if not properly secured. Careful review of input sanitization and output handling is needed.

Error Handling
The error handling in the Complete function might be insufficient. It returns an empty string on error, which could lead to silent failures (see the sketch after this list).

Input Validation
The Respond function doesn't validate the resType input. This could lead to unexpected behavior if an invalid type is provided.
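On the Complete error-handling point, the general shape of the fix is to propagate the failure rather than return an empty string with a nil error. The sketch below uses stand-in types (llmClient, fakeClient) rather than the real pkg/llm signatures, purely to illustrate the pattern.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// llmClient is a stand-in for whatever client type pkg/llm actually uses.
type llmClient interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// complete propagates failures to the caller instead of returning "" silently,
// so callers can distinguish "empty answer" from "call failed".
func complete(ctx context.Context, c llmClient, prompt string) (string, error) {
	resp, err := c.Complete(ctx, prompt)
	if err != nil {
		return "", fmt.Errorf("completing prompt: %w", err)
	}
	if resp == "" {
		return "", errors.New("LLM returned an empty completion")
	}
	return resp, nil
}

// fakeClient returns a fixed reply/error pair for demonstration.
type fakeClient struct {
	reply string
	err   error
}

func (f fakeClient) Complete(_ context.Context, _ string) (string, error) {
	return f.reply, f.err
}

func main() {
	// The empty completion surfaces as an error instead of a silent "".
	if _, err := complete(context.Background(), fakeClient{}, "hello"); err != nil {
		fmt.Println("error:", err)
	}
}
```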
github-actions[bot] commented 3 weeks ago

Failed to generate code suggestions for PR

codiumai-pr-agent-pro[bot] commented 3 weeks ago

PR Code Suggestions ✨

Explore these optional code suggestions:

Security

Use environment variables for sensitive information

Consider using environment variables or a secure secret management system for the API key instead of hardcoding it in the configuration file. This improves security by keeping sensitive information out of the codebase.

[config/backend-config.yaml [91]](https://github.com/mrheinen/lophiid/pull/53/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R91-R91)

```diff
-api_key: AAAAAAAAAAAAAAAAAA
+api_key: ${LLM_API_KEY}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 9
Why: This suggestion significantly improves security by preventing hardcoded API keys in the configuration file, which is a critical best practice for handling sensitive information.
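As a rough illustration of the config side of this suggestion, one way to honor a ${LLM_API_KEY} placeholder is to expand environment variables after loading the YAML. The helper below is hypothetical and not part of lophiid; it only shows the os.ExpandEnv pattern.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey expands environment variable references such as
// ${LLM_API_KEY} in a configured value; a plain literal passes through
// unchanged. (Illustrative helper, not lophiid's config loader.)
func resolveAPIKey(configured string) (string, error) {
	expanded := os.ExpandEnv(configured)
	if expanded == "" {
		return "", fmt.Errorf("api_key is empty after environment expansion")
	}
	return expanded, nil
}

func main() {
	key, err := resolveAPIKey("${LLM_API_KEY}")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved key length:", len(key))
}
```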
✅ Add a security warning for the AI integration feature

Consider adding a warning about the potential security implications of using AI-generated responses in a honeypot system. This can help users understand the risks and make informed decisions about enabling this feature.

[README.md [124]](https://github.com/mrheinen/lophiid/pull/53/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R124-R124)

```diff
-NOTE: this is a very experimental feature.
+NOTE: This is a very experimental feature. Be aware that using AI-generated responses in a honeypot system may have security implications. Use with caution and ensure you understand the risks before enabling this feature in a production environment.
```

`[Suggestion has been applied]`

Suggestion importance[1-10]: 8
Why: This suggestion improves user awareness of potential security risks associated with the experimental AI feature, which is crucial for responsible use of the system.
    Possible issue
Add error handling for nil llmResponder to prevent potential nil pointer dereferences

Consider adding error handling for the case when s.llmResponder is nil in the getResponderData method. This would prevent potential nil pointer dereferences and make the code more robust.

[pkg/backend/backend.go [611]](https://github.com/mrheinen/lophiid/pull/53/files#diff-c65bcfe9bb457434c3e69ba3f0576d7669935f350d24e2c2c58b05b4f9c510b2R611-R611)

```diff
-if err == nil && reg != nil && s.llmResponder != nil {
+if err == nil && reg != nil {
+	if s.llmResponder == nil {
+		slog.Error("llmResponder is nil")
+		return strings.Replace(string(content.Data), responder.LLMReplacementTag, responder.LLMReplacementFallbackString, 1)
+	}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8
Why: This suggestion addresses a potential runtime error by adding a nil check for llmResponder. It improves the robustness of the code and prevents possible crashes.
Add a nil check for the responder parameter to prevent potential issues

Consider adding a check for a nil responder in the NewGojaJavascriptRunner function. This would prevent potential issues if a nil responder is passed to the function.

[pkg/javascript/goja.go [59]](https://github.com/mrheinen/lophiid/pull/53/files#diff-93ebbcb2bc5cc6ca7ef6d25c1ce739d50fc1194101ed2b4469c1941da4111828R59-R59)

```diff
 func NewGojaJavascriptRunner(dbClient database.DatabaseClient, allowedCommands []string, commandTimeout time.Duration, responder responder.Responder, metrics *GojaMetrics) *GojaJavascriptRunner {
+	if responder == nil {
+		slog.Error("nil responder passed to NewGojaJavascriptRunner")
+		return nil
+	}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion adds a defensive check to prevent potential nil pointer dereferences. It improves the robustness of the code and provides better error handling.
    Best practice
Add a timeout to the context for the API call to prevent potential hangs

Consider adding a timeout to the context passed to the CreateChatCompletion method in the Complete function. This would ensure that the API call doesn't hang indefinitely if there's a network issue or if the OpenAI service is unresponsive.

[pkg/llm/llm.go [126-128]](https://github.com/mrheinen/lophiid/pull/53/files#diff-e005b1818731f04d8ccacb709416325ca76fe22743ff53ef198845c428640cc6R126-R128)

```diff
+ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+defer cancel()
 resp, err := l.client.CreateChatCompletion(
 	ctx,
 	openai.ChatCompletionRequest{
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8
Why: Adding a timeout to the API call context is a best practice that prevents the application from hanging indefinitely. This improves the overall reliability of the system.
    Reliability
Implement a retry mechanism for LLM API calls to improve resilience

Consider adding a retry mechanism with exponential backoff for the l.client.Complete() call. This would improve resilience against temporary network issues or API rate limits.

[pkg/llm/llm_manager.go [54-58]](https://github.com/mrheinen/lophiid/pull/53/files#diff-079422eef64450f8a3d97a65b202d2a89a3850d6dd9282c9a8f930bd3731c26aR54-R58)

```diff
-retStr, err := l.client.Complete(ctx, prompt)
-
+var retStr string
+var err error
+for attempts := 0; attempts < 3; attempts++ {
+	retStr, err = l.client.Complete(ctx, prompt)
+	if err == nil {
+		break
+	}
+	slog.Warn("Error completing prompt, retrying", slog.String("prompt", prompt), slog.String("error", err.Error()), slog.Int("attempt", attempts+1))
+	// Back off exponentially: 1s, 2s, 4s.
+	time.Sleep(time.Duration(1<<attempts) * time.Second)
+}
 if err != nil {
-	slog.Error("Error completing prompt", slog.String("prompt", prompt), slog.String("error", err.Error()))
+	slog.Error("Error completing prompt after retries", slog.String("prompt", prompt), slog.String("error", err.Error()))
 	return "", fmt.Errorf("error completing prompt: %w", err)
 }
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 8
Why: Adding a retry mechanism with exponential backoff significantly improves the reliability of LLM API calls. This is particularly valuable for handling temporary network issues or API rate limits, enhancing overall system resilience.
    Maintainability
Refactor repeated BackendServer creation into a helper function to improve test maintainability

Consider using a helper function to create the BackendServer instance in each test case. This would reduce code duplication and make the tests more maintainable. You could create a function like createTestBackendServer(fakeRes *responder.FakeResponder) *BackendServer that encapsulates the common setup logic.

[pkg/backend/backend_test.go [201-202]](https://github.com/mrheinen/lophiid/pull/53/files#diff-841290eccbd676c43597489825cc8f1628359283c634321e496e76e31f6fed9cR201-R202)

```diff
 fakeRes := &responder.FakeResponder{}
-b := NewBackendServer(fdbc, bMetrics, &fakeJrunner, alertManager, &vt.FakeVTManager{}, &whoisManager, &queryRunner, &fakeLimiter, &fIpMgr, fakeRes, GetDefaultBackendConfig())
+b := createTestBackendServer(fakeRes)
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion improves code maintainability by reducing duplication across multiple test cases. It makes the tests more readable and easier to update in the future.
Refactor LLM configuration into a dedicated struct for improved maintainability

Consider using a configuration struct to encapsulate the LLM-related settings instead of passing them individually to NewLLMManager. This would make the code more maintainable and easier to extend in the future.

[cmd/backend/backend_main.go [187-191]](https://github.com/mrheinen/lophiid/pull/53/files#diff-aea9d0ad7326cd6083f662c484917c42ab9ff66183cbe981873c671cedc76b7cR187-R191)

```diff
-llmClient := llm.NewOpenAILLMClient(cfg.Responder.ApiKey, cfg.Responder.ApiLocation, "")
-pCache := util.NewStringMapCache[string]("LLM prompt cache", cfg.Responder.CacheExpirationTime)
-llmMetrics := llm.CreateLLMMetrics(metricsRegistry)
+llmConfig := llm.Config{
+	ApiKey:              cfg.Responder.ApiKey,
+	ApiLocation:         cfg.Responder.ApiLocation,
+	CacheExpirationTime: cfg.Responder.CacheExpirationTime,
+	CompletionTimeout:   cfg.Responder.LLMCompletionTimeout,
+}
+llmManager := llm.NewLLMManager(llmConfig, metricsRegistry)
-llmManager := llm.NewLLMManager(llmClient, pCache, llmMetrics, cfg.Responder.LLMCompletionTimeout)
-
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion improves code organization and maintainability by encapsulating LLM-related settings into a single configuration struct. It makes the code more modular and easier to extend in the future.
    Input validation
Add input validation for the responder type to improve function robustness

Consider adding input validation for the resType parameter in the Respond method. This would prevent potential issues with unsupported responder types and improve the overall robustness of the function.

[pkg/backend/responder/llm_responder.go [19-23]](https://github.com/mrheinen/lophiid/pull/53/files#diff-c3e93e5adb1e4d4cdb55ce0b82481329eb36fe0db2b23559380252ff22c5969dR19-R23)

```diff
 func (l *LLMResponder) Respond(resType string, promptInput string, template string) (string, error) {
+	supportedTypes := map[string]bool{
+		constants.ResponderTypeCommandInjection:    true,
+		constants.ResponderTypeSourceCodeExecution: true,
+	}
+	if !supportedTypes[resType] {
+		return "", fmt.Errorf("unsupported responder type: %s", resType)
+	}
+
 	// First make sure there actually is a replacement tag. If the tag is missing
 	// then we will append one to the end of the template.
 	if !strings.Contains(template, LLMReplacementTag) {
 		template = fmt.Sprintf("%s\n%s", template, LLMReplacementTag)
 	}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion adds important input validation for the responder type, preventing potential issues with unsupported types. It improves the function's robustness and helps catch errors early, enhancing overall code quality.
    Enhancement
Add a configuration option to limit the size of LLM responses

Consider adding a max_tokens or max_response_length configuration option to limit the size of the LLM responses. This can help prevent excessive resource usage and ensure consistent response times (a sketch of how such a limit could be passed to the OpenAI client follows below).

[config/backend-config.yaml [94-95]](https://github.com/mrheinen/lophiid/pull/53/files#diff-422f5c2c60abdac2feda6fca8b0816bb571e902de5d73a2c5c9c430406dff9c5R94-R95)

```diff
 # How long a completion is allowed to take.
 llm_completion_timeout: 60s
+# Maximum number of tokens in the LLM response
+max_response_tokens: 1000
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 7
Why: This suggestion enhances resource management and performance by allowing control over LLM response sizes, which is important for maintaining system stability and consistency.
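If such an option were added, it would presumably be threaded into the go-openai request. The sketch below shows that wiring with the sashabaranov/go-openai client; the maxResponseTokens constant stands in for a hypothetical config value and is not an existing lophiid setting.

```go
package main

import (
	"context"
	"fmt"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

// maxResponseTokens stands in for a hypothetical max_response_tokens config value.
const maxResponseTokens = 1000

func main() {
	client := openai.NewClient("YOUR_API_KEY")

	// Bound the call as well, mirroring llm_completion_timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model:     openai.GPT3Dot5Turbo,
		MaxTokens: maxResponseTokens, // caps the completion length server-side
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "example prompt"},
		},
	})
	if err != nil {
		fmt.Println("completion failed:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```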
Add a logging library dependency for improved error handling and debugging

Consider adding a dependency on a logging library in the go_library rule. This will allow for better error handling and debugging capabilities in the LLM integration code.

[pkg/llm/BUILD.bazel [12-17]](https://github.com/mrheinen/lophiid/pull/53/files#diff-756699b74c76793b70cc9c9b4c8735377aefafc2a1a433b4fed40c4f0caa2dc1R12-R17)

```diff
 deps = [
     "//pkg/metrics",
     "//pkg/util",
     "@com_github_prometheus_client_golang//prometheus",
     "@com_github_sashabaranov_go_openai//:go-openai",
+    "@org_uber_go_zap//:zap",
 ],
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 6
Why: This suggestion enhances the development and maintenance process by improving error handling and debugging capabilities, which is valuable for complex features like LLM integration.
    Error handling
Add a check for empty responses from the LLM to improve error handling

Consider handling the case where llmResponder.Respond() returns an empty result. An empty response is currently not detected, which could lead to unexpected behavior or silent failures.

[cmd/llm/main.go [51-55]](https://github.com/mrheinen/lophiid/pull/53/files#diff-733501c64048a9c28435343cfa138c8892120de440b9febe8d0d24d320a5690bR51-R55)

```diff
 res, err := llmResponder.Respond(*responderType, *query, responder.LLMReplacementTag)
 if err != nil {
 	fmt.Printf("Error: %v\n", err)
 	return
 }
+if res == "" {
+	fmt.Println("Warning: Empty response received from LLM")
+}
```

- [ ] **Apply this suggestion**

Suggestion importance[1-10]: 6
Why: This suggestion enhances error handling by checking for empty responses from the LLM. It adds an extra layer of robustness to the code, helping to identify potential issues with LLM responses.

💡 Need additional feedback? Start a PR chat