river-build / river


API Keys appearing in logs #1153

Open matcra587 opened 1 month ago

matcra587 commented 1 month ago

Describe the bug
We observed periodic "Failed to retrieve token balance" errors in our logs that expose the RPC endpoint URL, including our API key.

To Reproduce
Steps to reproduce are unknown; the errors occurred during normal operation.

Expected behavior
Logs should obfuscate the API key while retaining enough information to identify the problematic endpoint. A suggested obfuscation format:

Logs
The following log entries show the issue (some data removed for readability):

{"time":"2024-09-20T07:19:26.788358756Z","level":"ERROR","msg":"Failed to retrieve token balance","instanceId":"C3XkuHcXsdg8","nodeType":"stream","mode":"full","nodeAddress":"<NODE_ADDRESS>","worker_id":1,"application":"xchain","nodeAddress":"<NODE_ADDRESS>","caller_address":"<CALLER_ADDRESS>","function":"process","req.txid":"<TX_ID>","function":"evaluateErc20Operation","error":"Post \"https://example.domain.tld/0192246b91337ae786d9121ffaf7217f232gfkrl6\": context canceled"}
{"time":"2024-09-20T18:01:18.8789714Z","level":"ERROR","msg":"Failed to retrieve token balance","instanceId":"C3XkuHcXsdg8","nodeType":"stream","mode":"full","nodeAddress":"<NODE_ADDRESS>","worker_id":1,"application":"xchain","nodeAddress":"<NODE_ADDRESS>","caller_address":"<CALLER_ADDRESS>","function":"process","req.txid":"<TX_ID>","function":"evaluateErc20Operation","error":"Post \"https://example.domain.tld/0192246b91337ae786d9121ffaf7217f232gfkrl6\": context canceled"}
{"time":"2024-09-20T19:12:02.661200687Z","level":"ERROR","msg":"Failed to retrieve token balance","instanceId":"C3XkuHcXsdg8","nodeType":"stream","mode":"full","nodeAddress":"<NODE_ADDRESS>","worker_id":1,"application":"xchain","nodeAddress":"<NODE_ADDRESS>","caller_address":"<CALLER_ADDRESS>","function":"process","req.txid":"<TX_ID>","function":"evaluateErc20Operation","error":"Post \"https://example.domain.tld/0192246b91337ae786d9121ffaf7217f232gfkrl6\": context canceled"}
{"time":"2024-09-21T08:52:48.786316101Z","level":"ERROR","msg":"Failed to retrieve token balance","instanceId":"C3XkuHcXsdg8","nodeType":"stream","mode":"full","nodeAddress":"<NODE_ADDRESS>","worker_id":1,"application":"xchain","nodeAddress":"<NODE_ADDRESS>","caller_address":"<CALLER_ADDRESS>","function":"process","req.txid":"<TX_ID>","function":"evaluateErc20Operation","error":"Post \"https://example.domain.tld/0192246b91337ae786d9121ffaf7217f232gfkrl6\": context canceled"}

Additional context
This issue was previously raised and resolved, but it seems to have reappeared after a recent change to wallet balance retrieval. As a workaround, I'm implementing a Python-based step that obfuscates logs before they are uploaded to S3, but it would be ideal to handle this at the source.

Here's a snippet of the Python logic I'm using to obfuscate the logs:

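    # Assumes `import re` at the top of the module.
    def _clean_logs(self, log_data: str) -> str:
        """
        Cleans the logs by obfuscating API keys.

        Args:
            log_data (str): The raw log data.

        Returns:
            str: The cleaned log data with API keys obfuscated.
        """
        # Group 1 captures the endpoint, group 2 the API key path segment.
        pattern = r'(https://[^.]+\.domain\.tld)/([A-Za-z0-9]+)'

        def obfuscate_api_key(match):
            domain = match.group(1)
            api_key = match.group(2)
            if len(api_key) <= 8:
                # Too short to show both ends without revealing the whole key.
                obfuscated_key = '*' * len(api_key)
            else:
                # Keep the first and last four characters; mask the rest.
                obfuscated_key = api_key[:4] + '*' * (len(api_key) - 8) + api_key[-4:]
            return f"{domain}/{obfuscated_key}"

        cleaned_logs = re.sub(pattern, obfuscate_api_key, log_data)
        self.logger.info("Logs cleaned and API keys obfuscated.")
        return cleaned_logs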
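
For anyone who wants to try the same idea outside our class, a standalone variant might look like the sketch below. It assumes the API key is always the first path segment after the host, as in the log lines above, and it does not hard-code the domain. The names `mask_key` and `mask_url_keys` are hypothetical and not part of river.

```python
import re

# Assumption: the secret is the first path segment after the host of an https URL.
# The host part stops at "/", whitespace, a double quote, or a backslash so that
# the match works inside JSON-escaped log lines.
URL_KEY_PATTERN = re.compile(r'(https://[^/\s"\\]+/)([A-Za-z0-9]+)')


def mask_key(key: str, visible: int = 4) -> str:
    """Keep `visible` characters at each end; mask short keys entirely."""
    if len(key) <= 2 * visible:
        return "*" * len(key)
    return key[:visible] + "*" * (len(key) - 2 * visible) + key[-visible:]


def mask_url_keys(log_data: str) -> str:
    """Mask the first path segment of every https URL found in log_data."""
    return URL_KEY_PATTERN.sub(
        lambda m: m.group(1) + mask_key(m.group(2)), log_data
    )


# Error field taken from the log entries in this issue.
sample = (
    r'"error":"Post \"https://example.domain.tld/'
    r'0192246b91337ae786d9121ffaf7217f232gfkrl6\": context canceled"'
)
print(mask_url_keys(sample))
```

The host stays fully visible, so a problematic endpoint can still be identified, while only the first and last four characters of the key survive.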

sergekh2 commented 3 weeks ago

We are going to look into ways to sanitize secrets from logs.