Closed atingchen closed 1 year ago
Are you applying symmetric encryption? Or do you want to use asymmetric key/pair encryption? Why is encrypt-round used explicitly? Do you know what algorithms/key types you'd use here? Is there a way to deobfuscate the attributes as part of a similar processor later on? I bet this can be applied to metric attributes too, not just traces and logs. Does this apply to log bodies too?
Are you applying symmetric encryption? Or do you want to use asymmetric key/pair encryption? Why is encrypt-round used explicitly? Do you know what algorithms/key types you'd use here? Is there a way to deobfuscate the attributes as part of a similar processor later on?
First of all, our goal is to provide a processor that can obfuscate data and restore data in some scenarios, rather than a secure encryption scheme.
Second, the processor uses a Feistel cipher to implement format-preserving encryption.
Wikipedia describes the Feistel cipher as follows:
A Feistel network uses a round function, a function which takes two inputs – a data block and a subkey – and returns one output of the same size as the data block. In each round, the round function is run on half of the data to be encrypted, and its output is XORed with the other half of the data. This is repeated a fixed number of times, and the final output is the encrypted data.
An important advantage of Feistel networks is that the entire operation is guaranteed to be invertible (that is, encrypted data can be decrypted), even if the round function is not itself invertible.
So yes, there is a way to deobfuscate the attributes: run the same rounds in reverse order with the same key.
I bet this can be applied to metric attributes too, not just traces and logs. Does this apply to log bodies too?
From my point of view, this processor is not very useful for metrics. Its original intent is to obscure sensitive information that may be related to users, and I don't think user-related attributes should appear on metrics at all, because they cause high-cardinality problems. For general-purpose metric attributes, obfuscation doesn't seem very useful. That said, the processor can be applied to metrics.
I don't currently process log bodies. Doing so would require matching sensitive values with regular expressions, which is a very expensive operation.
We can continue to discuss how to store the encryption key and the number of rounds.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Hi @atingchen I noticed you marked this issue as completed. Was looking to help out by working on this processor but seems you might already have an implementation? Curious to know the state of things. Otherwise if you're no longer working on this issue I'm happy to take a look. Thanks!
@moh-osman3 Hi Moh, you can give it a try. My issue has not been accepted, so the code has not been merged into the main branch. After researching, I have not found a particularly useful package for format-preserving encryption.
The purpose and use-cases of the new component
The new processor will apply format-preserving encryption to obfuscate the data. Our goal is to provide a processor that can obfuscate data and restore data in some scenarios, rather than a secure encryption scheme. It is important that any code used for obfuscating telemetry data for research be widely reviewed by the community; these tools need to be well reviewed and held in the community.
Related Cases
A user has exported trace and log data and wants to analyze it with diagnostic tools. Some attributes may contain user data, so this processor can be used to obfuscate the traces and logs without destroying the format of the data.
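To illustrate what "without destroying the format" can mean for an attribute such as a phone number, here is a Go sketch that permutes only the decimal digits of a string and leaves every other character in place. It is an assumption-laden illustration, not the proposed processor: the alternating Feistel layout, the HMAC-SHA256 round function, the slightly biased `% 10` digit derivation, and the names `feistelDigits` and `obfuscate` are all hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"errors"
	"fmt"
)

// roundDigits derives n digits (0-9) from the key, the round index and one half.
func roundDigits(key []byte, round int, half []byte, n int) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte{byte(round), byte(n)})
	mac.Write(half)
	sum := mac.Sum(nil)
	out := make([]byte, n)
	for i := range out {
		out[i] = sum[i] % 10 // slightly biased; acceptable for a sketch only
	}
	return out
}

// feistelDigits permutes digit values (0-9) in place. Even rounds update the
// first half from the second, odd rounds do the opposite, so odd lengths work.
func feistelDigits(key, d []byte, rounds int, decrypt bool) {
	a, b := d[:len(d)/2], d[len(d)/2:]
	for i := 0; i < rounds; i++ {
		r := i
		if decrypt {
			r = rounds - 1 - i // undo the rounds in reverse order
		}
		target, source := a, b
		if r%2 == 1 {
			target, source = b, a
		}
		ks := roundDigits(key, r, source, len(target))
		for j := range target {
			if decrypt {
				target[j] = (target[j] + 10 - ks[j]) % 10
			} else {
				target[j] = (target[j] + ks[j]) % 10
			}
		}
	}
}

// obfuscate rewrites only the ASCII digits of s; all other bytes are untouched.
func obfuscate(key []byte, s string, rounds int, decrypt bool) (string, error) {
	out := []byte(s)
	var pos []int
	var digits []byte
	for i, c := range out {
		if c >= '0' && c <= '9' {
			pos = append(pos, i)
			digits = append(digits, c-'0')
		}
	}
	if len(digits) < 2 || len(digits) > 2*sha256.Size {
		return "", errors.New("sketch needs between 2 and 64 digits")
	}
	feistelDigits(key, digits, rounds, decrypt)
	for k, i := range pos {
		out[i] = digits[k] + '0'
	}
	return string(out), nil
}

func main() {
	key := []byte("demo-key") // hypothetical key, for illustration only
	in := "+1 (555) 010-4477"
	enc, _ := obfuscate(key, in, 8, false)
	dec, _ := obfuscate(key, enc, 8, true)
	fmt.Println(enc)
	fmt.Println(dec == in)
}
```

Modular addition of digits replaces the XOR from the byte-oriented description, which keeps every output symbol inside the 0-9 alphabet while remaining invertible.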
Related to https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/13626
Example configuration for the component
Telemetry data types supported
Traces and logs.
Is this a vendor-specific component?
Sponsor (optional)
No response
Additional context
I have written an early implementation that can obfuscate trace data using format-preserving encryption.