projectlombok / lombok

Very spicy additions to the Java programming language.
https://projectlombok.org/
Other
12.86k stars 2.38k forks source link

[FEATURE] Add @ToString.Mask to mask field values with sensitive data in toString method #2197

Closed dome313 closed 4 years ago

dome313 commented 5 years ago

When logging certain classes, it would be good not to exclude fields with sensitive data with @ToString.Exclude, but rather just mask their values with arbitrary value, e,g, ****. This would make clear in logs that those fields exist, but that their values are sensitive or personal data. Lomboked version

@ToString
public class Ticket {
  private String row;
  private String seat;
  @ToString.Mask
  private String owner;
}

Generate java code:

@Override
public String toString() {
  return "Ticket(row=" + this.row + ", seat=" + this.seat + ", owner=*******" + ")";
}

The annotation would be nested in ToString annotation and could look like

@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.SOURCE)
public @interface Mask {
  String value() default "********";
}

Target audience would be all developers that need to have logging that complies with GDPR standards.

rspilker commented 5 years ago

Any reason why the contents of the mask needs to be configurable?

dome313 commented 5 years ago

different people have different preferences when masking inputs, some wants to put ******, some use <HIDDEN> and so on... that's the main reason.

below part is optional Also with configurable content, this can be extended to support advanced syntax. For example, lets say that # represent character from the original string, then annotating with

  @ToString.Mask("*******###")
  private String mobileNumber;

would output *******567 for mobile number 0991234567

Maaartinus commented 5 years ago

You'll get most flexibility by using

@ToString.Include(name="owner")
private String ownerForGdpr() {
     return whatever you want
}

I'm not saying that a shortcut like @ToString.Mask wouldn't be useful.

dome313 commented 5 years ago

thanks @Maaartinus, nice one! two downsides:

Caleb-C commented 5 years ago

This feature (@ToString.Mask) would be incredibly useful in a number of industries. I would be happy to implement it when I have the time, but would be pleased if someone beat me to it!

rspilker commented 5 years ago

Why would you include such a field in the tostring at all? Using a mask wouldn't add any value to the tostring.

'@ToString.Exclude' removes the entire field from the tostring.

If that's not good enough, and you definitely want to have the field in there with some special rendering, the annotated method is warranted.

Caleb-C commented 5 years ago

There are several cases where it would be useful to mask content: PII, PHI, credit card numbers, passwords, and many more.

It would be much more useful to see that the field is populated (and perhaps the length of string fields) than just leaving the fields out of the log message.

While it is possible to mask the value with a method per field, it would be far easier to use a feature like this.

xenoterracide commented 4 years ago

passwords specifically should leave the length out, as that information can be used for an attack

rzwitserloot commented 4 years ago

Yeah, including length info is right out. I can see just 'rendering' null or, by exception, the blank string if it is a string (and that's a tricky exception to make already, because there are other notions that can be 'empty', such as lists, charsequences, you may have a class CreditCard with a sentinel value called BLANK which you would want to know about...) – and other than that, . Not a bunch of stars, as that wrongly suggests the string is at least that long.

I would accept a PR with it, but, it should have no arguments (not worth the effort or learning curve to let you specify the mask, if we really want this, lombok.config would be a better place for it), the behaviour is the same (field name included, value is ), and let's leave whether to explicitly show nulls and, if the relevant item has a type of String, blanks - okay. (and a string with only whitespace is not 'blank'). Should also work with methods included via @ToString.Include.

eugeneseppel commented 4 years ago

I also vote for this feature to avoid printing high critical PII to debug log. Currently use workaround (thanks to @Maaartinus ) but such approach looks ugly and not error-prone.

rzwitserloot commented 4 years ago

I've reviewed this one, really tried to get to the heart of the cost/benefit here.

Bad news; I am rescinding my earlier promise to accept a PR, and closing the issue. – The feature is not a good idea. I will lay out the reasons for it; this way, if you feel this decision is in error, please delve into the arguments laid out here; show how they are either incorrect or insufficient.

NB: Lombok has millions of users; even hundreds of 'but I like it!' don't move the needle and never will. Lombok is not a popularity contest; feature requests need to include falsifiable reasons, and 'ability to get a few friends to upvote an issue' should not move the needle. For mostly the same reason, "I find this ugly" or "This idea is elegant" are useless arguments. They are not falsifiable; such statements cannot usefully be part of a debate. So please don't make them. They just crowd out the useful debate.

So, what is the benefit in this cost/benefit analysis?

You'd think that the argument here is 'this makes it possible to adhere to regulations and hide sensitive information from logs', but that is NOT correct. Because lombok can already do that for you; you can either go with:

Alternative A:

@ToString.Exclude private String sensitiveField;

Alternative B:

private String sensitiveField;

@ToString.Include(name = "sensitiveField")
private String sensitiveFieldMasker() { return "****"; }

The exclude variant isn't even boilerplate; if that's what you want, it is 100% as good as the mask proposal (in that you need to put an explicit annotation on the field). The masker method solution is some boilerplate, but as I'll show later (in the cost section), also has crucial advantages.

So, the benefit of this feature is only in eliminated the above much more limited boilerplate.

Is that still worthwhile?

I don't think so:

The contrast is forgetting the entire lombok @ToString feature. As a hypothetical, let's say you have 20 fields in the class, and 2 need masking. Without lombok, you let your IDE generate the thing, and manually go in and replace the lines for the 2 masking fields with stars or whatnot.

This has 3 significant costs:

And the @ToString.Include alternative solves all 3 of these major issues: You need not mess with anything tostring related when adding or removing or changing non-mask fields, it is easy to see that masking of these sensitive fields is being done, and it's unlikely you'll remove one of these masker methods by accident (the most likely route to an accident is that you rename that field, resulting in the masking method no longer hiding it, as it no longer shares the name now. If this worries you, you can always shove an @Exclude on the field and close this loophole!) Thus, there are only two ways to take this feature request:

Thus, the 'benefit' side of the cost/benefit analysis here is really, really low. so low, the feature is probably DOA already. But, it gets worse. A lot worse.

Let's talk about that masking concept: What do people actually WANT here? It is clear to me from this thread that people mean different things when they use the word 'mask':

And the list goes on.

The ability to let the programmer decide, in terms of java code and not some ersatz really crappy DSL (annotation params are a really bad DSL!), is a significant advantage of the @ToString.Include solution. You can do everything in that above list, fairly simply.

This leaves us with only 2 options, and they both lead to the same conclusion:

And there you have it. No matter which way we do it, the end result would be bad.

I do hope nobody was already on a PR for this. I apologize for jumping the gun on promising we'd accept one.

juriad commented 4 years ago

Not working on a PR, don't worry ;-) I personally do not need masking at all and am happy with the current options. I mostly agree with you that @ToString.Exclude and @ToString.Include with a custom method solve most cases.

I want to point out one issue which has not been discussed yet and might bring us to a solution which people desire. Writing custom masking methods and annotating them with @ToString.Include might be tedious; also many of these methods will be exactly the same.

Let me propose one solution and see where it can lead us. As far as I understand you reasoning, my proposal does not contradict it.

Static methods

What if we refactored masking methods into static methods in some other class? Lombok could then easily wrap the value with a static method call. This would work nicely with @UtilityClass.

The following code:

private final String password;

@ToString.Include(name="password")
private void maskedPassword() {
  return password == null ? null : "*****";
}

Could be instead written as:

package my.company.is.the.greatest;
@UtilityClass class Maskers {
  private String maskedPassword(String password) {
    return password == null ? "none" : "*****";
  }
}

And then later used as:

@ToString.Mask(method="my.company.is.the.greatest.Maskers.maskedPassword")
private final String password;

Configuration

Since a user is required to fill the fully qualified name of a method every time, there is little advantage to it; we cannot easily use static imports (which would make it shorter) because IDEs might be removing them when organizing imports.

Let's think about shortening the fully qualified name. We can introduce lombok.config configuration such as:

The last one is clearly superior.

The annotation would have two parameters which are mutually exclusive:

Null handling

Some maskers might want to handle null themselves; most often the default implementation is good enough (prints null). The annotation could be parametrized by @ToString.Mask(..., handleNull=true) (false by default). The default could be configurable. On the other hand, this bit brings just too much complexity and it might be easier to have the user always handle nulls.

Related features

This feature would allow to transform value for use in the toString generator. Could Lombok use this elsewhere?

We already have @Builder.ObtainVia which can call a static method on Self and pass it this. Would it be useful to extend it to be able to call a static method on a different class?

Naming

This feature could turn into more than masking; it could be:

We should also consider naming consistency with the related features.

I don't think we should call it @ToString.Mask. I leave the question of choosing name open. We might want to start the discussion over in a new issue.

ArnauAregall commented 4 years ago

Don't want to "wake the dead", but... how about this?:

@ToString.Pattern(pattern = java.lang.String, match = boolean)
bogdartysh commented 2 years ago

I mostly agree with you that @ToString.Exclude and @ToString.Include with a custom method solve most cases.

even so in 80% of cases it's ok, 20% of a project with gigabytes of code is a lot of work to do...

toString(){
 return field == null? null: "xxxxxx"
}

so as if field is nul I think it's interesting....

and in 5% of cases the logic would be more complicated:

toString(){
 return field == null? null: toString()[n_first_characters] + "xxxxxx" + toString()[n_last_characters]
}
wolfiesonfire commented 2 years ago

@dome313 @Maaartinus @Caleb-C @eugeneseppel @juriad

I recently had this kind of request and implement with few code.

Use like this.

First define a tool.

package com.example;

public class DesensitizeUtil {

    /**
     * replace as * from 5 ~ 8
     */
    public static String desensitize(String param) {
        return param.substring(0, 4) + "****" + param.substring(8);
    }

}

Use ToString.Handler on your entity, with specify the method above.

package com.example;

import lombok.Data;
import lombok.ToString;

@Data
public class User {

    // class 2 method name
    @ToString.Handler("com.example.DesensitizeUtil.desensitize")
    private String password;

    // class and method name
    @ToString.Handler(handlerClass = DesensitizeUtil.class, handlerMethod = "desensitize")
    private String bankcard;

}

https://github.com/wolfiesonfire/lombok-desensitize

rzwitserloot commented 2 years ago

@ArnauAregall wrote:

Don't want to "wake the dead", but... how about this?: @ToString.Pattern(pattern = java.lang.String, match = boolean)

This isn't legal java and makes no sense to me. match = boolean? What does that mean? I'm not sure I understand what you're proposing.

rzwitserloot commented 2 years ago

@ToString.Handler("com.example.DesensitizeUtil.desensitize")

Shoving FQN classnames in a string literal is vetoed. We don't like 'stringly typed' anything in java; we introduced it ourselfs in e.g. @EqualsAndHashCode(of = "fieldname") and have addressed our own errors there by introducing @EqualsAndHashCode. There could be @ToString.Handler(PasswordDesensitizer.class), which also fixed your dubious naming practice there (DensitizeUtil? Oof. Just Desensitizer seems right). We could default to having the 'handler' method be named apply or some such, letting you specify a different name if you must. Or even that maskers MUST implement UnaryOperator<T>, thus, we know the method name: apply. Unfortunately, we'd have to construct a class every time which is a bit of a waste, so perhaps we do need a name... and then we're back in stringly typed territory, ugh.

All that pain and crappy implementation, and for what? To address a dubious practice (I think you're in a bit more trouble than just needing to fix some toStrings if sensitive info makes it quite this far), and to avoid the boilerplate of:

@ToString.Include
private String bankcard() {
  return bankcard == null ? "null" : "****";
}

That's not good enough. Issues stays closed until someone comes up with a better plan than this.

sumitnsit commented 1 year ago

This annotation will be really helpful to avoid printing PII information in logs. There is no point adding @ToString.Exclude and then @ToString.Include in the code for each field.

@ToString.Mask(regEx, replaceString)

e.g.

@ToString.Mask("(?=.{4})", "*") String accountId;

So account id 1234567890 will be printed as **7890 in toString method.