projectlombok / lombok

Very spicy additions to the Java programming language.
https://projectlombok.org/
Other
12.87k stars 2.38k forks source link

Make @Singular annotation a bit more efficient #3187

Closed andriewski closed 2 years ago

andriewski commented 2 years ago

I'd suggest to modify current @Singular annotation behaviour or just add special flag to maintan that behaviour.

  1. There is no need in invoke .clear() method for collection - we can just set null for it. Current java implementation for collection - iterating over it and set null for each element.
  2. If passing collection is null - please just replace passing collection - it will be ummutable anyway after .build() method is invoked

The problem in current implementation is if you find Object with field marked as @Singular and you just want to replace items you will need to user .clearItems() and then .items(), but generated code is not really efficient. Basing clear() method for list iterates over whole collection and set null to each element. Then .items() method iterates over one collection to add and add them to inner collection, so it's O(N) + O(N) complexity for both cases.

In fact for for .items() it uses System.arrayCopy() so it works almost perfectly fine, but still just replacing it should be better, since you just need to change pointer for collection.

So we just don't use that annotation for our projects, even though that one has really cool features.

There is just base code snippet, what I ment:

@Builder(toBuilder = true)
@Value
public class User {

    int id;

    @Singular
    List<String> roles;

    public static void main(String[] args) {
        // We're creating user and saving it
        User user = User.builder()
                .id(1)
                .role("role1")
                .role("role2")
                .build();

        save(user);

        // Some moment ago we're finding that user
        User foundedUser = findById(1);

        // Changing roles for user using some criteria
        List<String> changedRolesForUser = changeRoles(foundedUser);

        User userToSave = foundedUser.toBuilder()
                .clearRoles()
                .roles(changedRolesForUser)
                .build();

        // saving that user
        save(userToSave);
    }
// Generated Builder
`public static class UserBuilder {

        private int id;
        private ArrayList<String> roles;

        UserBuilder() {
        }

        public UserBuilder id(int id) {
            this.id = id;
            return this;
        }

//  That one is fine
        public UserBuilder role(String role) {
            if (this.roles == null)
                this.roles = new ArrayList<String>();
            this.roles.add(role);
            return this;
        }

//  Main idea is cool and it work perfectly fine, when init collection is not empty, otherwise -> System.arrayCopy
// I don't understand why it's just not repalce it if this.collection is empty
// Anyway it will be ummutable after .build() method is invoked
        public UserBuilder roles(Collection<? extends String> roles) {
            if (this.roles == null)
                this.roles = new ArrayList<String>();
            this.roles.addAll(roles);
            return this;
        }

       // It's better to set just **null** instead of clearing it. Clearing is a bit expencive opeartion. It takes O(N)
        public UserBuilder clearRoles() {
            if (this.roles != null)
                this.roles.clear();
            return this;
        }

        public User build() {
            List<String> roles;
            switch (this.roles == null
                    ? 0
                    : this.roles.size()) {
                case 0:
                    roles = java.util.Collections.emptyList();
                    break;
                case 1:
                    roles = java.util.Collections.singletonList(this.roles.get(0));
                    break;
                default:
                  // If final collection is unmodifiableList why don't we just replace items instead of invoking **.addAll()** method?
                    roles = java.util.Collections.unmodifiableList(new ArrayList<String>(this.roles));
            }

            return new User(id, roles);
        }

        public String toString() {
            return "User.UserBuilder(id=" + this.id + ", roles=" + this.roles + ")";
        }
    }`
andriewski commented 2 years ago

Desired implementation of @Singular @Builder with to builder would be

@Value
public class User {

    int id;

    List<String> roles;

    public static UserBuilder builder() {
        return new UserBuilder();
    }

    public UserBuilder toBuilder() {
        return new UserBuilder().id(this.id)
                .roles(this.roles == null
                        ? java.util.Collections.emptyList()
                        : new ArrayList<>(this.roles));
    }

    public static class UserBuilder {

        private int id;
        private List<String> roles;

        UserBuilder() {
        }

        public UserBuilder id(int id) {
            this.id = id;
            return this;
        }

        public UserBuilder role(String role) {
            if (this.roles == null)
                this.roles = new ArrayList<>();
            this.roles.add(role);
            return this;
        }

        public UserBuilder roles(List<String> roles) {
            if (this.roles == null) {
                this.roles = roles;
            } else {
                this.roles.addAll(roles);
            }
            return this;
        }

        public UserBuilder clearRoles() {
            if (this.roles != null) {
                this.roles = null;
            }
            return this;
        }

        public User build() {
            List<String> roles;
            switch (this.roles == null
                    ? 0
                    : this.roles.size()) {
                case 0:
                    roles = java.util.Collections.emptyList();
                    break;
                case 1:
                    roles = java.util.Collections.singletonList(this.roles.get(0));
                    break;
                default:
                    roles = java.util.Collections.unmodifiableList(new ArrayList<>(this.roles));
            }

            return new User(id, roles);
        }

        public String toString() {
            return "User.UserBuilder(id=" + this.id + ", roles=" + this.roles + ")";
        }
    }
}

So, that would pass

    public static void main(String[] args) {
        // We're creating user and saving it
        User user = User.builder()
                .id(1)
                .role("role1")
                .role("role2")
                .build();

        User updatedUser = user.toBuilder()
                .clearRoles()
                .roles(List.of("role3"))
                .build();

        User userWithEmptyCollection = updatedUser.toBuilder()
                .clearRoles()
                .build();

        User userWithNewSingleRole = user.toBuilder()
                .clearRoles()
                .role("role1")
                .build();

        User userWithAddedRole = user.toBuilder()
                .role("role3")
                .build();

        User userWithAddedRoles = user.toBuilder()
                .roles(List.of("role3", "role4"))
                .build();

        isTrue(updatedUser.getRoles().equals(List.of("role3")));

        isTrue(userWithEmptyCollection.getRoles().isEmpty());

        isTrue(userWithNewSingleRole.getRoles().equals(List.of("role1")));

        isTrue(userWithAddedRole.getRoles().equals(List.of("role1", "role2", "role3")));

        isTrue(userWithAddedRoles.getRoles().equals(List.of("role1", "role2", "role3", "role4")));
    }
rzwitserloot commented 2 years ago

I don't think either of your suggestions is desirable or smart. Hence, denied - but read on - I may have misjudged your request or the performance impact. JMH results and profiler reports may change my mind (in which case, comment those and the issue will be reopened if warranted).

If I understand you correctly, you want 2 changes. One of them is straight up incorrect, one of them is a purported performance improvement, but I think it'll actually be a performance decrement. Because computers are more complex than claiming 'but it is O(n)!'.

Just use the argument directly instead of calling .addAll

Main idea is cool and it work perfectly fine, when init collection is not empty, otherwise -> System.arrayCopy // I don't understand why it's just not repalce it if this.collection is empty

If I understand you correctly, you think that if the current 'status' of the singular list in the builder is that it is empty (i.e. the private ArrayList<String> roles; field in the builder instance is null or an empty list), that we should not create a list and .addAll(argument), instead, we should just do roles = argument.

Anyway it will be ummutable after .build() method is invoked

This is incorrect. It will be unmodifiable. That is not the same as immutable. The list ref we end up putting in the built object cannot be used to modify the list, but the list is not (neccessarily) immutable: It might be changed by other code. Here is an example of that:

List<String> list = new ArrayList<String>();
list.add("Role1");
someBuilder.roles(list);
User user = someBuilder.build();
System.out.println(user.getRoles().size()); // prints '1'
list.add("Role2");
System.out.println(user.getRoles().size()); // prints '2'

A different example of why what you suggest wouldn't work:

(NOTE FOR CONTEXT: List.of(a, b) makes immutable lists!)

List<String> list = List.of("Role1", "Role2");
someBuilder.roles(list);
someBuilder.build();

Currently we 'inefficiently' make a copy of this already immutable list, and then make the copy unmodifiable to boot, during the build() call. So now we have 2 identical lists on the heap, both immutable (an unmodifiable ref to a list for which no other refs exists is immutable), and that is indeed, as you noticed, inefficient. However, imagine you wrote this code instead:

List<String> list = List.of("Role1", "Role2");
someBuilder.roles(list);
someBuilder.role("Role3");
someBuilder.build();

Had we gone with your suggestion, we'd now get an exception on the .role("Role3") line - after all, the builder's roles field is pointing at that List.of() result, which is immutable. We could write code that tries to add, catches the exception and creates a complete arraylist instead in the name of efficiency, but now we get into the tradeoff game, see below.

Do not call .clear()

It's better to set just null instead of clearing it. Clearing is a bit expencive opeartion. It takes O(N)

This is effectively incorrect. clear() is not expensive, and is effectively O(1). clearing is often in fact less expensive - we already have a list object, we might as well keep it around. The reason it's effectively O(1) has to do with CPU caches. We already have an array and it is likely already in-cache. A tight loop that stays in-page is effectively O(1); it is exceedingly unlikely, as in, 1 in many millions, that a user of a builder would first add thousands of entries via the singular builder, then clear() away all that work, and then start over. What's more likely is that something pre-initializes the builder with a handful of initial items, and some sidetrack code wants to undo that. In which case we're clearing less than a page's worth of references which is instant, or at least, likely faster if you measure the load on the entire system (i.e. also code running in other places), as this is less likely to cause page swaps.

This is not how you make performance optimization feature requests

You're making performance suggestions whilst you're just spouting received knowledge about what you think is performant and what isn't. Performance is incredibly complicated. If you think you know about performance (and the fact that you had the audacity to write this feature request certainly suggest you feel pretty confident), you.. don't, and you're a danger to your codebases if you do. Don't feel bad - just about every programmer I know, including myself, has gone through this phase. In the distant past, it wasn't even a mistake - just eyeballing code and making judgements about performance was pretty reliable with sufficient experience and knowledge.

But this is simply no longer the case. Between hotspot, configurable garbage collection schemes, and CPUs that do a ton of optimization work on the fly (pipelines and the like), nobody can eyeball code and make performance guesstimates anymore. The JVM engineers, who are probably the most likely of all programmers on the entire planet to make performance calls just with eyeballs, have said the same thing at conferences and the like: You CANNOT eyeball code and make such casual performance observations. You will have to write extensive tests, using e.g. Java Microbenchmark Harness, and a lot of profiler reports.

Performance 'suggestions' are categorically denied without such things (JMH results and profiler reports).

Notes about tradeoffs

You can come up with a ton of really cheap little bookkeepy actions and checks that would markedly improve performance in exotic situations. We could use Collections.singletonList first, and then expand later, for example. Or we could check if the incoming list is itself immutable (unfortunately, java has made this complicated, but even so, there are ways), use it, and then when adding, check again and make a new arraylist after all in such cases.

Forget the burden of maintaining a considerably more complex code generation path for a moment: That's death by a thousand cuts. Those nearly free cheap little bookkeepy actions are applied to every builder usage in the entire ecosystem, whereas the performance benefit of us not making a pointless second version of an already immutable list is only a gain in a vanishingly small amount of cases. In those cases, it would be better, no doubt, but we're now putting on the scales 2 different options:

  1. Everybody in the village (of 500 inhabitants) pays a single cent.
  2. The baker and the butcher each pay a euro.

Which one is better? Well, actually, having the baker and the butcher pay the euro. Or is it? Perhaps the intent was that nobody'd feel it, so smearing the cost out across the village, even though that costs more, is more desirable. Or not? Who knows - the builder codegen doesn't know precisely what your use case ends up being, and doesn't want to provide you with a hopelessly huge wall of finicky little config options to let you tell it - because 99% (literally!) of the time none of it matters, this isn't the performance-crucial codepath, or it is but you're only adding a handful of things so it's all effectively instant due to being all-in-one-cache-page. Hence, if you have specific performance needs, you're in the 0.1% case, and therefore it is no longer boilerplate, and therefore lombok's answer is an easy: Just write it out longhand, if you have such exacting needs.

andriewski commented 2 years ago

@rzwitserloot thank you for your reply

First of all, I would like to appoligize for not professional approach, of course I should've sent you benchmark results about any piece of code (crap).

  1. You are right, that suggestion that I provided doesn't fit for each and everyone, I might work for our project, but I would cause breaking changes for other lombok users, so It couldn't be applied anyway
  2. About .clear() I just made single change for method and made some benchmarks and got the following results with initial collection with XXX numbers of phones - 0, 3, 10, 30, 100 with this piece of code

Setup:

Warmup: 10 iterations, 10 s each Measurement: 10 iterations, 10 s each Timeout: 10 min per iteration It took ~3h of testing

userWithXXXPhones
                .clearPhones()
                .phone("phone1")
                .phone("phone2")
                .build();

Custom means - just simple set null instead of .clear() invocation Original means - original implementaion

customClearWith0Objects                                     thrpt   50  17087111.015 ± 222870.756   ops/s
customClearWith100Objects                                   thrpt   50  17066783.642 ± 253036.898   ops/s
customClearWith10Objects                                    thrpt   50  17009708.513 ± 280707.969   ops/s
customClearWith30Objects                                    thrpt   50  16957658.087 ± 210687.424   ops/s
customClearWith3Objects                                     thrpt   50  16553479.512 ± 399414.193   ops/s

originalClearWith0Objects                                 thrpt   50  19372425.856 ±  63110.387   ops/s
originalClearWith100Objects                               thrpt   50  12275440.653 ± 313264.782   ops/s
originalClearWith10Objects                                thrpt   50  19336396.755 ± 215891.888   ops/s
originalClearWith30Objects                                thrpt   50  17188988.843 ± 180287.831   ops/s
originalClearWith3Objects                                 thrpt   50  19860686.659 ± 474809.643   ops/s 

I tested with GC profile, but I'm not sure you are interested in it for now.

This result show, that for really common needs - .clear() is surelly enough and array allocation is quite costy operation, so .clear() for perfectly fine up to 30 objects and considering it less affect for GC cycle for allocation extra array of objects - that's really great

But, for more than 30 objects, I got only test for 100 objects - It shows linear complexity - kinda O(N), so we can make a conclusion, that it really depends on list size to make it clearance.

Of course as you said, If we have some specific situation - we should implement it ourselves.

Honestly, the reason why I started spilling with my own naive solutions (crap) was that behaviour that @Singular annotation makes is quite tricky

If we have just this peace of code

User user = findById(1);

List<String> updatedPhones = findPhones();

user.toBuilder()
               .phones(phones)
               .build();

It behaves differently if we have @Singular annotation or we don't have it

  1. If we this annotation, it adds extra phones
  2. If we don't have this annotation it just sets these phones for the user

But we can not be sure what will happen, unless we check our User class implementation, so in my humble opition, that's a bit tricky, but as I said @Singular annotation provides really cool features for any collection

That would be really great if you make maybe in the future, something that would work less triky

Something like

@Bulder.Default
List<String> list = Collections.emptyList();

But It would work like if list is null - that would init it with emptyList(), since for now

// list would be null
User.builder()
.list(null)
.build()

// list would be empty list
User.builder()
.build()

It could be something like

@Unmodifiable
List<String> users

And it would do that cool .build() part where we add emptyList or make it with Collections.unmodifiableList()

Of course you are right with that statement, that it's not immutable, since if we have link to it - we can modify it, so I understand why you made it with .addAll() method, but from my perspective as a Java developer - I guess that would be really cool if you added something that it works like simple collection setter for builder and make it unmodifiable afterwards, since nobody would modify the collection's link, since it just doesn't make any sence

Anyway, I would like to say thank you for your contribution in Java community, since each and everyone uses this library for their projects and to put it mildly Java would be less user friendly without it

rzwitserloot commented 2 years ago

First of all, I would like to appoligize for not professional approach, of course I should've sent you benchmark results about any piece of code (crap).

Hey, we haven't posted that rule anywhere, so there is no need whatsoever to apologise.

Thanks for that performance report! It's good to know that the breakeven point is somewhere in the 20-100 items mark. I think we can stick with .clear() given that I bet 'sub 20 items' is far more common, but, it is a bit painful that clearing thousands of items is pricey.

However, JMH reports do not report on future impact. Leaving a huge arraylist around for the garbage collector to chew through is not free, but a JMH report is never going to tell you about that cost. It'll be amortized over a very long runtime, across all code that JVM is running. But it's still a cost, which gives .clear() some extra bonus points, carrying it across the line here I think.

It behaves differently if we have https://github.com/Singular annotation or we don't have it

Yes, I see that. This is unfortunate. I'm not quite sure how we can address this; I guess in retrospect possibly .phones should let you set the list verbatim, and perhaps .phone (the singular) isn't just varargs, it also gets overloaded with a Collection<? extends T> taking list variant that does what .phones does now.

But, changing all that now would practically require releasing lombok v2 – that breaks every usage of @Singular out there.

But It would work like if list is null - that would init it with emptyList(), since for now

I'm not sure I follow here. Are you saying: You'd like some feature where if some code explicitly calls .list(null), that lombok will replace that during build() with an empty list? Denied, categorically. That 'view' of what null means (a placeholder for default value / a sentinel for an empty value) is somewhat common in the java ecosystem but lombok espouses the more logical take on what null is supposed to mean in java code: A real 'value', and it means 'unset, not applicable, missing, or otherwise unknown'. In other words, the fact that any attempt to interact with what it is referencing should be throwing an NPE.

Also, @Unmodifiable as a term means more or less the same thing as final, so it's a bit odd that annotating with @Unmodifiable means: "The thing that this variable points at is intended to be unmodifiable; not this variable itself". Also, "that means null is replaced with an immutable empty list ref" is a total non-sequitur. What does that have to do with unmodifiability? a null pointer is quite unmodifiable, after all.

It would, however, be kinda neat if you can 'unset' in a builder - i.e. explicitly tell the system to go back to whatever is default. However, in general, the job of re/un-setting anything in a builder is a sub-1% use case as is. Few usages of builder ever use any overwrite/unsetting behaviour. It's most common, granted, in .toBuilder() scenarios.

In general toBuilder() is where the real mess occurs: There you may want to say: "Go back to default value" (as per @Builder.Default), or indeed where the fact that .phones() on a @Singular list adds more phones, whereas without @Singular it would replace. On a fresh builder that is mostly irrelevant, on a .toBuilder() approach that's a different story altogether.

I'm not quite sure what an update would look like, though - any annotations are likely just confusing (e.g. @Unmodifiable, I certainly have no idea what that's supposed to do here), and in any case, just about everything that even gets anywhere near simple and understandable would at this point no longer be backwards compatible. What's left is a forest of settings and exotic annotations, and that's not worth the maintenance burden, I'm afraid.

I'll leave this issue closed, but I'm taking under advisement that builder has some design issues especially when .toBuilder() is involved. I'll keep an eye out - if ever I see a clean solution, I'll be on it :)