williamfiset / Algorithms

A collection of algorithms and data structures
MIT License
17.29k stars 4.37k forks source link

Set size of merged component to zero (UnionFind) #127

Closed alirezaghey closed 4 years ago

alirezaghey commented 4 years ago

Don't we need to set the size of the merged component to 0 when unifying?

https://github.com/williamfiset/Algorithms/blob/b8f9453417de418d4a5d4f3ad80595170d836d4e/src/main/java/com/williamfiset/algorithms/datastructures/unionfind/UnionFind.java#L93-L99

Would become:

 if (sz[root1] < sz[root2]) { 
   sz[root2] += sz[root1]; 
   id[root1] = root2;
   sz[root1] = 0;
 } else { 
   sz[root1] += sz[root2]; 
   id[root2] = root1;
   sz[root2] = 0; 
 } 

One could argue that this is not necessary because the component does not exist anymore and we don't access its data, but keeping our data consistent can avoid complications further down the road. I'm a beginner and my suggestion may not make sense. :-) Thanks!

williamfiset commented 4 years ago

Thanks for the suggestion, it's unclear what the best thing to do is. I would probably just leave it be to avoid doing extra work tbh. The consistency seems nice, but I also can't think up a reason why you would need to know that the size of a component that doesn't exist is 0. In fact, it's probably more useful to know what the size of the component was before being merged?

alirezaghey commented 4 years ago

What bothers me is that you can query for the size of a component that doesn't exist anymore. This is very misleading. Maybe raising an exception in case of checking the size of a non-existent component instead of returning zero or even returning its former size is the more natural way of doing things. Anyway, thank you for responding and keep up your great work.

williamfiset commented 4 years ago

You can only query the size of a component from within the class, since the sz member is private and the componentSize method is the only way of getting at sz. Internally the componentSize calls find(p) to get the root of the component, so there's no way of detecting the user's intent of accessing a non-existent component -- you'd need a new method for that.

If you want to go ahead and add the sz[root] = 0 change I'll merge it, I can see arguments for wanting to have the data consistent. Can you add a test to the UnionFindTest.java which after a few union operations sums the size of all the components and verifies that there are a total of n nodes? That test should currently fail, but once you make the change it should pass.

alirezaghey commented 4 years ago

Great! Will do!

alirezaghey commented 4 years ago

@williamfiset One problem that comes to my mind for writing the test is that the sz variable is not accessible from outside and the componentSize method returns the size of the root of a node.

As I understand, it isn't possible to test the validity of the size of a merged node. This can only be done from inside the class.

williamfiset commented 4 years ago

You can use the size you initially create the UnionFind with? You can also add an accessor method.

williamfiset commented 4 years ago

I think you want something like this [I haven't compiled this so it may not work]:

int n = 12;
UnionFind uf = new UnionFind(n);

uf.unify(...);
uf.unify(...);
uf.unify(...);

int totalNodes = 0;
for (int i = 0; i < n; i++) {
  totalNodes += uf.componentSize(i);
}
assertThat(totalNodes).isEqualTo(n);
alirezaghey commented 4 years ago

Your suggested code works, but won't test whether sz has been correctly set to zero. componentSize uses find, as you mentioned earlier in the thread, so it will look up the correct size of the whole component at the root. Here is its code:

  public int componentSize(int p) {
    return sz[find(p)];
  }

I could add an accessor method, as you mentioned, but this would be in conflict with the already existing componentSize. In addition, it doesn't make much sense to expose something to the outside, which isn't supposed to be used, only for testing purposes.

As much as I like to write the test, it occurs to me that this isn't something that we should write tests for because it doesn't change functionality whatsoever and is rather an implementation detail. I imagine that it is enough if the previous tests run smoothly after the change.

Having said that, I'm happy to take the course you see appropriate.

williamfiset commented 4 years ago

You make good points, let's close out this issue, I don't think it's worth it.