rchillyard / The-repository-formerly-known-as

2 stars 12 forks source link

Determine if strings can be encoded perfectly #30

Closed rchillyard closed 4 years ago

rchillyard commented 4 years ago

There is currently no check on whether our array of Strings can be perfectly encoded (thus we are able to skip the system sort).

However, if we redefine the method boolean perfect(int) in the HuskySequenceCoder interface to int perfectLength, we could test each string as we go until we know for sure that the coding is imperfect.

rchillyard commented 4 years ago

My comment about redesigning the HuskyCoder interface is rubbish. We already have the methods we need. The perfect()method is only suitable for classes where every instance is either perfect or not. Such is not true for sequences because sequences are perfect or not according to their length. So, for sequences, we must call perfectForLength(s.length()) during the encoding loop. A boolean "perfect" should start as true. As soon as we find a string that cannot be coded perfectly, we set perfect to be false and don't test any more. In other words we use something like: if (perfect) perfect = perfectForLength(s.length());

There is an alternative which might actually run faster. That is in each step of the coding loop, we keep track of the maximum length of all strings, by updating maxLength if the current string is longer. At the end we just write: if (!perfectForLength(maxLength)) Arrays.sort(xs);

darshan-dedhia commented 4 years ago

Currently, HuskyEncode returns the long array containing encoding values, but now on we have to check for perfectForLength and hence will have to either return "Max length of the string" or boolean that determines if values are perfectly encoded. In either case, we will have to return multiple values from the huskyEncode method wherein one would be long[] for encoded values and other would-be boolean to determine if values are perfectly encoded. To achieve the above functionality we have several options:

  1. Use Pair class (from apache-commons) which permits returning of two values.
  2. Or write another result class and pass both values through it.

I believe the second option would be preferable as in the future if any extra values need to be passed then only one minor change would be required.

Does this approach seem to be right?

rchillyard commented 4 years ago

This is a possible solution. But I think it's a little bit of overkill. Good to get @Sraw 's opinion too. It's not as if the answer to whether the encoding is perfect or not is any mystery. I think my suggestion maybe best: that we just keep track of the longest string and test that one to see if the encoding is perfect or not. That involves N primitive operations and N calls to string.length. My other suggestion is that we check perfection in parallel with every call to the encoder--until we know that encoding is not perfect. Then we skip making the extra call. This results in M method calls and N primitive ops. Assuming M < N, that should be the best approach. Comments?

Sraw commented 4 years ago

I agree with the professor, it doesn't need to be that complicated. I have some ideas about the implementation: How about we use the last left bit in the long. We can expand the Math.min method, itself involves the if statement, we can add a bit operation like result = result | TOO_LONG_MASK. Then, we extract this mask by flag = result & TOO_LONG_MASK, and restore the real value. Or we can refactor the current coder, integrate the stringToLong function inside the coder so that we can directly modify the instance variable.

rchillyard commented 4 years ago

I had wondered about that idea too. I think, as always, a little experimentation will determine what works and whether or not it adds an overhead. Any chance we can get this for sprint 1, which is due in two days time?

rchillyard commented 4 years ago

This has been merged into master.

rchillyard commented 4 years ago

Closed.