471. Encode String with Shortest Length

ErdemT09 commented 3 years ago

Resolves: #222

Recommended prequels: #385

Algorithm:

Approach 1-Pattern Checking + Dynamic Programming:

In the end, a given string can consist of many abbreviations. In fact, the best abbreviation is equivalent to: best[0][e] + best[e+1][length-1] Where best[a][b] can once again equal to some new encodings: best[a][f] + best[f+1]b]

When encoding a string like "abbbbb" ( to "a5[b]"), we need to check for each index i whether a pattern starts at that index that we can encode. For i=0, we cannot find any patterns starting with 'a'. For i = 1, we eventually find the 5 characters long substring of "bbbbb" that repeats and is to be encoded.

At this point, it should be clear that we also need a second loop for this operation, with some end index j. j should not necessarily be the index of the array, it's in fact more useful that it corresponds to the length of the substring we are looking at. Thus "bbbbb" is found, when i=5, j=4. When we have a string of length 100, it's just useful to keep j this way.

In order to calculate the length of the repetition, we can do this in an efficient way using pattern checking via characters indices. In a string like "ababab", the first occurrence of 'b' would make the possibility of the pattern "aa..." impossible. The repetition head then should be sent back to the start. After that, when we encounter 'a' and 'b' again, we can increment it because the pattern holds this time.

Considering a string like "ababababa ababababa" (space for readability), getting to 'a' at the start of the second part would have it compared with 'b'. This mismatch shouldn't send our repetition count back to the start without any repetitions, as the first 'a's match. It should be compared with what the 'b' at index 5 was compared with. Then we compare it with index 'b' at j=3, then to 'b' at j=1. We still have a mismatch, so we compare it with what the first b was compared with, which is the letter 'a'. Our pattern holds now and we can start incrementing our repetition. So, this new 'a' shouldn't be assumed as the end of the previous pattern, but the restart of the previous.

With this, we create all the encodings for string starting and ending at some indices. After creating these encoded substrings, we apply the dynamic programming algorithm stated above, in a linear tracing: (here substrings as matrix ss) bestLength[end] = min{bestLength[start]+ss[start][end].length}, start ≤ end We can also place an extra array here to keep the indices. With that, we eventually trace back these indices and finally assemble the encoded string.

Approach 3

Will be added

altay9 commented 3 years ago

Note for the KMP solution:

https://youtu.be/V5-7GzOfADQ

ErdemT09 commented 3 years ago

Approach 2-DP with Updates

Similar to the dynamic programming approach above, we can group the best codings of s by their end and their start. So, let the return value best[0][length-1].

For each index end index e, best[e][e] is equal to s[e]. Our initial best guess for best[i][e] is then best[i][e-1] + best[e][e], i<e.

Here, there are possible patterns that can occur and end at e. These patterns start at some index b+1. They thus have the length e-b. In order to check if this pattern occurs before, we need to start by checking substrings that start at index r = b+1-patternLength. First we compare the substrings s[r:r+patternLength] and s[b+1:e+1]. If they are equal, this means that we can make an encoding in the form "2[ pattern ]". But this pattern itself might have been encoded. So, we refer to our dp array for its coding -> "2[ best[b+1][e] ]". If this encoding is shorter than best[r][e], we update it. But this would also imply that we need to change our inital values. So, we update: best[u][e] = best[u][r-1] + encodedStr, u<r.

After this, we can decrement r patternLength indices back. We then check for "3[ best[b+1][e] ]", then "4[ best[b+1][e] ]" etc., Incrementally, we build up the way to best[0][length-1].

ErdemT09 commented 3 years ago

Here is approach 3. It is a dp approach like the other 3, but its simpler and slower. I'm leaving it for the future.

class Solution {
    Map<Integer, Map<Integer, String>> map = new HashMap<Integer, Map<Integer, String>>();
    public String encode(String s) {
        return minimize(s, 0, s.length()-1);
    }

    public String minimize(String s, int i, int j){
        if(map.containsKey(i) && map.get(i).containsKey(j)) return map.get(i).get(j);

        // first see if it can be repeated several times
        int minLen = j - i + 1;
        String res = s.substring(i, j+1);
        for(int l = 1; l <= (j-i+1)/2; l++) {
            if( (j-i+1) % l != 0) continue;

            boolean ok = true;
            for(int k = 0; k < (j - i + 1); k++) {
                if(s.charAt(i+k) != s.charAt(i + k%l)) {
                    ok = false;
                    break;
                }
            }
            if(ok) {
                String assemble = (j-i+1)/l + "[" + minimize(s, i, i+l-1) + "]";
                if(assemble.length() < minLen) {
                    minLen = assemble.length();
                    res = assemble;
                }
            }
        }

        for(int k = i; k < j; k++) {
            String mix = minimize(s, i, k) + minimize(s, k+1, j);
            if(mix.length() < minLen) {
                minLen = mix.length();
                res = mix;
            }
        }

        if(!map.containsKey(i)) map.put(i, new HashMap<Integer, String>());
        map.get(i).put(j, res);
        return res;
    }

}

spiralgo / algorithms