yokostan / Leetcode-Solutions

Doing exercise on Leetcode. Carry on!
0 stars 3 forks source link

Leetcode #187 Repeated DNA Sequences #133

Open yokostan opened 5 years ago

yokostan commented 5 years ago
class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        HashMap<String, Integer> map = new HashMap<>();

        if (s == null || s.length() <= 10) {
            return res;
        }

        char[] seq = s.toCharArray();
        boolean[] visited = new boolean[seq.length];

        for (int i = 0; i <= seq.length - 10; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < i + 10; j++) {
                sb.append(seq[j]);
            }
            String str = sb.toString();
            if (map.containsKey(str)) {
                if (visited[map.get(str)] == false) {
                    res.add(str);
                    visited[map.get(str)] = true;
                }
            }
            else {
                map.put(str, i);
            }
        }

        return res;
    }
}

My initial solution, as above, was accepted, but I am gonna optimize it.

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        HashMap<String, Integer> map = new HashMap<>();
        HashSet<String> temp = new HashSet<>();

        if (s == null || s.length() <= 10) {
            return res;
        }

        char[] seq = s.toCharArray();

        for (int i = 0; i <= seq.length - 10; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < i + 10; j++) {
                sb.append(seq[j]);
            }
            String str = sb.toString();
            if (map.containsKey(str)) {
                temp.add(str);
            }
            else {
                map.put(str, i);
            }
        }

        for (String string : temp) {
            res.add(string);
        }

        return res;
    }
}

But this one is as slow as last one... Well, the key to reach beating 65% is to substitute the i from start to start + 10 with String str = s.substring(i, i + 10)... Should have thought about that.

Here comes the ultimate genius solution:

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        char cs[] = s.toCharArray();
        int l = cs.length;
        if(l<=10){
            return new ArrayList<>();
        }
        byte d[] = new byte[85];

        d['C'] = 1;
        d['G'] = 2;
        d['T'] = 3;

        BitSet st = new BitSet(1048576);
        int v = 0;
        for(int i=0;i<10;++i){
            v = (v << 2)|d[cs[i]];
        }
        st.set(v);
        BitSet added = new BitSet(1048576);
        List<String> sta = new ArrayList<>(l>>1);
        for(int i=10; i< l ; ++i ){
            v = ((v << 2)&0xFFFFF)|d[cs[i]];     
            if(st.get(v)){
                if(!added.get(v)){
                    sta.add(new String(cs,i-9,10));
                    added.set(v);
                }
            }else {
                st.set(v);
            }
        }
        return sta;
    }
}

which uses data structure like BitSet and byte[] that I even haven't heard of...