kinglcc / juniversalchardet

Automatically exported from code.google.com/p/juniversalchardet

Having this functionality in a stream could be useful #19

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
We already compute some metadata for stream contents (such as hashes) while
reading, and we wanted to detect the encoding in the same pass. I have therefore
wrapped UniversalDetector in an input stream, so that several of these actions
can be done in one read by nesting streams.

Maybe it is useful to others:

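import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.mozilla.universalchardet.UniversalDetector;

// Feeds every chunk read through this stream to a UniversalDetector.
public class EncodingDetectorInputStream extends BufferedInputStream {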

    private final UniversalDetector detector = new UniversalDetector(null);

    public EncodingDetectorInputStream(InputStream in) {
        super(in);
    }

    public String getDetectedCharset() {
        return detector.getDetectedCharset();
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) throws IOException {
        final int nrOfBytesRead = super.read(b, off, len);
        if (!detector.isDone() && nrOfBytesRead > 0) {
            detector.handleData(b, 0, nrOfBytesRead);
        }
        if (nrOfBytesRead == -1) {
            detector.dataEnd();
        }
        return nrOfBytesRead;
    }

}

Original issue reported on code.google.com by guy.mah...@gmail.com on 23 Jul 2013 at 3:32
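
The "several actions in one step using nested streams" idea can be illustrated with the JDK alone. This is a minimal sketch, not code from this project: `NestedStreamsSketch`, `Metadata`, and `collect` are hypothetical names, and the JDK's `DigestInputStream` plays the role of the hashing layer. A detector stream like the one posted above would simply be one more wrapper in the same chain.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class NestedStreamsSketch {

    /** Result of a single pass: the SHA-256 digest and the number of bytes read. */
    static final class Metadata {
        final byte[] sha256;
        final long length;

        Metadata(byte[] sha256, long length) {
            this.sha256 = sha256;
            this.length = length;
        }
    }

    /** Drains the stream once; the digest is updated as a side effect of reading. */
    static Metadata collect(InputStream source) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            long total = 0;
            // Further wrappers (for example a detector stream) could be nested here.
            try (InputStream in = new DigestInputStream(source, md)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf, 0, buf.length)) != -1) {
                    total += n;
                }
            }
            return new Metadata(md.digest(), total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

Each wrapper only sees the bytes that pass through its own `read` methods, so every layer has to forward the caller's buffer, offset, and length unchanged.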

GoogleCodeExporter commented 8 years ago
The line:
   detector.handleData(b, 0, nrOfBytesRead);

should be:
   detector.handleData(b, off, nrOfBytesRead);

because super.read() stores the bytes it reads starting at index off, not at index 0.

Original comment by guy.mah...@gmail.com on 24 Jul 2013 at 5:07
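
The effect of the offset can be checked without the detector. The sketch below rests on assumptions: `OffsetSketch`, `ByteSink`, `TappingInputStream`, and `observe` are hypothetical names, and `ByteSink` is only a stand-in with the same parameter shape as the `handleData(b, off, len)` call in this thread. It reads into the middle of a buffer and records what the sink was handed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class OffsetSketch {

    /** Hypothetical stand-in with the same shape as detector.handleData(b, off, len). */
    interface ByteSink {
        void handleData(byte[] buf, int offset, int length);
    }

    /** Passes each chunk to the sink from the offset where it was actually stored. */
    static final class TappingInputStream extends FilterInputStream {
        private final ByteSink sink;

        TappingInputStream(InputStream in, ByteSink sink) {
            super(in);
            this.sink = sink;
        }

        @Override
        public int read() throws IOException {
            int v = super.read();
            if (v != -1) {
                // Single-byte reads bypass read(byte[], int, int), so tap them too.
                sink.handleData(new byte[] {(byte) v}, 0, 1);
            }
            return v;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (n > 0) {
                sink.handleData(b, off, n); // off, not 0
            }
            return n;
        }
    }

    /** Reads the input into a buffer starting at off and returns what the sink saw. */
    static byte[] observe(byte[] input, int off) {
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        byte[] buf = new byte[off + input.length];
        try (InputStream in = new TappingInputStream(new ByteArrayInputStream(input), seen::write)) {
            while (in.read(buf, off, input.length) > 0) {
                // keep reading until end of stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toByteArray();
    }
}
```

With `off` in place, `observe(data, 5)` returns the same bytes as `data`. Passing `0` instead would hand the sink the untouched start of the buffer. The sketch also overrides the single-byte `read()`, which the class posted above does not, so bytes read that way would not reach the detector there.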