madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/

Replacement character � happening while decompressing data for non-English languages #896

Closed login-name closed 8 months ago

login-name commented 8 months ago

Issue: The replacement character � appears in decompressed output for non-English languages.

Environment: We have agent-server communication. The agent is installed on laptops and sends data to the server. The agent uses C/C++, while the server uses Java.

For non-English customers, replacement characters are printed during decompression in Java, while everything works fine on the agent side. We found a fix by implementing version 3 in the code below, but we do not understand why it fixes the issue. Could you please explain why it works?

Below are three versions of the decompression code in Java using zlib, with the output each one produces:

1. v1 shows lots of replacement characters.
2. v2 eliminates the replacement characters for JSON data, but they remain for XML data.
3. v3 eliminates the replacement characters completely.

Based on searching online, all three versions look like valid ways to decompress zlib data in Java. Could you please point out any difference between them that I am misunderstanding?

Please DM me privately at coconutrange@gmail.com and I will share sample output for each version, since I would rather not share the data publicly. Alternatively, let me know your email address and I will send the data there.

    public static String convertBase64EncodedGziptoString_v1(String str) throws IOException {
        byte[] compressed = DatatypeConverter.parseBase64Binary(str);
        ByteArrayInputStream bais = new ByteArrayInputStream(compressed);
        InflaterInputStream iis = new InflaterInputStream(bais);
        StringBuilder result = new StringBuilder();
        byte[] buf = new byte[4096];
        int rlen;
        while ((rlen = iis.read(buf)) != -1) {
            result.append(new String(Arrays.copyOf(buf, rlen)));
        }
        iis.close();
        bais.close();
        String v1DecompressionOutput = result.toString().trim();
        formattedOutput += "\nv1\n" + v1DecompressionOutput;
        return v1DecompressionOutput;
    }

    public static String convertBase64EncodedGziptoString_v2(String str) throws IOException {
        byte[] compressed = DatatypeConverter.parseBase64Binary(str);
        ByteArrayInputStream bais = new ByteArrayInputStream(compressed);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        InflaterInputStream iis = new InflaterInputStream(bais, inflater);
        byte[] buf = new byte[4096];
        StringBuilder stringBuilder = new StringBuilder();
        int rlen;
        while ((rlen = iis.read(buf)) != -1) {
            stringBuilder.append(new String(Arrays.copyOf(buf, rlen), StandardCharsets.UTF_8));
        }
        iis.close();
        bais.close();
        String v2DecompressionOutput = stringBuilder.toString().trim();
        formattedOutput += "\nv2\n" + v2DecompressionOutput;
        return v2DecompressionOutput;
    }

    public static String convertBase64EncodedGziptoString_v3(String data) throws IOException, DataFormatException {
        byte[] bytes = Base64.getDecoder().decode(data);
        Inflater inflater = new Inflater();
        inflater.setInput(bytes);

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(bytes.length);
        byte[] buffer = new byte[1024];
        while (!inflater.finished()) {
            int count = inflater.inflate(buffer);
            outputStream.write(buffer, 0, count);
        }
        outputStream.close();
        byte[] output = outputStream.toByteArray();

        String v3DecompressionOutput = new String(output, StandardCharsets.UTF_8);
        formattedOutput += "\nv3\n" + v3DecompressionOutput;
        return v3DecompressionOutput;

    }

login-name commented 8 months ago

Sharing sample screenshots below.

The v1 decompression method shows replacement characters: (image)

The v3 decompression method works fine without showing replacement characters, even though the compressed input is the same for both: (image)
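One possible explanation for the difference, offered as a sketch and not a confirmed diagnosis: v1 and v2 turn each buffer returned by `read` into a `String` separately, so a multi-byte UTF-8 character that straddles two reads is decoded as two broken halves, whereas v3 decodes the complete byte array once. (v1 also relies on the platform default charset.) The class and method names below are hypothetical and no zlib call is involved; the example only isolates the decoding step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkedDecodeDemo {
    // Decodes each fixed-size chunk on its own, similar to the v1/v2 read loops.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            byte[] chunk = Arrays.copyOfRange(data, off, end);
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes the complete byte array in one step, similar to v3.
    static String decodeWhole(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'e' with acute accent takes two bytes in UTF-8 (0xC3 0xA9),
        // so this string is three bytes long.
        byte[] data = "a\u00e9".getBytes(StandardCharsets.UTF_8);

        // With a chunk size of 2, the boundary falls between 0xC3 and 0xA9.
        String chunked = decodePerChunk(data, 2);
        String whole = decodeWhole(data);

        System.out.println("per-chunk has U+FFFD: " + chunked.contains("\uFFFD"));
        System.out.println("whole matches input:  " + whole.equals("a\u00e9"));
    }
}
```

If this is the cause, it would also fit the observation that larger payloads are affected more often, since more characters end up on a 4096-byte boundary.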

madler commented 8 months ago

This is not a zlib issue. Try posting your question on stackoverflow.com.
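As a rough illustration of why the symptom may lie outside zlib: the sketch below keeps the `InflaterInputStream` read loop from v1/v2 but collects raw bytes and converts them to a `String` only once, after the stream is exhausted. It assumes the payload is UTF-8 text in zlib format, Base64-encoded; the class name, method names, and the deliberately tiny buffer are hypothetical choices made for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class InflateThenDecode {
    // Hypothetical helper used only to build test input:
    // zlib-compress the UTF-8 bytes of the text, then Base64-encode them.
    static String compressToBase64(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Streams the inflated bytes into a buffer and decodes them once at the end.
    static String inflateBase64ToString(String base64) {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream iis =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[7]; // tiny on purpose: reads end inside characters
            int n;
            while ((n = iis.read(buf)) != -1) {
                out.write(buf, 0, n); // keep raw bytes, no per-chunk String
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("h\u00e9llo w\u00f6rld \u65e5\u672c\u8a9e ");
        }
        String original = sb.toString();
        String restored = inflateBase64ToString(compressToBase64(original));
        System.out.println("round trip intact: " + restored.equals(original));
    }
}
```

The inflated bytes are the same whichever `java.util.zip` API produces them; what differs between the posted versions is when and with which charset those bytes become a `String`.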