tentwentyfour / nextcloud-link

Javascript/Typescript client that communicates with Nextcloud's WebDAV and OCS APIs
MIT License
56 stars 7 forks source link

Buffer string from `get` method isn't valid a document #39

Open maximelafarie opened 2 years ago

maximelafarie commented 2 years ago

I have some PDF files on NextCloud. Creating folders, adding files into them works well. But when it's time to retrieve theses files with the get method, the Buffer string returned by the method, even if converted in real Buffer, base64 or Blob can't be read by PDF utilities like PDFTron (loadDocument method).

The original pdf file is this one but is converted later by PDFTron (the file you can download below with a uuid name).

You can check the content returned by the get method here: dummy-pdf-buffercontent.txt

It is supposed to be this file: 2d372fe6-69f5-48fe-ba3b-03651aeef2f1 (1).pdf.

I don't know if it's library-related or just a personal implementation issue.

Cheers! 🖖

maximelafarie commented 2 years ago

In addition to my previous message, I tried something with getReadStream:

Here are the two tests:

await this.checkConnectivity();

/* FIRST TEST (NOT WORKING) */
const test = await this.client.get(path); // C.f. my previous message
const test64 = btoa(
    new Uint8Array(Buffer.from(test, 'ascii')) // <-- I'm trying to convert the string returned by `get` in a Buffer
        .reduce((data, byte) => data + String.fromCharCode(byte), '')
);

console.log('TEST64', test64); // => THIS RESULTS IN A BROKEN PDF FILE
/* END FIRST TEST */

/* SECOND TEST (WORKING) */
const { uri, headers } = await this.client.getReadStream(path) as any; // New try with this method
const prom = new Promise(async (resolve, reject) => {

    try {

        const file: AxiosResponse<Buffer> = await axios.get(uri.href, {
            responseType: 'arraybuffer',
            headers: { 'Authorization': headers.authorization }
        });

        const base64 = btoa(
            new Uint8Array(file.data)
                .reduce((data, byte) => data + String.fromCharCode(byte), '')
        );

        console.log('base64', base64); // => THIS RESULTS IN A VALID PDF FILE

        resolve(file.data);

    } catch (error) {
        reject(error);
    }
/* END SECOND TEST */

});

To try if the base64 can be converted in a valid PDF file, I'm trying it here.

lattam commented 2 years ago

Hello and thank you for you report. The issue seems to concern only binary files, the text files work well.

I found the source of the issue in the npm-WebDAV-Client's implementation of get function: https://github.com/OpenMarshal/npm-WebDAV-Client/blob/master/src/index.ts#L293 the options passed to the request should contain encoding: null in case of binary file as mentioned in the request's documentation:

https://github.com/request/request#requestoptions-callback encoding - encoding to be used on setEncoding of response data. If null, the body is returned as a Buffer. Anything else (including the default value of undefined) will be passed as the encoding parameter to toString() (meaning this is effectively utf8 by default). (Note: if you expect binary data, you should set encoding: null.)

It works with the encoding manually set to null in the source code.

But I'm not sure how to handle this, since we cannot pass any options object into the webdav's get function.

lattam commented 2 years ago

I guess we will have to increase priority of this https://github.com/tentwentyfour/nextcloud-link/issues/26. The temporary workaround is to use streams as you suggested.

Also using downloadToStream could help a bit:

await client.downloadToStream(nextcloudPath, Fs.createWriteStream('/some/path'));
maximelafarie commented 2 years ago

Hi @lattam, thank you very much for your detailed investigation about this issue. Indeed, #26 seems to be the next "big" step for this library but it looks inevitable to think about release it to gain sovereignty and performance.

Also using downloadToStream could help a bit: [...]

In your opinion, which one is the more performant method between getReadStream(the one I'm currently using) and downloadToStream as you suggested?

🖖