Uploading a large file through contents API #5705

rikukissa commented 4 years ago

Hey! I'm trying to upload a large file (let's say 5GB) through the contents API.

The problem I'm facing is that the API endpoint only accepts a JSON payload with a "content" string key, which makes it pretty difficult to upload a file this big.

To build that JSON payload, I would have to read the whole file into memory first. I've also tried using a stream as the payload and streaming the data into the request, but Jupyter quickly returns an HTTP 413 Payload Too Large.
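
For reference, the single-request upload I'm attempting looks roughly like this (server URL, token, and file name are placeholders):

```python
import base64
import requests

BASE = "http://localhost:8888"  # placeholder server URL
TOKEN = "..."                   # placeholder API token

# The whole 5GB file has to be read and base64-encoded in memory
# just to build the JSON body.
with open("bigfile.bin", "rb") as f:
    content = base64.b64encode(f.read()).decode("ascii")

r = requests.put(
    f"{BASE}/api/contents/bigfile.bin",
    headers={"Authorization": f"token {TOKEN}"},
    json={"type": "file", "format": "base64", "content": content},
)
r.raise_for_status()  # fails with 413 Payload Too Large for big files
```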

What would be your recommended way of getting a file this big inside the notebook workspace?

The way I'd expect the file upload endpoint to work is that it would accept a multipart/form-data payload with the file data inside it. Accepting data in this format would also let the API endpoint be used with a standard HTML form.
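
Something like this is what I have in mind; to be clear, this is hypothetical and does not work against the current API (and requests still buffers the body client-side, but the server could consume it as a stream):

```python
# Hypothetical: a multipart upload the contents API could accept.
# None of this works today; it only sketches the proposal above.
import requests

with open("bigfile.bin", "rb") as f:
    r = requests.post(
        "http://localhost:8888/api/contents/bigfile.bin",
        headers={"Authorization": "token ..."},
        files={"file": ("bigfile.bin", f)},  # multipart/form-data body
    )
```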

Happy to help with a PR if you think this way of uploading large files is worth supporting.

Zsailer commented 4 years ago

Have you looked at using the "chunk" field in the model you're passing to the contents API?

This allows you to break a large file into chunks and save each chunk one at a time, so you don't have to load the entire file into memory and send one large JSON blob to the server.
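
Roughly, the client-side loop looks like this (an untested sketch; as far as I recall, chunks are numbered from 1 and the final chunk is sent as -1 so the server knows the upload is complete; URL, token, and chunk size are placeholders):

```python
import base64
import os
import requests

BASE = "http://localhost:8888"  # placeholder server URL
TOKEN = "..."                   # placeholder API token
CHUNK_SIZE = 1024 * 1024        # 1 MB per request, tune as needed

def upload_chunked(local_path, remote_path):
    url = f"{BASE}/api/contents/{remote_path}"
    headers = {"Authorization": f"token {TOKEN}"}
    size = os.path.getsize(local_path)
    sent = 0
    chunk_no = 1
    with open(local_path, "rb") as f:
        while sent < size:
            data = f.read(CHUNK_SIZE)
            sent += len(data)
            model = {
                "type": "file",
                "format": "base64",
                "content": base64.b64encode(data).decode("ascii"),
                # chunks count up from 1; the last one is sent as -1
                "chunk": -1 if sent >= size else chunk_no,
            }
            requests.put(url, json=model, headers=headers).raise_for_status()
            chunk_no += 1

upload_chunked("bigfile.bin", "bigfile.bin")
```

Only one chunk is ever held in memory at a time, which is the whole point.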