Parse multipart data - Githubissues

48cf commented 6 years ago

Is there any built-in way to parse multipart data? Or has anyone ever tried doing that? My request headers look like the following:

Accept-Encoding => gzip, deflate
Content-Type => multipart/form-data; boundary=-----------------------------28947758029299
User-Agent => osu!
X-Forwarded-For => 192.168.0.102
X-Forwarded-Proto => https
X-Real-IP => 192.168.0.102
Connection => close
Host => osu.ppy.sh
Content-Length => 84312

As far as I can tell, the Content-Type header contains everything I need in order to parse the data, and Content-Length is the exact length of the data. How am I supposed to parse it though? Does anybody have a method they are using, or any ideas? Thanks in advance.

ScottYelich commented 6 years ago

I am working on doing this now... I rolled my own in Swift and it looks like I'll probably end up having to roll my own here as well.

EDIT: 1 week later... Since crow no longer looks as actively maintained as it used to be, I decided to use Dlib's HTTP server, as I was already using Dlib for facial landmark detection. Although there are sources out there for parsing multipart MIME uploads, I wrote my own in C++, based on the same requirements and on what I learned from implementing this in Swift: get the boundary from the CONTENT_TYPE header, parse the POST body to separate the headers from the content of each section, and so on. The only real catch is to take EOL characters into account.

I also put nginx in front of the Dlib HTTP service. For JSON I went with https://github.com/nlohmann/json, since my service will be bound by the image processing rather than by how fast the HTTP server can respond to new connections, so I won't need extreme performance from the network services or the data marshalling/unmarshalling.

Thank you pierobot (next comment) for bringing me back here so I could follow up. Good luck. YMMV.

pierobot commented 6 years ago

https://tools.ietf.org/html/rfc7578 https://stackoverflow.com/a/23517227/7163417

After a quick read, it looks like this:

The boundary parameter of the Content-Type header contains the string that separates the parts. To get a part, find the start of a boundary and keep scanning until you reach the next boundary, or the end of the body if there isn't one.
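
A rough sketch of that scan in plain C++, in case it helps. split_parts is a made-up name, not something crow provides. It assumes the whole body is already in a std::string, that each delimiter in the body is the boundary prefixed with "--" (with a trailing "--" on the closing one), and that lines end in CRLF. It does not check that a delimiter starts at the beginning of a line, so treat it as a starting point only.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of crow): split a multipart body into raw
// parts. Each returned string still holds the part's headers, a blank line,
// and then the part's content.
std::vector<std::string> split_parts(const std::string& body,
                                     const std::string& boundary) {
    const std::string delim = "--" + boundary;
    std::vector<std::string> parts;

    std::size_t pos = body.find(delim);
    while (pos != std::string::npos) {
        std::size_t start = pos + delim.size();

        // "--" directly after the delimiter marks the closing boundary.
        if (body.compare(start, 2, "--") == 0) break;

        // Skip the line break that ends the delimiter line.
        if (body.compare(start, 2, "\r\n") == 0) start += 2;

        // The part runs to the next delimiter, or to the end of the body
        // if no further delimiter is found.
        const std::size_t next = body.find(delim, start);
        std::size_t end = (next == std::string::npos) ? body.size() : next;

        // The CRLF just before a delimiter belongs to the delimiter,
        // not to the part's content.
        if (next != std::string::npos && end >= start + 2 &&
            body.compare(end - 2, 2, "\r\n") == 0) {
            end -= 2;
        }

        parts.push_back(body.substr(start, end - start));
        pos = next;
    }
    return parts;
}
```

Within each returned part, the headers and the content are separated by the first empty line ("\r\n\r\n"), so a find on that sequence is enough to split the two.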

Each part can carry the following data:

The name parameter of Content-Disposition holds the name of the input field.
The filename parameter of Content-Disposition tells you that the part is a file.
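
Pulling those two values out of a part's Content-Disposition line could look roughly like this. disposition_param is a hypothetical helper, not a crow function, and it assumes the parameter values are quoted (name="..."), which is what browsers normally send. Unquoted values and the filename*= form are not handled.

```cpp
#include <string>

// Hypothetical helper (not part of crow): return the quoted value of `key`
// in a Content-Disposition line, or an empty string if it is not present.
std::string disposition_param(const std::string& header,
                              const std::string& key) {
    const std::string needle = key + "=\"";
    std::size_t pos = 0;
    while ((pos = header.find(needle, pos)) != std::string::npos) {
        // Require a separator before the key, so that looking for "name"
        // does not match the tail of "filename".
        if (pos == 0 || header[pos - 1] == ' ' || header[pos - 1] == ';') {
            const std::size_t start = pos + needle.size();
            const std::size_t end = header.find('"', start);
            if (end == std::string::npos) return "";
            return header.substr(start, end - start);
        }
        pos += needle.size();
    }
    return "";
}
```

For a line such as `Content-Disposition: form-data; name="f"; filename="a.txt"` (made-up values), asking for "name" gives f and asking for "filename" gives a.txt. An empty result for "filename" suggests the part is an ordinary field rather than a file.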

The actual boundaries and part data should be in the body of the request.

To get the Content-Type header you'll have to use request.get_header_value("Content-Type");
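
Once you have that header value, the boundary still has to be cut out of it. A minimal sketch, where extract_boundary is a name I made up (it is not part of crow) and the match on "boundary=" is case-sensitive for simplicity:

```cpp
#include <string>

// Hypothetical helper (not part of crow): pull the boundary parameter out of
// a Content-Type header value. Returns an empty string if there is none.
std::string extract_boundary(const std::string& content_type) {
    const std::string key = "boundary=";
    const std::size_t pos = content_type.find(key);
    if (pos == std::string::npos) return "";

    std::string value = content_type.substr(pos + key.size());

    // The parameter ends at the next ';' if other parameters follow.
    // (Simplification: a quoted boundary containing ';' is not handled.)
    const std::size_t end = value.find(';');
    if (end != std::string::npos) value.erase(end);

    // Drop trailing whitespace.
    while (!value.empty() && (value.back() == ' ' || value.back() == '\t')) {
        value.pop_back();
    }

    // The value may be wrapped in double quotes.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
        value = value.substr(1, value.size() - 2);
    }
    return value;
}
```

The string passed in would be whatever the get_header_value call above returns. Keep in mind that the delimiter lines in the body are this value with "--" in front of it.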

The-EDev commented 3 years ago

I've just recently done this on my own fork and opened a PR here:

ipkn / crow

Parse multipart data #310