symfony / symfony

The Symfony PHP framework
https://symfony.com
MIT License
29.64k stars 9.43k forks source link

[Mime] The multipart/form-data generated by FormDataPart is not well recognized #37500

Closed wiser closed 4 years ago

wiser commented 4 years ago

Symfony version(s) affected: 4.4.10

Description
I try to upload some data (regular field and/or file field) with the Http Client and so I use the Symfony Mime component to produce the request body. With some APIs it works but not with all... So I've lean over this behaviour looking for explanation and I've discover that when the name of the field does not contain spaces or other "strange chars", it is not enclosed with quotes. I use several private APIs from market products (like Marklogic, Dalim Twist) which does not understand multipart form requests without quotes around fields names, even if these fields has simple names as 'field'...

How to reproduce

use Symfony\Component\Mime\Part\DataPart;
use Symfony\Component\Mime\Part\Multipart\FormDataPart;

$formFields = [
    'regular_field' => 'some value',
    'application/xml' => 'some other value'
];
$formData = new FormDataPart($formFields);

dump($formData->bodyToString());

If you check the generated body you can see that the "regular_field" is not enclosed by quotes : Content-Disposition: form-data; name=regular_field But the other field with the name containing a slash appears with quotes : Content-Disposition: form-data; name="application/xml"

Possible Solution
I can’t change fields names of requests because APIs I call does not belong to me. So my first idea was to preg_replace generated body to add the quotes but some of them contains files and their length can be very high. So I've check the code in Symfony which produces the body and all the magic is here :

namespace Symfony\Component\Mime\Header\ParameterizedHeader::getEndOfParameterValue() in line 161.

const TOKEN_REGEX = '(?:[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]+)';
if (!preg_match('/^'.self::TOKEN_REGEX.'$/D', $value)) {
    $value = '"'.$value.'"';
}

Can you please point me a way to manage this issue ? Can I override this behaviour with some tricks ? Maybe this library can be changed to always add quotes arround values ? My last chance is to change the whole HTTP library but according to me the Symfony one is the more convenient to use.

Thank you for your help. BR. Wiser

nicolas-grekas commented 4 years ago

Same as #36738 or #35443 ?

wiser commented 4 years ago

Same as #36738 or #35443 ?

I suppose but I never can get more informations from issue creators :/ By the way I am pretty sure this StackOverflow question is also related : https://stackoverflow.com/questions/57908599/how-to-use-symfony-httpclient-post-data As you can read in the last comment, the guy is using the same private API as I am : [...]my endpoint (a Twist server in reality)[...] If I could find a public API with the same problem it would be easier for me to be understood, but for now I couldn't find it. Thank you @nicolas-grekas for your precious help.

wiser commented 4 years ago

@xabbuh this issue is "Waiting feedback" do I need to provide more informations ? I can suggest a PR and remove the described behaviour but I am not sure the team is waiting for a such solution...

stof commented 4 years ago

The RFC 7578 defining multipart/form-data specifies that the Content-Disposition header respects RFC 2183, which contains this text:

NOTE ON PARAMETER VALUE LENGHTS: A short (length <= 78 characters) parameter value containing only non-tspecials characters SHOULD be represented as a single token. A short parameter value containing only ASCII characters, but including tspecials characters, SHOULD be represented as quoted-string. Parameter values longer than 78 characters, or which contain non-ASCII characters, MUST be encoded as specified in [RFC 2184].

Extension-token, parameter, tspecials and value are defined according to [RFC 2045] (which references [RFC 822] in the definition of some of these tokens). quoted-string and DIGIT are defined in [RFC 822].

So while we could always apply quoting (that would be a matter of removing the TOKEN_REGEX condition in favor of always doing the logic), we would be less spec-compliant for such case. If some specific servers are known to be non-compliant, this should ideally be reported to them

stof commented 4 years ago

@nicolas-grekas aren't we missing the escaping of special chars in values here btw, to be spec compliant (" and \ require escaping in RFC 822 quoted-string)

wiser commented 4 years ago

@stof I have read carefully the RFC and according to me the "name" part has not to be compliant to the RFC 2183 token definition. The RFC 2183 defines the multipart/form-data but it does not define the specific "name" part in this original definition. You can check all the document you won't find any reference of this keyword. Actually its definition only stands in the RFC 7578 with this text :

Each part MUST contain a Content-Disposition header field [RFC2183] where the disposition type is "form-data". The Content-Disposition header field MUST also contain an additional parameter of "name"; the value of the "name" parameter is the original field name from the form (possibly encoded; see Section 5.1). For example, a part might contain a header field such as the following, with the body of the part containing the form data of the "user" field:

       Content-Disposition: form-data; name="user"

Take a look at the provided example, the value of the name part contains quotes. If you read further, you will find another examples where the name value is always surrounded by quotes like this one : content-disposition: form-data; name="field1"

I hope we can open a discussion about this precise point because like I wrote earlier, I only have this issue with the Symfony Mime Component (I have tried others HTTP client implemantations to understand) but of course it is the more convenient to use.

fabpot commented 4 years ago

A pragmatic solution seems to always quote the names. And I don't see any issue with that. The SHOULD definition seems to fit perfectly here: "mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course." This seems a valid reason: always quoting works both for emails and HTTP requests.

See #37579

fabpot commented 4 years ago

I've closed my PR as it breaks another fix we did.

fabpot commented 4 years ago

In #35443, @n3o77 found that removing the name from Content-Type was enough to make it work. Can you test #37581 to see if that's also enough for you?

wiser commented 4 years ago

@fabpot I have tried the #37581 with the Marklogic server and it still does not recognize the request. To solve the issue I really have to quote the name value.

Can I do something to help ?

fabpot commented 4 years ago

@wiser Can you test the new version of #37581 again?

wiser commented 4 years ago

@fabpot Yes, this time the request is accepted by the two servers Marklogic and Dalim Twist. Thanks to the team for listening to me and taking my issue as you've done.

Now I have to check why the transfer is chunked when my server does not support it but this is another story. This issue can be closed if you are ok to merge #37581