Open vignesh-arivazhagan opened 5 months ago
Short answer: that server is wrong. %-encoded strings are not allowed in the plain filename parameter of the HTTP Content-Disposition header.
Longer answer:
Although Python's email package is meant to be able to parse HTTP headers as well as email headers, the content you ask it to parse has to (more-or-less) follow the relevant specs.
The URL in your example returns a Content-Disposition header that improperly uses (part of) a %-encoded URI as a filename. Here's the raw header in the server's response:
Content-Disposition: inline;filename=annoncement_of%20computer%20application%20_rti_er%20_16122014.pdf;
Nothing in the specs allows % encoding there. RFC 6266 specifies the HTTP Content-Disposition header. In section 4.1, 'filename-parm' is ultimately allowed to have a 'token' or 'quoted-string' value. Those are defined by RFC 2616 section 2.2 (skip down to the top of page 17). Nothing there has anything to do with RFC 3986 style % encoding. (A MIME header 'quoted-string' is just text in "double quotes"; it's unrelated to urllib.parse's quote() function.)
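To make the distinction concrete, here is a minimal sketch. The is_http_token() helper is hypothetical (written for this illustration, not part of the standard library) and encodes the RFC 2616 section 2.2 'token' rule as I read it. The two unquote() calls are the real standard-library functions.

```python
import email.utils
import urllib.parse

# Separator characters from RFC 2616 section 2.2; a 'token' may not contain them.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_http_token(value: str) -> bool:
    """Hypothetical helper: True if value is a non-empty RFC 2616 'token'."""
    return bool(value) and all(
        32 < ord(ch) < 127 and ch not in SEPARATORS for ch in value
    )


raw = "annoncement_of%20computer%20application%20_rti_er%20_16122014.pdf"

# '%' is an ordinary token character, so the value is taken literally.
print(is_http_token(raw))            # True
print(is_http_token("my file.pdf"))  # False: a space requires a quoted-string

# email.utils.unquote() only strips the surrounding double quotes ...
print(email.utils.unquote('"my file.pdf"'))  # my file.pdf
print(email.utils.unquote(raw) == raw)       # True: '%20' is left untouched

# ... while urllib.parse.unquote() decodes RFC 3986 percent-escapes.
print(urllib.parse.unquote("my%20file.pdf"))  # my file.pdf
```

In other words, the header's filename is a syntactically valid token whose literal text happens to contain "%20", and nothing in the header grammar asks the parser to decode it.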
Either there's a bug in www.gsi.gov.in's server software, or (more likely) someone uploaded a file with %20's already in the name.
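If you need to tolerate servers like this on the client side, one option is to percent-decode the result yourself. The sketch below is only an illustration: lenient_filename() is a hypothetical helper, and it parses the header with email.message_from_string() under its default (legacy) policy, which returns a Message. EmailMessage inherits get_filename() from Message.

```python
import email
from typing import Optional
from urllib.parse import unquote


def lenient_filename(msg) -> Optional[str]:
    """Hypothetical helper: get_filename() plus an explicit percent-decode."""
    name = msg.get_filename()
    if name is None:
        return None
    return unquote(name)


# The header from the server's response, followed by an empty body.
raw_headers = (
    "Content-Disposition: inline;"
    "filename=annoncement_of%20computer%20application%20_rti_er%20_16122014.pdf;\n"
    "\n"
)
msg = email.message_from_string(raw_headers)

print(msg.get_filename())
# annoncement_of%20computer%20application%20_rti_er%20_16122014.pdf
print(lenient_filename(msg))
# annoncement_of computer application _rti_er _16122014.pdf
```

Note that this is a workaround rather than correct parsing: a file whose name really does contain a literal sequence such as "%20" would be mangled by the extra decode, which is one reason the library does not do it for you.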
(Suggest closing this issue as "not planned.")
Bug report
Bug description:
output
If I use
I am getting unquoted output.
Why is a different unquote function used in EmailMessage.get_filename()?
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows