perl-catalyst / catalyst-runtime

The Elegant MVC Web Application Framework
266 stars 107 forks source link

Default charset disappeared in MetaCPAN (cpan-api) after upgrade from 5.90053 to 5.90103 #132

Closed oalders closed 8 years ago

oalders commented 8 years ago

I'm gearing up for the QA Hackathon and I'm just trying to upgrade the API side of MetaCPAN to the latest Catalyst, but I've got some failing tests which show:

        #   Failed test 'Content-type'
        #   at t/server/controller/source.t line 79.
        #          got: 'text/plain'
        #     expected: 'text/plain; charset=UTF-8'
        ok 8 - Change-log content

Build is at https://travis-ci.org/CPAN-API/cpan-api/jobs/113828722#L3061

I'm wondering what happened to the charset. @shadowcat-mst suggested I open an issue here to see if anyone could help.

Thanks very much!

perl-catalyst-sync commented 8 years ago

Starting with catalyst v 5.90080 (release about two years ago) UTF8 encoding is on by default. There's details in the changes file as well as the delta file. Also there's a new UTF8 pod document and I did a number of advent articles I think in either 2013 or 2014 advent. I'm remote but look for those docs and shout out if you have trouble.

So you can fix the test or explicitly turn off UTF8 encoding whichever is the better choice for you.

On Friday, March 4, 2016, Olaf Alders notifications@github.com wrote:

I'm gearing up for the QA Hackathon and I'm just trying to upgrade the API side of MetaCPAN to the latest Catalyst, but I've got some failing tests which show:

    #   Failed test 'Content-type'
    #   at t/server/controller/source.t line 79.
    #          got: 'text/plain'
    #     expected: 'text/plain; charset=UTF-8'
    ok 8 - Change-log content

Build is at https://travis-ci.org/CPAN-API/cpan-api/jobs/113828722#L3061

I'm wondering what happened to the charset. @shadowcat-mst https://github.com/shadowcat-mst suggested I open an issue here to see if anyone could help.

Thanks very much!

— Reply to this email directly or view it on GitHub https://github.com/perl-catalyst/catalyst-runtime/issues/132.

oalders commented 8 years ago

I did see the Changelog entries before I opened this ticket, but it wasn't clear to my why the charset has been removed from the 'Content-Type' header. This was intentional?

jjn1056 commented 8 years ago

The issue stems from this line of code:

https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Server/Controller/Source.pm#L109

    $c->res->content_type('text/plain');
    $c->res->body( $file->openr );

Here you are setting both the body and content type manually, and the body is set to a file handle. When we did the UTF-8 core work, I concluded that in the case where someone is streaming a file handle I can't really apply a UTF-8 encoding automatically, so we said if you do this type of body response you must do encoding yourself:

https://metacpan.org/pod/distribution/Catalyst-Runtime/lib/Catalyst/UTF8.pod#Encoding-with-streaming-type-responses

"From In this example we create a filehandle to a text file that contains UTF8 encoded characters. We pass this down without modification, which I think is correct since we don't want to double encode. However this may change in a future development release so please be sure to double check the current docs and changelog. Its possible a future release will require you to to set a encoding on the IO layer level so that we can be sure to properly encode at body finalization. "

In this type of response (a file handle) I was not able to determine how (or if) Catalyst should attempt to apply an encoding. I defaulted to 'don't do it' and documented that way. Its possible we might want to add a compatibility fudge here to say, "... unless someone set App->encoding, in which case set the encoding header, and assume the user is doing the right thing."

I'm not sure the right thing since to me this code is suspicious. You are asserting UTF-8 yet we are doing doing to know that is the case or not. We probably are just getting away with it since all this stuff is ASCII probably and that's forward compatible mostly with UTF-8.

If you just want to keep the old behavior you should in this controller assert the UTF-8 encoding yourself. Again no idea if the file handle is actually returning UTF-8 encoded byte stream. But that would make your test pass.

BTW I noticed a number of things that you could update in Server.pm, a number of your plugins have been moved to core. You might want to review and see if you can simplify your dependency chain.

oalders commented 8 years ago

@jjn1056 Thanks very much for this thorough explanation. Based on this, I'm just going to fix the tests not to check for the charset in the header.

We'll have someone review the Server.pm as well, to see what can be simplified. I appreciate your help on this. This has unblocked the Catalyst upgrade for me.