libwww-perl / WWW-Mechanize

Handy web browsing in a Perl object
https://metacpan.org/pod/WWW::Mechanize

WM: Pass response charset down to HTML::Form objects #225

Open spazm opened 7 years ago

spazm commented 7 years ago

kapran...@gmail.com reported on Jun 10, 2009

What steps will reproduce the problem?
1. Request a page with unusual charset.
2. Fill some input fields with non-ascii characters.
3. Submit the form.

What is the expected output? What do you see instead?

The generated HTTP request will either fail, because HTTP::Message chokes on
the Unicode string, or always use UTF-8 encoding, depending on how recent
HTML::Form is.

But it should use the charset of the page from which the form was parsed.
It is known only to the user agent.
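To illustrate why the charset matters, here is a minimal sketch using only core Perl. The helper `form_encode` is hypothetical (it is not part of WWW::Mechanize or HTML::Form); it just shows that the same field value produces different bytes on the wire depending on which charset is used for encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: convert a character string to bytes in the given
# charset, then percent-encode everything outside the unreserved set.
sub form_encode {
    my ($value, $charset) = @_;
    my $bytes = encode($charset, $value);
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $value = "caf\x{e9}";    # "cafe" with an acute accent on the e

my $as_latin1 = form_encode($value, 'ISO-8859-1');
my $as_utf8   = form_encode($value, 'UTF-8');

print "ISO-8859-1: $as_latin1\n";    # caf%E9
print "UTF-8:      $as_utf8\n";      # caf%C3%A9
```

A server that served the page as ISO-8859-1 would likely expect the first form, which is why the page charset needs to reach the code that builds the request.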

Please provide any additional information below.

Our patch adds support for newer versions of HTML::Form that provide the
accept_charset() method. See
http://github.com/gisle/libwww-perl/tree/ff583c4b194eb7437c71f6fb659ae03b9bffce70

The patch includes tests and POD. Compatibility with old HTML::Form is not broken.

Details

Imported from Google Code issue 101 via archive

Comments

petda...@gmail.com commented on Jul 6, 2009 :

(No comment was entered for this change.)

gvera...@gmail.com commented on Jul 15, 2009 :

I followed the guidelines in your charset patch against Mechanize 1.58 and tried to
run make test on the module. I get this error: Can't locate object method "charset" via
package "HTTP::Headers" at /usr/local/share/perl/5.10.0/HTTP/Message.pm line 627.
Do you have any explanation?

thanks,
George Veranis

Martin.v...@gmx.net commented on Aug 18, 2009 :

I'm not sure that accept_charset is the way to go. People might expect that setting
to correspond to the accept-charset parameter of the HTML form tag.
http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#adef-accept-charset

Instead, it might be better to pass the charset to HTML::Form->parse which then
assigns those to the default_charset entry in the Form hash. It's rather new, though:
http://github.com/gisle/libwww-perl/commit/f13b3181f9d0140d83313233b5cbf0cb7ce4ee02
Included since libwww-perl 5.831. Looking at the timing I wonder whether that feature
was added in response to this report here...?

I guess a simple "can" check for this feature in HTML::Form won't work, so you'd have
to use the version number to determine the presence of this feature. I also think
that using accept-charset as a fallback might be desirable, but as I tend to use
bleeding-edge versions, I don't care overly much.
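As a rough sketch of this suggestion, assuming an HTML::Form recent enough to accept the charset option in parse (the HTML snippet, URL, and charset below are made up for illustration, and no version check is shown):

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical page content and charset; in WWW::Mechanize these would
# come from the fetched response.
my $html = '<form action="/search" method="post">'
         . '<input type="text" name="q"></form>';
my $page_charset = 'ISO-8859-1';

# Pass the page charset down so it becomes the form's default charset.
my ($form) = HTML::Form->parse(
    $html,
    base    => 'http://example.com/',
    charset => $page_charset,
);

# Fill in a non-ASCII value and build the request without sending it.
$form->value(q => "caf\x{e9}");
my $request = $form->make_request;

print $request->method, " ", $request->uri, "\n";
print $request->content, "\n";
```

If the form tag carries its own accept-charset attribute, HTML::Form uses that instead of the default passed here, which matches the fallback ordering discussed above.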

hira.tara@gmail.com commented on Oct 20, 2010 :

> But it should use the charset of the page from which the form was parsed.

I face the same problem.

I agree with Martin.v...@gmx.net and made a new patch.
This patch works well with both libwww-perl-5.827 (in github) and 5.837.

How about this patch?

petda...@gmail.com commented on Oct 21, 2010 :

This discussion needs to happen on the mailing list so the public can easily see it.

Thanks,
Andy

hira.tara@gmail.com commented on Oct 21, 2010 :

OK, I'll try to post it on WWW::Mechanize users ML. Thank you for your response.

Martin.v...@gmx.net commented on Oct 22, 2010 :

What mailing list would that be? libwww@p.o? Looking at its archive at http://dir.gmane.org/gmane.comp.lang.perl.modules.lwp there seems to be pretty little activity there recently. http://lists.cpan.org/showlist.cgi?name=libwww as linked from p/www-mechanize/ seems to be down at least today. http://lists.perl.org/list/libwww-perl.html has almost no information, in particular no archives at all, so none that could provide public access to more recent posts either, if there are such posts.

I'd like to be able to follow this discussion even without subscribing to a mailing list. That's the reason I like bug trackers: I can subscribe to those issues that affect me, and don't have to filter out those that don't. So please keep people subscribed to this issue informed as well.

hira.tara@gmail.com commented on Oct 22, 2010 :

Hmm, I sent the message to the following ML.

http://groups.google.com/group/www-mechanize-users/

I'm a new member of this ML and waiting for my message to be moderated.

colossus...@gmail.com commented on Aug 13, 2011 :

Wouldn't passing the HTTP::Response itself to HTML::Form->parse solve this issue? It would also have the added benefit of reducing memory usage a bit, because it would in effect pass the HTML by reference instead of creating another copy of it.
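
A minimal sketch of that idea, assuming an HTML::Form version whose parse accepts a response object and reads the charset from it. The response here is built by hand with made-up content and URL; in practice it would be the response the user agent just fetched.

```perl
use strict;
use warnings;
use HTML::Form;
use HTTP::Response;

# Hand-built response standing in for a fetched page (hypothetical data).
my $html = '<form action="/search" method="post">'
         . '<input type="text" name="q"></form>';
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=ISO-8859-1' ],
    $html,
);

# Hand the response itself to parse; the base is given explicitly because
# this hand-built response has no request attached to derive it from.
my ($form) = HTML::Form->parse($response, base => 'http://example.com/');

$form->value(q => "caf\x{e9}");
my $request = $form->make_request;

print "page charset: ", $response->content_charset, "\n";
print "request body: ", $request->content, "\n";
```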