jarib / celerity

This project is no longer maintained.
http://celerity.rubyforge.org/
GNU General Public License v2.0

Update to latest HtmlUnit snapshot to fix URL encoding bug #19

Closed wincent closed 14 years ago

wincent commented 14 years ago

I originally posted to the mailing list here:

http://rubyforge.org/pipermail/celerity-users/2010-June/000394.html

describing a problem with the "goto()" method wherein special characters in the URL were getting mangled (double-encoded) and thus producing spurious 404 errors.

Doing by hand in "jirb" what Celerity is doing behind the scenes, I was able to show that it is the version of HtmlUnit that ships with Celerity which is responsible for the double-escaping:

$ jirb -r rubygems -r celerity
> url = Java::JavaNet::URL.new 'http://localhost:3000/wiki/has_%3Cstrange%3E_stuff'
> request = HtmlUnit::WebRequestSettings.new url
> client = HtmlUnit::WebClient.new HtmlUnit::BrowserVersion::FIREFOX_3
> client.getPage request

In the server logs, the request is being sent to "/wiki/has_%25253Cstrange%25253E_stuff", which doesn't exist. The "%" is becoming "%25", which is in turn being escaped again and becoming "%2525".
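To make the arithmetic of the double-escaping concrete, here is a minimal Ruby sketch. The helper `reescape_percent` is hypothetical and only models the symptom (each faulty pass turning "%" into "%25"); it is not HtmlUnit's actual code.

```ruby
# Simplified model of one faulty encoding pass: only the percent sign is
# re-escaped. Hypothetical helper for illustration, not HtmlUnit's code.
def reescape_percent(path)
  path.gsub('%', '%25')
end

path  = '/wiki/has_%3Cstrange%3E_stuff'
once  = reescape_percent(path)  # "%3C" becomes "%253C"
twice = reescape_percent(once)  # "%253C" becomes "%25253C"

puts once
puts twice
```

Under that assumption, the second pass yields the same path that shows up in the server logs.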

Searching on the HtmlUnit tracker I found a number of bugs related to this double-encoding.

That last one suggests that the bug is fixed in the latest HtmlUnit snapshot available from here:

http://build.canoo.com/htmlunit/artifacts

So as a quick-and-dirty test, I replaced the version of HtmlUnit that ships with Celerity in my local install with the latest snapshot:

$ cd /usr/local/jruby/lib/ruby/gems/1.8/gems/celerity-0.7.9/lib/celerity/htmlunit
$ sudo rm *.jar
$ sudo curl -O http://build.canoo.com/htmlunit/artifacts/htmlunit-2.8-SNAPSHOT-with-dependencies.zip
$ sudo unzip htmlunit-2.8-SNAPSHOT-with-dependencies.zip
$ sudo mv htmlunit-2.8-SNAPSHOT/lib/* .

And then repeated my original tests; the URL double-encoding bug is now fixed.

So, that's what this ticket is about: a request to update to the latest snapshot of HtmlUnit so that URLs with "special" characters in them can be accessed.

Unfortunately there is no workaround other than updating HtmlUnit. Passing in an unencoded URL (i.e. with literal "<" and ">" characters) doesn't work, because it still ends up double-encoded (i.e. as "/wiki/has_%253Cstrange%253E_stuff"), which also produces spurious 404 errors.
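The unencoded case can be sketched the same way, assuming the literal characters are percent-encoded once (which would be correct) and the result is then escaped one more time. The `percent_encode` helper below is a hypothetical stand-in written for illustration, not a Celerity or HtmlUnit API.

```ruby
# Hypothetical helper: percent-encode every character outside a small safe
# set (unreserved characters plus "/"). Note that "%" itself is not in the
# safe set, so running this over an already-encoded path escapes it again.
def percent_encode(path)
  path.gsub(/[^a-zA-Z0-9_\-.~\/]/) { |c| format('%%%02X', c.ord) }
end

raw     = '/wiki/has_<strange>_stuff'
encoded = percent_encode(raw)      # the request that should be sent
doubled = percent_encode(encoded)  # what results from one extra pass

puts encoded
puts doubled
```

The first pass gives the intended path; the extra pass gives the "%253C" form described above.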

Cheers, Wincent

jarib commented 14 years ago

Yes, this should resolve itself as soon as we're able to do a release with the snapshots currently in HEAD. There are other issues with the snapshots though (which you'll see if you run the specs).

wincent commented 14 years ago

Pushed some things to the "ticket19" branches of my forked repos:

First up, I found that the specs in browser_spec.rb were hanging indefinitely. I tried running individual spec files to see whether the other specs passed, and found that some of them couldn't be run individually. This commit fixes that:

http://github.com/wincent/celerity/commit/a31f0c383a3b9904bef898a85b3c3a62edef638c

I'm also seeing 1 failure in watir_compatibility_spec.rb, and 1 failure in watirspec/button_spec.rb. Both failures show errors like this in the console:

[2010-06-22 15:23:29] ERROR ArgumentError: negative length -35 given

I'm not sure why the browser_spec.rb specs are hanging. The run hangs after about 10 examples, and the last thing visible in the console is this:

[2010-06-22 15:19:47] INFO  going to shutdown ...
[2010-06-22 15:19:47] INFO  WEBrick::HTTPProxyServer#start done.

If I mark the hanging example as "pending", the next one hangs, and so on...

In any case, added a spec for the bug under discussion in this ticket:

http://github.com/wincent/celerity/commit/b4032fc2b12398264902735a3f7e8216cc3ac62c

Note that this depends on a change in the submodule:

http://github.com/wincent/watirspec/commit/050d6f8084fe9ac34619637ca3e8cf699142cd26

I did not bother updating the superproject because the commit hash will most likely be different when/if you merge it in anyway, so I'll leave that up to you.

jarib commented 14 years ago

Thanks, I've pulled in your changes.

The hanging spec is known, and is caused by the previous "huge page" specs. HtmlUnit recently introduced a new page type that needs to be dealt with - I'm working on it :)

I'm unable to reproduce the failures in watir_compatibility_spec.rb and watirspec/button_spec.rb. Could you open a new ticket for those with the full backtrace + info about your JRuby environment (OS, JDK etc)?

wincent commented 14 years ago

Opened a ticket with the info for those other spec failures:

http://github.com/jarib/celerity/issues/issue/20

Cheers, Wincent