sparklemotion / mechanize

Mechanize is a ruby library that makes automated web interaction easy.
https://www.rubydoc.info/gems/mechanize/
MIT License
4.39k stars 474 forks source link

Split off Mechanize cookie handling? #257

Closed bluemont closed 11 years ago

bluemont commented 12 years ago

I have looked at several Ruby cookie handling libraries:

I have found Mechanize's cookie parsing to work the best in the "real" world. (The CookieJar library does have an interesting approach, though.) I would like to help split out Mechanize's cookie handling into a separate gem. Thoughts? Having the cookie logic separate might get more projects to rely on it. Cookies, in my experience, are a pain at a low level and this would slightly reduce the pain.

The reason I got to thinking about this... I'm using Sidekiq and only need Mechanize for its cookie handling. Though I think using the cookie parts of Mechanize is thread-safe based on a code read, I would feel better having a dependency that is totally thread-safe.

knu commented 12 years ago

As the maintainer of the cookie related code in Mechanize, I'd support the idea. I worked hard to make it RFC compliant and modern browser compatible.

The gem name may be http-cookie with the name space HTTP::Cookie (assuming it's available). Do you already have an early work? I'd be happy to review and take it over.

tenderlove commented 12 years ago

I think it's a good idea. @drbrain how about you?

drbrain commented 12 years ago

So long as I am not the primary maintainer, OK. I think I am too busy to be a good maintainer.

How about we put it in the sparklemotion organization and move mechanize there too?

flavorjones commented 12 years ago

+1 for the move to sparklemotion.

bluemont commented 12 years ago

Thanks.

@knu I split the Mechanize Cookie code into my app -- it was easy for my use -- but I haven't started a dedicated gem yet. Note: there was an in-method require for yaml that I removed, since it can impact thread safety.

Questions: what platforms do we need to support? 1.8? 1.9? JRuby? Rubinius?

flavorjones commented 12 years ago

@drbrain and @knu - I've added you both as owners of sparklemotion, so you should be able to add or move projects to it as you please.

drbrain commented 12 years ago

Mechanize supports Ruby 1.8.7, 1.9.2 and newer, I can't imagine anything in the cookie support code that would be improved by making it ruby 1.9+.

I've made no particular effort to support or hinder Mechanize on JRuby or Rubinius in Mechanize.

There are some issues with JRuby over Tempfile for Mechanize that will be fixed in the next release. I believe running the tests on JRuby results in several failures. I haven't investigated these, but I don't think they're related to cookie handling.

knu commented 12 years ago

@drbrain and @flavorjones: I agree about moving. Do it whenever you like.

I'll start working (locally) on separating a http-cookie gem from mechanize soon.

flavorjones commented 12 years ago

@tenderlove - It's up to you (as tenderlove/mechanize owner) to move the repo to sparklemotion.

Wait for it ...

I'm starting to doubt your commitment to Sparklemotion.

tenderlove commented 12 years ago

omg, it's moved!

leejarvis commented 11 years ago

@knu How's this going? Are we pretty much ready to push a http-cookie gem and use that in mechanize?

knu commented 11 years ago

The http-cookie gem is supposed to be almost ready but I've lost my to-do list, so it'll take a few days before I can recall the details and get back to work on it. As far as I can tell, the basic API is almost complete and ready for a review by the authors of consumers, i.e. HTTP libraries.

One concern is that I've dropped backwards compatibility in some points, namely, serialization format(s) and some methods, and I'm going to not provide a compatibility layer. Mechanize::Cookie and CookieJar will be gone. So I'd like to adopt HTTP::Cookie when there's going to be a major version bump.

knu commented 11 years ago

I've finished making and pushing all the major improvements and changes to sparklemotion/http-cookie and released the gem version 1.0.0.pre1.

The mechanize library side changes are complete in the http-cookie branch and ready to merge.

knu commented 11 years ago

There are several API incompatibilities (besides the class names) as described here, and there is no compatibility layer provided, but just errors as follows:

And after replacing the class names:

knu commented 11 years ago

My PoC code of a compatibility layer is put as a comment in mechanize/cookie and mechanize/cookie_jar. I'm still considering how it should be provided. Probably they should be injected from the Mechanize side to keep http-cookie clean and legacy free.

leejarvis commented 11 years ago

@knu I'm happy with this, including the compatibility layer. We should define a version we plan on removing this layer and include this in the deprecation messages. I'm happy to include these changes in the next minor version release (2.7) and perhaps remove the compatibility layer in 2.8 or 3.x. Can we get the http-cookie branch working with these changes with tests passing?

knu commented 11 years ago

I've pushed almost everything necessary to both http-cookie and mechanize (the http-cookie branch) with regard to compatibility. Compatibility warnings have been removed on the http-cookie side, and compatibility wrappers have been implemented on the mechanize side instead.

There remains just one significant incompatibility: YAML deserialization. I'll work on tweaking CookieJar#load so it can load existing YAML files in the next few days as well as releasing http-cookie 1.0.0 final. I think I'm almost there.

knu commented 11 years ago

OK, now #load can load YAML files in the old format. On the other hand, I'm not going to tweak #save_as so it saves a file in the old format because it will be just too messy to achieve and delay transition. The old format is neither efficient or secure because eTLD rules periodically updated in the DomainName gem will never get reflected to already saved cookies.

So, we'll have to tell users that they should upgrade mechanize on all instances at once or they may lose their YAML jar.

drbrain commented 11 years ago

Can we change the name of the YAML jar to avoid data loss?

knu commented 11 years ago

Like.. YML?

drbrain commented 11 years ago

.v2.yaml ?

drbrain commented 11 years ago

… oh, that won't work, how about .v2.yml?

knu commented 11 years ago

Any name will work. File names are given by the caller along with the format specifier, and the library does not care about file extensions or provide any default extension at all.

knu commented 11 years ago

I've tweaked both Mechanize and HTTP::Cookie with a gross hack (gsub on YAML!) so that:

These points make Mechanize backward compatible and forward compatible at the same time.

I think we can now merge the current http-cookie branch into master to get the next major version (2.7) of Mechanize a migration base. When most users have upgraded to 2.7, we can change the save format to the new one, and then remove support for reading the old format!

knu commented 11 years ago

I've rebased the http-cookie branch with additional changes and API adjustments.

Good news is that the stock tests have passed with least changes, which are almost all related to parser changes and they are about corner cases where RFC 6265 conflicts with the legacy style.

I'm ready to merge it as soon as I roll out http-cookie 1.0.0 final.