jjjake / internetarchive

A Python and Command-Line Interface to Archive.org
GNU Affero General Public License v3.0

Search by URL in Wayback? #78

Closed bitsgalore closed 8 years ago

bitsgalore commented 9 years ago

While trying out the wrapper I couldn't figure out how to search for specific URLs in the Wayback Machine. Is this possible at all?

I already tried figuring this out from the archive.org web interface. There this works (using simple search):

http://web.archive.org/web/*/http://www.projectmoonbase.com/

However if I try this using an advanced search:

https://archive.org/search.php?query=%28http%3A%2F%2Fwww.projectmoonbase.com%2F%29

this gives me 0 hits. Also, it's not clear to me what field name I should be using here.

So maybe this is just a limitation of the API of archive.org? Otherwise, if this is somehow possible, it would be helpful to add an example to the documentation, as this looks like a pretty obvious use case. Or maybe I'm just overlooking something obvious myself?
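For reference, the capture listing behind the `web/*/` URL above can also be fetched programmatically. The sketch below assumes the Wayback CDX server endpoint at `http://web.archive.org/cdx/search/cdx` and its `url`, `output=json` and `limit` parameters; none of this is part of ia-wrapper, and the function names are made up for illustration. With `output=json` the first row of the response is assumed to be a header row naming the fields.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback CDX server (not part of ia-wrapper).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, limit=10):
    """Build the request URL that lists up to `limit` captures of `url`."""
    params = {"url": url, "output": "json", "limit": limit}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def rows_to_captures(rows):
    """Turn the JSON rows (header row followed by data rows) into dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def replay_url(capture):
    """Build the Wayback replay URL for a single capture dict."""
    return "http://web.archive.org/web/{timestamp}/{original}".format(**capture)


def list_captures(url, limit=10):
    """Query the CDX server over the network and return capture dicts."""
    with urllib.request.urlopen(build_cdx_query(url, limit), timeout=30) as response:
        return rows_to_captures(json.load(response))
```

Calling `list_captures("http://www.projectmoonbase.com/")` would then give one dict per capture, and `replay_url()` turns each of them into a link to the archived page.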

bitsgalore commented 9 years ago

Update to the above: I ended up writing a simple wrapper around the wayback API that pretty much does what I'm looking for. Available here:

https://github.com/bitsgalore/iawayback

This is pretty quick and dirty, but perhaps useful, which is why I'm dropping the link here.
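To give an idea of what such a lookup involves, here is a minimal sketch against the Wayback availability API. It is not the code from iawayback; the endpoint `https://archive.org/wayback/available`, its `url` and `timestamp` parameters, and the `archived_snapshots` / `closest` response layout are assumptions of this sketch, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback availability API (not taken from iawayback).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the request URL for `url`, optionally near `timestamp` (YYYYMMDD...)."""
    params = {"url": url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the closest snapshot dict from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def lookup(url, timestamp=None):
    """Query the API over the network and return the closest snapshot, if any."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Something like `lookup("http://www.projectmoonbase.com/")` should then return either a dict describing the closest capture or `None` if the URL was never archived.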

jjjake commented 9 years ago

@bitsgalore this is not possible using ia-wrapper at the moment -- but it should be! I'll add this to the todo list. Thanks for the idea!

saper commented 9 years ago

My impression is that the Internet Archive consists of two independent parts: the Wayback Machine, which pulls stuff from the Web, and the archive, where stuff is explicitly pushed. Currently ia-wrapper deals only with the latter, I think.

jjjake commented 8 years ago

@bitsgalore I've been working on adding support for ia plugins, and now have something working. I forked your iawayback repo and converted it to an ia plugin. See https://github.com/bitsgalore/iawayback/pull/1 for more details.

As this feature is out of scope for the time being, it might be best implemented as a plugin.