internetarchive / wayback

IA's public Wayback Machine (moved from SourceForge)

Wayback replay mishandles URL with trailing asterisk #80

Open. Opened by kngenie 9 years ago

kngenie commented 9 years ago

For example, issue ARI-4272 reports that Wayback replay fails with ResourceNotInArchive for a URL ending with "&*", even though there are multiple captures of that URL.

kngenie commented 9 years ago

The root cause is that EmbeddedCDXServerIndex does not pass an explicit matchType to CDXServer, relying on the default being exact. CDXServer, however, has a convenience feature: if matchType is not explicitly set, it interprets a query for a URL with a trailing asterisk as a prefix query. The "*" is then treated as a wildcard rather than as part of the URL, so the captures of the literal "&*" URL are never matched.
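To make the failure mode concrete, here is a minimal sketch of the convenience rule as described above. It is not CDXServer's actual code: the class, the MatchType enum, and the effectiveMatchType/searchKey helpers are hypothetical, and real CDX lookups also canonicalize URLs (SURT) before matching, which is omitted here.

```java
import java.util.Optional;

public class MatchTypeResolution {
    // Hypothetical subset of CDX match types, for illustration only.
    enum MatchType { EXACT, PREFIX }

    // Hypothetical model of the convenience rule: an explicit matchType always wins;
    // otherwise a trailing '*' turns the request into a prefix query.
    static MatchType effectiveMatchType(String url, Optional<MatchType> explicit) {
        if (explicit.isPresent()) {
            return explicit.get();
        }
        return url.endsWith("*") ? MatchType.PREFIX : MatchType.EXACT;
    }

    // Hypothetical: the key actually searched for. A prefix query strips the '*',
    // so the literal URL ending in "&*" is no longer what is being looked up.
    static String searchKey(String url, MatchType type) {
        return type == MatchType.PREFIX ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        String url = "http://example.com/page?a=1&*";
        MatchType implicit = effectiveMatchType(url, Optional.empty());
        MatchType explicitExact = effectiveMatchType(url, Optional.of(MatchType.EXACT));
        System.out.println(implicit + " " + searchKey(url, implicit));
        System.out.println(explicitExact + " " + searchKey(url, explicitExact));
    }
}
```

With no explicit matchType, the sketch resolves the "&*" URL to a prefix search on the URL minus its asterisk; passing EXACT keeps the asterisk as a literal part of the key.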

The solution is to have EmbeddedCDXServerIndex explicitly pass matchType=exact.
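The idea behind the fix can be sketched as "never leave matchType unset for replay lookups". The sketch below expresses it as CDX query parameters (url and matchType, as in the CDX server's HTTP query interface); EmbeddedCDXServerIndex calls CDXServer in-process, so this is only an illustration, and the class and helper names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ExactCdxQuery {
    // Hypothetical helper: parameters for a replay lookup of a single URL.
    // matchType is always set explicitly, so a trailing '*' in the URL is
    // never reinterpreted as a prefix wildcard.
    static Map<String, String> replayQueryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("url", url);
        params.put("matchType", "exact");
        return params;
    }

    // Encodes parameters in insertion order as key=value pairs joined by '&'.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(replayQueryParams("http://example.com/page?a=1&*")));
    }
}
```

Setting the parameter unconditionally, rather than only when the URL ends with "*", keeps replay lookups independent of any current or future defaulting heuristics on the server side.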

kngenie commented 9 years ago

Fixed by e1fd3f4. Ready for submission.

Correction: e1fd3f4 was merged to the ait-qa branch only. The same change on master is 8578a31, which combines the three commits on the issue-80 branch into a single commit.