jsvine / waybackpack

Download the entire Wayback Machine archive for a given URL.
MIT License

Waybackpack + matchType #57

Open uy5cu71 opened 1 year ago

uy5cu71 commented 1 year ago

The Wayback CDX API has a matchType option, for example: https://web.archive.org/cdx/search/cdx?url=https://twitter.com/jack/statuses&matchType=prefix

Which returns:

com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20121223123338 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 text/html 404 VNL4UHLBLX2UYNDIOZZ7ZR3CFYURIVND 5296
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130203195805 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 warc/revisit - VNL4UHLBLX2UYNDIOZZ7ZR3CFYURIVND 1042
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130312144230 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 warc/revisit - VNL4UHLBLX2UYNDIOZZ7ZR3CFYURIVND 1035
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130326132131 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 text/html 404 BMAXRTF3OVX3HL22WUMYLBYT2UJV3HT3 9317
com,twitter)/jack/statuses/"/antarnisti/status/245078986827386880" 20130402123359 https://twitter.com/jack/statuses/%22/Antarnisti/status/245078986827386880%22 warc/revisit - BMAXRTF3OVX3HL22WUMYLBYT2UJV3HT3 1030
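
For anyone reading along: each row of that response is space-separated, and with default settings the CDX server returns the fields urlkey, timestamp, original URL, MIME type, status code, digest, and length, in that order. A rough Python sketch of splitting such rows into named fields follows. The names `CdxRow` and `parse_cdx` are made up for illustration and are not part of waybackpack.

```python
from typing import List, NamedTuple


class CdxRow(NamedTuple):
    """One row of default CDX output (hypothetical helper type)."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def parse_cdx(text: str) -> List[CdxRow]:
    """Split plain-text CDX output into rows, skipping malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split(" ")
        # Assumption: the default seven-field layout; other lines are ignored.
        if len(parts) == 7:
            rows.append(CdxRow(*parts))
    return rows
```

A `warc/revisit` row has `-` as its status code, so the status is kept as a string here rather than converted to an integer.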

Is it possible to download all of these URLs? At the moment waybackpack trims each URL down to the one given as CLI input.

I tried adding a matchType parameter to the cdx file and got a valid response, but waybackpack still trims the URL based on the CLI input.

jsvine commented 1 year ago

Hi @uy5cu71, and thanks for your interest in this library. Unfortunately, I'm not sure I 100% understand your inquiry. But if it helps: waybackpack does not currently support the matchType parameter.
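
Until something like this is supported, one possible workaround is to query the CDX API directly and build the snapshot URLs by hand. The sketch below is only an illustration using the Python standard library, not waybackpack code. The function names are hypothetical, and it assumes the default plain-text CDX output and the usual `https://web.archive.org/web/<timestamp>/<original>` snapshot URL pattern.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_prefix_query(url):
    """Build a CDX query URL that matches every capture under a URL prefix."""
    return CDX_ENDPOINT + "?" + urlencode({"url": url, "matchType": "prefix"})


def snapshot_url(timestamp, original):
    """Build the Wayback Machine URL for one capture."""
    return "https://web.archive.org/web/{}/{}".format(timestamp, original)


def list_snapshots(url):
    """Fetch the CDX listing (network call) and return one snapshot URL per row."""
    with urlopen(cdx_prefix_query(url)) as resp:
        text = resp.read().decode("utf-8")
    snapshots = []
    for line in text.splitlines():
        fields = line.split(" ")
        # Assumption: fields[1] is the timestamp and fields[2] the original URL.
        if len(fields) >= 3:
            snapshots.append(snapshot_url(fields[1], fields[2]))
    return snapshots
```

Calling `list_snapshots("https://twitter.com/jack/statuses")` would then give a list of URLs that can be downloaded with any HTTP client. Large prefixes can return many rows, so it is worth adding a delay between requests.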