simonw / conditional-get

CLI tool for fetching data using HTTP conditional get
Apache License 2.0

Add time-based conditionality #3

Open dlanger opened 4 years ago

dlanger commented 4 years ago

conditional-get uses only ETags in its conditional GET requests. There are many servers out there which don't support conditional GETs based on ETags, but do support conditional GETs based off If-Modified-Since headers.

It would be handy if this supported both methods, so when writing a scraper I didn't have to remember which lines should be conditional-get ... and which should be curl... --time-cond....

simonw commented 4 years ago

This is a good idea. I always use ETags when available because I don't trust dates, but there's no reason this tool couldn't fall back on Last-Modified if ETags aren't available.
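For anyone reading along, here is a minimal sketch of what that fallback could look like. It is not how conditional-get is implemented; the function name `conditional_headers` and the dict keys `etag` and `last_modified` are hypothetical, chosen only for illustration. It prefers the ETag when one was stored and falls back to the Last-Modified date otherwise.

```python
def conditional_headers(validators):
    """Pick request headers from validators saved after an earlier response.

    `validators` is a hypothetical dict such as
    {"etag": '"abc123"', "last_modified": "Wed, 21 Oct 2015 07:28:00 GMT"}.
    """
    etag = validators.get("etag")
    if etag:
        # Preferred: an opaque validator that does not depend on clocks.
        return {"If-None-Match": etag}
    last_modified = validators.get("last_modified")
    if last_modified:
        # Fallback: send the server's own Last-Modified value back verbatim.
        return {"If-Modified-Since": last_modified}
    # Nothing stored yet: make a plain, unconditional GET.
    return {}
```

Sending the stored Last-Modified string back unchanged, rather than reformatting it or using the local clock, sidesteps some of the date-trust problems mentioned above.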

m040601 commented 3 years ago

I just want to upvote this request.

But first of all, thank you for your work on this really nifty, very useful tool. Really well thought out, simple and useful. A great little add-on for the toolbox of a command-line data scraper.

I ended up here after finding Simon's (as usual interesting and well-written) post on Git scraping.

Although I'm not a programmer and have very limited Python knowledge, I started using it in my scraping scripts and testing it against many different websites.

And soon I also discovered that:

... There are many servers out there which don't support conditional GETs based on ETags, ....

And unfortunately, that really is "many".

On the servers I tested I also saw widespread use of "Last-Modified", which I suppose could be used for the:

... conditional GETs based off If-Modified-Since headers. ...

and other obscure/complicated HTTP header entries that I don't really understand, but suppose accomplish the same thing.

I also read https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests to get familiar with the mechanisms involved.

I also understand the original use case of this tool and its focus on ETags:

I always use ETags when available because I don't trust dates,

So my question is:

Is adding support for these alternatives feasible and simple to add to conditional-get? Would that involve a lot of extra work for Simon? Or should I look for other simple (Pythonic or bash/curl) alternatives? If so, could you suggest some?
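As one possible Python-only answer to that last question, here is a rough sketch using just the standard library. The function name `fetch_if_modified` and its `opener` parameter are hypothetical (the parameter exists only so the function can be exercised without a network), and the caller is assumed to save the returned Last-Modified value between runs. Note that `urllib` reports a 304 response by raising `HTTPError`, so "not modified" is handled in the `except` branch.

```python
import urllib.error
import urllib.request


def fetch_if_modified(url, last_modified=None, opener=urllib.request.urlopen):
    """Return (body, last_modified); body is None when the server answers 304."""
    request = urllib.request.Request(url)
    if last_modified:
        # Echo back the date the server gave us on the previous fetch.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            # Changed (or first fetch): return the body and the new date, if any.
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not modified: nothing to download, keep the date we already had.
            return None, last_modified
        raise
```

A scraper script would call this once per URL, write the body to disk only when it is not `None`, and store the returned date for the next run.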

Thanks in advance.

simonw commented 2 years ago

Just caught up on these comments and I'm now convinced this would be a worthwhile feature addition, as a fallback.