Open dlanger opened 4 years ago
This is a good idea. I always use ETags when available because I don't trust dates, but there's no reason this tool couldn't fall back on Last-Modified if ETags aren't available.
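A minimal sketch of that fallback, not the tool's actual code: the function name `conditional_headers` and its parameters are hypothetical, and it assumes the validators were saved from a previous response.

```python
def conditional_headers(etag=None, last_modified=None):
    """Pick the request header for a conditional GET.

    Both arguments are assumed to be the raw header values saved from an
    earlier response (hypothetical storage, not shown here).
    """
    if etag:
        # Preferred: the ETag is an opaque validator and does not rely on dates.
        return {"If-None-Match": etag}
    if last_modified:
        # Fallback: echo the server's own Last-Modified value back unchanged.
        return {"If-Modified-Since": last_modified}
    # No validator known yet, so the request is a plain unconditional GET.
    return {}
```

Echoing the server's Last-Modified string back as-is avoids parsing or reformatting dates on the client side.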
I just want to upvote this request.
But first of all, thank you for your work on this really nifty, very useful tool. Really well thought out, simple and useful. A great little add-on for the toolbox of a command-line data scraper.
I ended up here after finding Simon's (as usual interesting and well-written) post on git scraping.
Although I'm not a programmer and have very limited Python knowledge, I started using it in my scraping scripts and testing it against many different websites.
And soon I also discovered that:
... There are many servers out there which don't support conditional GETs based on ETags, ....
And unfortunately, that really is "many".
On the servers I tested I also saw widespread use of "Last-Modified", which I suppose could be used for the:
... conditional GETs based off If-Modified-Since headers. ...
and other obscure/complicated HTTP header entries that I don't really understand, but suppose accomplish the same thing.
I also had a read of https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests , trying to get familiar with the mechanisms involved.
I also understand the original use case of this tool and its focus on ETags:
I always use ETags when available because I don't trust dates,
So my question is:
Is adding support for these alternatives something feasible and simple to add to conditional-get? Would that involve a lot of extra work for Simon? Or should I look for other simple (Pythonic or bash/curl) alternatives? If so, could you suggest some?
Thanks in advance.
Just caught up on these comments and I'm now convinced this would be a worthwhile feature addition, as a fallback.
conditional-get uses only ETags in its conditional GET requests. There are many servers out there which don't support conditional GETs based on ETags, but do support conditional GETs based off If-Modified-Since headers.
It would be handy if this supported both methods, so when writing a scraper I didn't have to remember which lines should be conditional-get ... and which should be curl ... --time-cond ... .
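For illustration, here is a rough standard-library sketch of a single fetch that covers both methods. It is not how conditional-get is implemented: the function `fetch_if_changed`, its parameters, and the injectable `opener` argument are all hypothetical, and storing the validators between runs is left to the caller.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None, last_modified=None,
                     opener=urllib.request.urlopen):
    """Conditional GET: returns (body, etag, last_modified).

    body is None when the server answers 304 Not Modified.
    etag / last_modified are validators saved from an earlier response
    (hypothetical; persisting them is up to the caller).
    """
    request = urllib.request.Request(url)
    if etag:
        # Prefer the ETag when one was saved.
        request.add_header("If-None-Match", etag)
    elif last_modified:
        # Otherwise fall back to the date-based validator.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            body = response.read()
            # Return the fresh validators so they can be saved for next time.
            return (body,
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # urllib reports 304 as an HTTPError; the saved copy is still current.
            return None, etag, last_modified
        raise
```

A scraper would call this once per URL and only rewrite the local file when the returned body is not None.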