rivermont / spidy

The simple, easy to use command line web crawler.
GNU General Public License v3.0

HEAD Request uses default Requests headers #48

Closed: rivermont closed this issue 6 years ago.

rivermont commented 6 years ago

The initial HEAD request, which is sent to get the document's size, does not use the HTTP headers that are set for the GET request. This should be a simple fix.
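
A minimal sketch of the fix described above. It uses Python's standard-library `urllib.request` as a stand-in for the Requests calls spidy actually makes, and `HEADERS`, `build_requests`, `content_length` and `fetch` are hypothetical names, not spidy's code. The idea is that the HEAD request (the size check) and the GET request (the download) are built from one shared header dict, so neither falls back to the library's default headers.

```python
import urllib.request

# Hypothetical header set; spidy keeps its own configurable headers,
# whose exact name and contents are not shown in this thread.
HEADERS = {
    "User-Agent": "spidy Web Crawler (example)",
    "Accept-Language": "en-US",
}


def build_requests(url, headers=HEADERS):
    """Build a HEAD and a GET request for `url` that share one header set.

    If the HEAD request were built without `headers`, the library would
    send its own default headers (e.g. its default User-Agent) instead.
    """
    head = urllib.request.Request(url, headers=headers, method="HEAD")
    get = urllib.request.Request(url, headers=headers, method="GET")
    return head, get


def content_length(response_headers):
    """Return the Content-Length of a HEAD response as an int, or None."""
    value = response_headers.get("Content-Length")
    if value is not None and value.isdigit():
        return int(value)
    return None


def fetch(url, headers=HEADERS, timeout=10):
    """Check the size with HEAD, then download with GET (needs network access)."""
    head, get = build_requests(url, headers)
    with urllib.request.urlopen(head, timeout=timeout) as resp:
        size = content_length(resp.headers)
    with urllib.request.urlopen(get, timeout=timeout) as resp:
        body = resp.read()
    return size, body
```

With Requests, the equivalent change is to pass the same `headers=` argument to `requests.head()` that is already passed to `requests.get()`.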

ayushkalani commented 6 years ago

I'm interested in working on this.

ayushkalani commented 6 years ago

Can you please tell me which file and function these changes should be made in, and elaborate a bit more on the issue?

rivermont commented 6 years ago

See the crawl() function in crawler.py.

Double-A-92 commented 6 years ago

@ayushkalani In https://github.com/rivermont/spidy/blob/master/spidy/crawler.py, compare line 100 with line 107.

AwesomeMarioFan commented 6 years ago

I've opened a pull request for this here: https://github.com/rivermont/spidy/issues/48

realazizk commented 6 years ago

Ah, fine. I'm a little late; I'll work on something else.