rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/
Other
527 stars 106 forks

how to use cookies? #30

Closed marshonhuckleberry closed 4 years ago

marshonhuckleberry commented 4 years ago

How can I use cookies the same way a browser does, so that cookies are saved automatically and no manual copying is needed?

marshonhuckleberry commented 4 years ago

Forgot to say, I use Python 3.6.

rajatomar788 commented 4 years ago

pywebcopy uses a built-in requests library session, which stores the headers, cookies and all other data across all of its requests. You can customize this session before starting, for example to modify the cookies or headers.

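import pywebcopy

# Seed the shared session with a cookie before any request is made.
pywebcopy.SESSION.cookies['cookie_key'] = 'some_id'

# Configure the project for this URL, then download the page.
pywebcopy.config.setup_config('http://google.com', '/path/to/downloads/', debug=True)
pywebcopy.save_webpage('http://google.com')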

You can learn more about manipulating a requests session from the documentation.

https://www.dev2qa.com/how-to-get-set-http-headers-cookies-and-manage-sessions-use-python-requests-module/
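As a rough illustration of what "customizing the session" means, here is a minimal sketch using plain requests, with no network access. The cookie name, header value and the example.com domain are placeholders, not anything pywebcopy requires. It seeds a session with a cookie and a header, then prepares a request to show what would be sent.

```python
import requests

# A standalone session; pywebcopy.SESSION is described in this thread
# as the same kind of object.
session = requests.Session()

# Placeholder cookie and header values, chosen only for illustration.
session.cookies.set("cookie_key", "some_id", domain="example.com", path="/")
session.headers.update({"User-Agent": "my-archiver/0.1"})

# prepare_request() merges the session state into the request without sending it.
prepared = session.prepare_request(requests.Request("GET", "http://example.com/"))

cookie_header = prepared.headers.get("Cookie")
user_agent = prepared.headers.get("User-Agent")
print(cookie_header)
print(user_agent)
```

Anything set on the session this way is attached to every later request made through that session.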

marshonhuckleberry commented 4 years ago

What is this part of the code doing? pywebcopy.config.setup_config('http://google.com', '/path/to/downloads/', debug=True)

marshonhuckleberry commented 4 years ago

It looks like it's the config, but I'm confused: why use the URL two times, once in pywebcopy.config.setup_config and again in pywebcopy.save_webpage('http://google.com')?

rajatomar788 commented 4 years ago

The url value passed to pywebcopy.config.setup_config is used for setup purposes. The config handler loads robots.txt from this url and also prepares the base path, derived roughly from the url, where all the files will end up.

In the second case, the url is used to load the page itself from the server.

So the bottom line is that both are required. You should just store the url in a variable and pass it twice.
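A minimal sketch of that advice, assuming the same pywebcopy calls shown earlier in this thread (newer releases may expose a different API). The helper name save_with_cookie and its parameters are made up for illustration.

```python
def save_with_cookie(url, download_dir, cookie_name, cookie_value):
    """Hypothetical helper: set one cookie, then configure and save a page.

    The url is stored once and passed to both calls, as suggested above.
    """
    # Imported lazily so the sketch can be read or imported without pywebcopy installed.
    import pywebcopy

    # Seed the shared session before any request is made.
    pywebcopy.SESSION.cookies[cookie_name] = cookie_value

    # First use of the url: setup (robots.txt, base path for the saved files).
    pywebcopy.config.setup_config(url, download_dir, debug=True)

    # Second use of the url: actually fetch and save the page.
    pywebcopy.save_webpage(url)
```

A call would look like save_with_cookie('http://google.com', '/path/to/downloads/', 'cookie_key', 'some_id').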

marshonhuckleberry commented 4 years ago

From what I can understand from the requests documentation and Google, there are two ways of using cookies:

  1. manually set cookies and send them (a function called post or something)
  2. automatically set cookies like in a browser (a function called session() or something)

However, the requests documentation lacks useful examples that show these two ways of using cookies. Testing cookies is also not easy (maybe test on a Facebook login). Trying to understand these things seems like a big waste of time with no results.

marshonhuckleberry commented 4 years ago

And I forgot to say: it is not just complicated to set cookies in requests, it is 1000 times more complicated to set cookies in pywebcopy.

rajatomar788 commented 4 years ago

Cookies are automatically stored in the pywebcopy.SESSION attribute, which is essentially a requests.Session() object. If you can do it in requests, then you can do it in pywebcopy. There is absolutely no difference. By the way, requests is a very well-known library in Python. You can find hundreds of ways to authenticate through requests on Stack Overflow.
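To make the two approaches concrete, here is a self-contained sketch that can be tested without Facebook or any real site. It starts a throwaway local server (the handler, paths and cookie values are invented for this example), lets a requests session pick up a cookie automatically, and then sends a cookie manually on a single request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Test server: /login sets a cookie, any other path echoes the Cookie header."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            body = b"logged in"
            self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        else:
            body = (self.headers.get("Cookie") or "").encode("utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        # Way 2 (automatic): the session stores Set-Cookie and resends it, like a browser.
        session = requests.Session()
        session.trust_env = False  # ignore proxy environment variables for localhost
        session.get(base + "/login")
        stored = session.cookies.get("sessionid")
        echoed = session.get(base + "/page").content.decode("utf-8")

        # Way 1 (manual): pass cookies explicitly on one request.
        other = requests.Session()
        other.trust_env = False
        manual = other.get(base + "/page", cookies={"sessionid": "manual"}).content.decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
    return stored, echoed, manual
```

The session in the first half plays the role that pywebcopy.SESSION plays inside pywebcopy: once a response sets a cookie, later requests through the same session carry it without any manual copying.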

rajatomar788 commented 4 years ago

If you still have issues, use the third-party library MechanicalSoup for browser navigation and form filling in Python, then set that library's session as pywebcopy.SESSION. It will be smooth. I am closing this issue now; if you have any further problems, you can always reopen it.
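A rough sketch of that workflow, under several assumptions: the login page has a single form, and its fields are named username and password (both hypothetical). Instead of replacing pywebcopy.SESSION outright, this variant copies the cookies into it, which avoids depending on how pywebcopy holds its reference to the session.

```python
import requests


def copy_cookies(source: requests.Session, target: requests.Session) -> requests.Session:
    """Copy every cookie from one requests session into another."""
    target.cookies.update(source.cookies)
    return target


def login_and_share_cookies(login_url, username, password):
    """Hypothetical helper: log in with MechanicalSoup, then hand cookies to pywebcopy."""
    # Imported lazily so copy_cookies() is usable without these packages installed.
    import mechanicalsoup
    import pywebcopy

    browser = mechanicalsoup.StatefulBrowser()
    browser.open(login_url)
    browser.select_form("form")      # assumes the first <form> is the login form
    browser["username"] = username   # assumed field name
    browser["password"] = password   # assumed field name
    browser.submit_selected()

    # browser.session is a requests.Session holding the login cookies.
    copy_cookies(browser.session, pywebcopy.SESSION)
```

After login_and_share_cookies(...) returns, the usual setup_config and save_webpage calls from earlier in this thread would run with the logged-in cookies attached.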