voussoir / timesearch

The subreddit archiver
BSD 3-Clause "New" or "Revised" License

Proxy #1

Closed Brotakuu closed 6 years ago

Brotakuu commented 6 years ago

Can http/https proxies be enabled (like on praw)?

voussoir commented 6 years ago

Timesearch doesn't make any HTTP requests on its own; everything is done through PRAW. Timesearch also does not instantiate its own PRAW object. Instead, it imports your bot file [0] and calls either bot.anonymous() [1] or bot.login(), depending on what it needs.

So if you write a bot file that uses proxies in the PRAW instance you should be good to go. I have never tried them myself.

However you have just made me find a bug. Back in the PRAW3 days we were able to take a logged out session and log it in. But in PRAW4 all of the credentials are passed into the constructor so we have to create new separate objects when we log in. As such I think many of the login lines [2] are probably broken. I can push a fix for those that overwrites common.r with the new instance.
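To illustrate the kind of bug described above, here is a minimal sketch that uses stand-in objects rather than PRAW itself. FakeReddit, the bot and common namespaces, and both login helpers are hypothetical names made up for this example; this is not the actual timesearch code, only an assumption about the general shape of the problem.

```python
from types import SimpleNamespace


class FakeReddit:
    """Hypothetical stand-in for a PRAW4-style client.

    Credentials are fixed at construction time, so an anonymous
    instance cannot be turned into a logged-in one afterwards.
    """

    def __init__(self, username=None):
        self.username = username

    @property
    def logged_in(self):
        return self.username is not None


# Hypothetical stand-in for the user's bot file.
bot = SimpleNamespace(
    anonymous=lambda: FakeReddit(),
    login=lambda: FakeReddit(username='example_user'),
)

# Hypothetical stand-in for a shared module holding the session as `r`.
common = SimpleNamespace(r=bot.anonymous())


def login_broken():
    # The new logged-in instance is discarded, so common.r stays anonymous.
    bot.login()


def login_fixed():
    # Overwrite the shared reference with the new logged-in instance.
    common.r = bot.login()


login_broken()
print(common.r.logged_in)
login_fixed()
print(common.r.logged_in)
```

The point of the sketch is only that, once credentials go into the constructor, any code holding the old object has to be handed the new one.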

Edit: https://github.com/voussoir/timesearch/commit/5d1c2f885d231ae9a24b40048dcbee0adcc2be5a

[0]: https://github.com/voussoir/reddit/blob/760a57322f96207bcbd40f5bf01d0916d89bf0b7/bot4.py

[1]: https://github.com/voussoir/timesearch/blob/0ec6002cba484ccc3b41e6e047686aae00249c87/timesearch/common.py#L27

[2]: https://github.com/voussoir/timesearch/blob/0ec6002cba484ccc3b41e6e047686aae00249c87/timesearch/timesearch.py#L28

Brotakuu commented 6 years ago

Thanks for the explanation. So I can set the HTTPS_PROXY env variable in my bot.py file (based on the bot.py example in the README) and it should pass through to the PRAW instance?

HTTPS_PROXY=https://localhost:3128

voussoir commented 6 years ago

Hmm, I was expecting proxies to be an argument into the praw.Reddit constructor. I guess they are just environment variables and Requests will detect them automatically. I didn't know that.

http://praw.readthedocs.io/en/latest/getting_started/configuration.html?highlight=proxy#using-an-http-or-https-proxy-with-praw

In that case, you might be able to set this variable at the top of bot.py, before praw is imported:

...
import os
os.environ['HTTPS_PROXY'] = '...'

import praw

USERAGENT = '...'
...

But I don't know when Requests gathers the proxy from the environment variables. By the time bot.py gets run, it may be too late. I would suggest setting this environment variable from the command line or your OS's environment variable editor before running the script.
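One way to check what Python can see is a small standard-library sketch like the one below. It uses urllib.request.getproxies_environment(), which reads the *_proxy environment variables at the moment it is called. Whether Requests performs this lookup at the same moment is an assumption here, not something confirmed in this thread. The helper name visible_https_proxy is made up for the example.

```python
import os
import urllib.request


def visible_https_proxy():
    """Return the HTTPS proxy found in the environment right now, or None."""
    # getproxies_environment() scans os.environ each time it is called,
    # so changes made through os.environ earlier in the same process show up.
    return urllib.request.getproxies_environment().get('https')


# Set the variable from inside Python, as in the bot.py idea above.
os.environ['HTTPS_PROXY'] = 'https://localhost:3128'

# Note: if a lowercase https_proxy variable is also defined,
# the standard library prefers that one over the uppercase name.
print(visible_https_proxy())
```

If this prints the expected proxy URL before any Reddit request is made, the variable is at least visible to the process; it does not prove that the proxy itself is reachable.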

Brotakuu commented 6 years ago

That makes sense, thank you for the quick reply and for making timesearch.

A bit of a tangent, but with the removal of timestamp search (starting today and running to the end of March), are there any possible workarounds to keep timesearch working?

Also, I assume commentaugment and livestream should continue to work?

voussoir commented 6 years ago

Just to clarify, if you create a variable in bot.py called HTTPS_PROXY it will not do anything. It needs to be an environment variable, which is something that exists on your operating system. It's not a Python variable. This means you need to use os.environ to set it, or do it on the command line before starting Python.

In Windows we use set HTTPS_PROXY=... to make temporary env vars, and setx HTTPS_PROXY "..." to make permanent ones.

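On Linux, prefixing a command with HTTPS_PROXY=... sets the variable for that one command only, and export HTTPS_PROXY="..." sets it for the rest of the current shell session. To make it permanent, the export line has to go in a shell startup file such as ~/.bashrc. That's why the PRAW docs show HTTPS_PROXY=https://localhost:3128 ./prawbot.py.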
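The per-command form can be mimicked from Python, which may help show the difference between an environment variable and a Python variable. This is only a sketch: run_with_proxy is a hypothetical helper, and the child process is a one-line stand-in for a real bot script.

```python
import os
import subprocess
import sys


def run_with_proxy(proxy_url, argv):
    """Run argv with HTTPS_PROXY set for that child process only."""
    child_env = dict(os.environ)  # copy, so the parent environment is untouched
    child_env['HTTPS_PROXY'] = proxy_url
    return subprocess.run(argv, env=child_env, capture_output=True, text=True)


# Stand-in for a bot script: the child just reports what it sees.
child_code = "import os; print(os.environ.get('HTTPS_PROXY'))"

result = run_with_proxy(
    'https://localhost:3128',
    [sys.executable, '-c', child_code],
)
print(result.stdout.strip())
```

The child sees the value because it was placed in its environment, which is what the shell does for HTTPS_PROXY=... ./prawbot.py. A plain Python variable named HTTPS_PROXY in the parent would never reach it.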

voussoir commented 6 years ago

No problem, thanks for bringing up this issue.

With the death of reddit's timestamp query, I cannot think of any pure-PRAW way to continue doing timesearch. /u/Stuck_in_the_Matrix runs a website called PushShift.io which has an API that could probably replace this feature. This may be something I can do, but to be honest it's not going to happen immediately. Maybe I can talk to him about it.

Yeah, all of the other tools should continue to work fine.