pettazz / pygooglevoice

Fixed version of pygooglevoice cloned from googlecode project that hasn't been updated since 2009.
BSD 3-Clause "New" or "Revised" License

2019 loginerror again #61

Open ag415 opened 5 years ago

ag415 commented 5 years ago

It looks like the proverbial LoginError has returned sometime recently. I noticed it about a week ago. I assumed I just needed to do the captcha unlock steps like I did last time, but that doesn't seem to work anymore. I'm guessing Google must have changed their login page again. Anyway, just letting you know that the login function is broken again. I know it isn't your fault, but I'm hoping it can be fixed soon. I'd fix it myself if I understood more about how pygooglevoice works, but the code is really difficult to follow.

ag415 commented 5 years ago

Is this project still being maintained? Please let me know, or tell me if there is a more up-to-date version of this project that I should be using instead.

gkuenning commented 5 years ago

As far as I know, people are still working on the project. It's hard to respond quickly when Google decides to change things, since doing so requires reverse engineering. And there have been times in the past when Google broke the interface and then fixed it a few days later.

FWIW, I've also seen problems. According to my logs, my last successful access was June 5 at 00:37 UTC; my next attempt on June 5 at 04:55 UTC failed. I'm getting the following error:

Voice login failure due to URLError, retrying: HTTP Error 400: Bad Request

I haven't yet dug into the cause of this error.


-- Geoff Kuenning geoff@cs.hmc.edu http://www.cs.hmc.edu/~geoff/

XML: verbose obfuscation in the service of simple key-value pairs.

ag415 commented 5 years ago

Could you perhaps explain a little about how the login function works so I can take a look? I'm not sure how to get the actual response that it's getting from Google when attempting the login, or how it's actually processing the login. If you can point me to which parts of the code are doing this, perhaps I can write some wrapper code to do the login and then pass the necessary object(s) back to pygooglevoice so that it has a working session. I had to do something similar for the broken pagination of SMS messages: I have my own code that grabs the Requests session from pygooglevoice and then uses it to fetch the rest of the SMS pages. I could probably do a similar hack to get the login working correctly. Please let me know if there's a way I can do this.

gkuenning commented 5 years ago

I don't think it can be done with a wrapper.

Login is handled by do_login in voice.py, which delegates the actual HTTP accesses to __do_page in the same file. The basic login process is this:

  1. Fetch the front page of Google Voice using __do_page. This step is succeeding.

  2. Extract certain information from the Web page, notably the "galx" field, which contains some kind of code to be used in the next login step.

  3. Pass the login name, password, and magic code to Google in a URL (I think it's actually an HTTP GET request), again using __do_page. This is the step that is failing with BAD REQUEST.

The problem, which has been solved before, is to figure out what's wrong with the request being passed in step 3, and then change the code to fix it. The best way to approach that task is to use a browser to reproduce steps 1 and 3, capturing what the browser sends to Google. Then you have to figure out what parts are necessary and what is just fluff. Once you know what's required, it's usually easy to fix the code to match that requirement.
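As a rough illustration of step 2, here is a minimal sketch of pulling a hidden form field out of a fetched page with a regular expression. The helper name `extract_hidden_field`, the sample HTML, and the field value are all made up for the example; the real login page may order its attributes differently, in which case the pattern would need adjusting.

```python
import re


def extract_hidden_field(html, name):
    """Return the value of a hidden <input> called `name`, or None if absent.

    Assumes the attributes appear in the order type, name, value, which is
    an assumption about the page layout, not a guarantee.
    """
    pattern = r'type="hidden"\s+name="%s"\s+value="([^"]*)"' % re.escape(name)
    match = re.search(pattern, html)
    return match.group(1) if match else None


# Hypothetical page fragment standing in for the fetched front page.
sample_page = '<form><input type="hidden" name="gxf" value="EXAMPLE-TOKEN:123"></form>'
token = extract_hidden_field(sample_page, "gxf")
missing = extract_hidden_field(sample_page, "not_there")
```

Returning None instead of raising makes it easy to tell "the field disappeared from the page" apart from other login failures.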

Of course, if Google would just give us an API to voice none of this silliness would be necessary...

ag415 commented 5 years ago

Thanks for the explanation. Agree 100% about Google's API. Google has really turned into a trash company in recent years. I used to have so much respect for them and loved their service, but lately they've left a real nasty taste in my mouth. They've pretty much evolved (devolved?) into the very thing they were against in the beginning. They went from a simple, clean, lightweight site to a bloated horror show of third-party JavaScript libraries, unreadable minified garbage, and every action generating a thousand outgoing network requests: a massive bandwidth-, CPU-, and memory-consuming behemoth. I can't even use the Google Voice web app on my mobile phone anymore because of how bloated it is. I'm actually using pygooglevoice to make my own web app for it that doesn't take 5 hours to load on my flip phone (really miss GrandCentral!) lol

Anyway, thanks for all your hard work on this project. We'd be pretty screwed trying to automate anything with Google Voice without you. I'll take a look at the login and see if maybe I can fix it. Guess we'll see who finds a fix first!

ag415 commented 5 years ago

hey gkuenning, so I've been doing some poking around this login function to figure out how it works.

It seems that google's login system has radically changed from what it was before. It looks like the login function of pygooglevoice (I could be wrong about this since the code is REALLY difficult to follow with all the back and forth function calls and weird naming conventions for everything) was using some older login method where it was able to pass both the username and password together in a single request to Google.

However, Google has updated their login form so that you first have to send the username, then the password in a separate request. It also doesn't look like the login form makes any use of that 'gxf' field at all.

When I was reproducing it in my browser, the login process looked totally different from what Pygooglevoice does. It looks like it makes a call to some kind of endpoint named "empty" which (as its name suggests) doesn't return any response when you call it.

I don't know how the site figures out whether you've entered the correct password or not. The response from that "empty" endpoint is always... well, empty, whether you give it a correct password or not. I did notice that the HTTP response code of this endpoint was 204 when I logged in successfully, whereas when I gave it the wrong password there was no response code at all. Maybe that's how it's checking it.

It's really hard to understand what's happening during this login process in the web browser, as there are about 20 outbound network requests being made each time and all kinds of data being passed around.

Anyway, it looks like pygooglevoice was passing the following parameters to Google in the login request: {'Email': email, 'Passwd': passwd, 'gxf': gxf}

This looks to be incorrect now. When I reproduce the login process in my browser, what I see instead is the email being passed first in its entirety in a field called "identifier" and the password passed in a field called "password". Then on the subsequent request only the username of the Gmail account (e.g. yourusername for yourusername@gmail.com) is passed in the "identifier" field. There are also two other parameters, but their values are always blank. I don't have them in front of me, but they both started with a C.
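One way to make this kind of comparison systematic is to diff the field names in the form body the library sends against a body captured from the browser's network console. The sketch below uses only the standard library. Both sample bodies are invented for illustration, and only one of the blank fields is shown (as "ct", a hidden input name that appears in the captured form markup further down the thread) because the other name isn't known here.

```python
from urllib.parse import parse_qs


def field_names(form_body):
    """Return the set of field names in a URL-encoded form body, blanks included."""
    return set(parse_qs(form_body, keep_blank_values=True))


# Hypothetical request bodies: what the library sends vs. what a browser capture shows.
library_body = "Email=user%40gmail.com&Passwd=secret&gxf=token"
browser_body = "identifier=user%40gmail.com&password=secret&ct="

only_library = field_names(library_body) - field_names(browser_body)
only_browser = field_names(browser_body) - field_names(library_body)

print("sent only by the library:", sorted(only_library))
print("sent only by the browser:", sorted(only_browser))
```

Fields that show up only on the browser side are the candidates for what the login request is missing.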

I have no idea how to fix this at this point. The code in pygooglevoice is too difficult for me to follow. I was able to find out what URLs it's using and what data it's passing by inserting some print statements into the login function, but the rest of the code flow just confuses me and gives me a headache.

It seems like a radical rewrite of the entire login process has to be done. I would also suggest cleaning up the code so it's easier to follow and you don't have to trace through 20 function calls to figure out what is happening. I'd do it myself, but again, I can't follow this code at all and it hurts to even try...

ag415 commented 5 years ago

I did notice that the regex in the code is still getting a value for "gxf", but I don't think that value is useful anymore. When I reproduced the login process in my browser, I did not once see "gxf" being passed anywhere.

gkuenning commented 5 years ago

I've noticed the same complex login process. However, there used to be a less complex process, which is what pygooglevoice used. I suspect that what has happened is that either Google disabled the less complex process, or they've made a minor change to it. I'm hoping the latter.

I believe that a lot of the browser-based process is unnecessary; as you mentioned, Google has introduced a ton of fluff. But I think there must still be a back door of some sort because the Voice application on Android still works. Unfortunately it's not possible to sniff that conversation, even directly on the phone, because it's encrypted.

I'm also clueless about the way the Python code works. Perhaps somebody else on the list can offer thoughts?

ag415 commented 5 years ago

Maybe it's just time to rewrite this entire module. It was written a long time ago by a couple of guys who apparently work at Google. They don't maintain it anymore, and nobody knows what even happened to them. Their code is old, and it's really confusing. They do a lot of weird stuff and the logic is nearly impossible to follow. It would be great to see this module rewritten from the ground up in a sane and clean way that doesn't involve hundreds of back-and-forth function calls, esoteric function and variable names, etc. Honestly, it's starting to sound like rewriting this module from scratch might actually turn out to be easier than constantly hacking it to keep up with Google's awful changes.

ag415 commented 5 years ago

I might also suggest using Selenium and a headless browser like PhantomJS to do the login if it requires a lot of JavaScript and the other crap Google has added in recent years. It's going to consume more resources, and I'm not sure if it'll work in a CLI-only environment (maybe graphics are required even for headless browsers, but I'm not sure). Compared to using something like urllib2 or Requests in conjunction with BeautifulSoup, Selenium gives you a full JavaScript engine and behaves just like a regular browser would, so you don't have to write extra code to scan through JavaScript and figure out what kind of data needs to get passed where to make the login work. It provides you with a DOM-style interface to everything, and you can just instruct it to do the login process the same way a human being would in their browser. This would be easier than trying to hand-code the process using urllib2/Requests and BeautifulSoup or xml.minidom.

ag415 commented 5 years ago

The above only applies if JavaScript is necessary, though. If we can manage without JavaScript, then it'll be more efficient to stick with urllib2/Requests and BeautifulSoup/xml.minidom.

ag415 commented 5 years ago

You can also export the session data from a Selenium WebDriver session to something like Requests, so you don't have to keep using Selenium for everything. You can use it just to handle the login process and run whatever JavaScript Google needs us to run before allowing us to proceed with the login, then pass the session data off to Requests/urllib2 and take it from there.
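A minimal sketch of that hand-off, assuming the cookies have already been exported as a list of dicts with "name", "value", and "domain" keys (the shape Selenium's `driver.get_cookies()` returns). The helper name `cookies_for_requests`, the cookie names, and the values are invented for the example, and the block avoids importing Selenium or Requests so it can run on its own.

```python
def cookies_for_requests(selenium_cookies, domain_suffix="google.com"):
    """Collapse Selenium-style cookie dicts into a plain {name: value} mapping.

    Only cookies whose domain is `domain_suffix` or a subdomain of it are kept.
    """
    jar = {}
    for cookie in selenium_cookies:
        domain = cookie.get("domain", "").lstrip(".")
        if domain == domain_suffix or domain.endswith("." + domain_suffix):
            jar[cookie["name"]] = cookie["value"]
    return jar


# Hypothetical export; real names and values would come from the browser session.
exported = [
    {"name": "example_sid", "value": "abc", "domain": ".google.com", "path": "/"},
    {"name": "example_lsid", "value": "def", "domain": "accounts.google.com", "path": "/"},
    {"name": "tracker", "value": "zzz", "domain": "ads.example.net", "path": "/"},
]
session_cookies = cookies_for_requests(exported)
```

With Requests installed, the resulting mapping can be loaded into a session with `session.cookies.update(session_cookies)`. Flattening to name/value pairs drops per-cookie path and expiry details, which may or may not matter for a given site.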

gkuenning commented 5 years ago

For me, using an external tool like selenium would be a major drawback since I use pygooglevoice on my phone in a completely automated fashion.

It's also worth noting that there must be an API, since the Voice app for Android has no difficulty getting at Google Voice's features. I haven't tried sniffing the Voice app's conversation yet, partly because I assume that it's encrypted so it wouldn't reveal much beyond the IP address and protocol. I imagine that somebody who's good with Android debugging might be able to intercept the messages before they went to the network stack, but that sounds pretty hard.

jhgorse commented 5 years ago

Background and overview on WebAuthn: https://developers.google.com/web/updates/2018/05/webauthn

https://webauthn.io/ => python module https://github.com/duo-labs/py_webauthn

TL;DR: password authentication is bad; use pubkey crypto instead.

jhgorse commented 5 years ago

Some reference material on server-side one-time authentication code: https://developers.google.com/identity/sign-in/web/server-side-flow

gkuenning commented 5 years ago

Thanks for that link.

I'm a little confused, though, by the characterization of the authentication code as a one-time item. How one-time is it?

It might help to explain my use case, which once upon a time was done with a handy Tasker plugin that is no longer maintained, and is now done with pygooglevoice and a script I wrote. When I leave home, Tasker notices that fact and disables ringing on my home phone. That's critical to me because I don't want my calls to be sent to my family when I'm not there. Similarly, when I arrive at the office the script runs again and enables my office phone, so that I can use that line when it's more convenient.

The important thing is that this all happens automatically, without any interaction from me. I'm fine with getting that one-time token once, but if I have to do it multiple times then it's really not usable. And having watched the very bad video at that link, it looks like the tokens only last for an hour. Asking my phone to refresh every hour is clumsy and failure-prone.

Clearly, none of this is actually necessary since the Voice application on my phone works fine without asking me to repeatedly sign in, approve sign-in, or even be online.

jhgorse commented 5 years ago

It's how long it lasts before going stale. Then you generate a new one, automatically if you have the private key parts.

That's a cool use case. =)
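To make the "regenerate automatically when stale" idea concrete, here is a small sketch of a token cache that renews a short-lived token shortly before it expires, so an unattended script never has to prompt. The class name `CachedToken` and the `fetch_token` callable are hypothetical, the one-hour lifetime mirrors the figure mentioned above, and how a fresh token is actually obtained is left out on purpose.

```python
import time


class CachedToken:
    """Cache a short-lived token and renew it shortly before it goes stale."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=60,
                 clock=time.time):
        self._fetch_token = fetch_token  # hypothetical callable returning a new token
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock              # injectable so the demo can fake time
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a usable token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token


# Demo with a fake clock and a fake token source.
fake_now = [0.0]
issued = []


def fake_fetch():
    issued.append("token-%d" % (len(issued) + 1))
    return issued[-1]


cache = CachedToken(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # fetched at t=0
fake_now[0] = 1800.0
second = cache.get()   # still fresh, reused
fake_now[0] = 3590.0
third = cache.get()    # inside the 60 s margin, renewed
```

The refresh happens lazily on the next call rather than on a timer, so a phone script that only runs when you leave home or arrive at the office would not need an hourly wake-up.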

ag415 commented 5 years ago

@gkuenning they definitely have some kind of API, although I don't think they actually intend for it to be accessible to us. There was a discussion about this on the VOIP forum on DSLReports a few years ago: https://www.dslreports.com/forum/r32046123-pygooglevoice-oauth2

It might be possible to reverse engineer it and re-write the pygooglevoice module to use it.

I've done some poking around with the modern Google Voice web app and I even got some code that will work if you can provide it with a valid Authorization header. Here's an example:


import requests
import time
import pprint
import re
import json

AUTHORIZATION = 'Bearer ya29.SOME CRAP'  # Your authorization token. I don't know how to obtain this automatically, but this code worked when I copied and pasted my auth token from my browser here.
BATCH = 'xxx'  # Numeric identifier for the batch of messages you're requesting. Watch the network console while fetching SMS messages in the GV web app to see what I mean.
MESSAGE_ID = '+xxx'  # A message ID. See the network console when using Google Voice for details.
TIMESTAMP = str(int(time.time())) + '000'  # Current time: UNIX epoch seconds with '000' appended.
TEL = '+15558765309'  # Phone number of the other party in the conversation.

ORIGIN = 'https://content.googleapis.com'
ACCEPT_ENCODING = 'gzip, deflate, br'
X_ORIGIN = 'https://voice.google.com'
ACCEPT_LANGUAGE = 'en-US,en;q=0.9'

X_REQUESTED_WITH = 'XMLHttpRequest'
X_CLIENTDETAILS = 'appVersion=5.0%20(Windows%20NT%2010.0%3B%20Win64%3B%20x64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F73.0.3683.103%20Safari%2F537.36&platform=Win32&userAgent=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20Win64%3B%20x64)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F73.0.3683.103%20Safari%2F537.36'
X_GOOG_ENCODE_RESPONSE_IF_EXECUTABLE = 'base64'
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
CONTENT_TYPE = 'multipart/mixed; boundary=batch' + BATCH
ACCEPT = '*/*'
REFERER = 'https://content.googleapis.com/static/proxy.html?usegapi=1&jsh=m%3B%2F_%2Fscs%2Fabc-static%2F_%2Fjs%2Fk%3Dgapi.gapi.en.NqPOw1G8B3I.O%2Frt%3Dj%2Fd%3D1%2Frs%3DAHpOoo9Un2bnlKyVHr37bEqQCNKVG9ZmzA%2Fm%3D__features__'
AUTHORITY = 'content.googleapis.com'
X_JAVASCRIPT_USER_AGENT = 'google-api-javascript-client/1.1.0'
X_REFERER = 'https://voice.google.com'

RECV_CODE = 10
SENT_CODE = 11

headers = {
    'origin': ORIGIN,
    'accept-encoding': ACCEPT_ENCODING,
    'x-origin': X_ORIGIN,
    'accept-language': ACCEPT_LANGUAGE,
    'authorization': AUTHORIZATION,
    'x-requested-with': X_REQUESTED_WITH,
    'x-clientdetails': X_CLIENTDETAILS,
    'x-goog-encode-response-if-executable': X_GOOG_ENCODE_RESPONSE_IF_EXECUTABLE,
    'user-agent': USER_AGENT,
    'content-type': CONTENT_TYPE,
    'accept': ACCEPT,
    'referer': REFERER,
    'authority': AUTHORITY,
    'x-javascript-user-agent': X_JAVASCRIPT_USER_AGENT,
    'x-referer': X_REFERER,
}

data = """--batch""" + BATCH + """
Content-Type: application/http
Content-Transfer-Encoding: binary
Content-ID: <batch""" + BATCH + MESSAGE_ID + """@googleapis.com>

POST /voice/v1/voiceclient/api2thread/get?alt=protojson
Content-Type: application/json+protobuf
X-JavaScript-User-Agent: """ + X_JAVASCRIPT_USER_AGENT + """
X-Requested-With: """ + X_REQUESTED_WITH + """
X-Goog-Encode-Response-If-Executable: """ + X_GOOG_ENCODE_RESPONSE_IF_EXECUTABLE + """
Authorization: """ + AUTHORIZATION + """
X-ClientDetails: """ + X_CLIENTDETAILS + """

["t.""" + TEL + """\",100,\"""" + TIMESTAMP + """\",[null,true,true]]
--batch""" + BATCH + """--"""

class Message:
    def __init__(self, data):
        self.id = data[0]
        self.timestamp = data[1]
        self.party1 = data[2]
        self.party2 = data[15]
        self.code = data[4]
        if self.code == RECV_CODE:
            self.sender = self.party2
            self.recipient = self.party1
        elif self.code == SENT_CODE:
            self.sender = self.party1
            self.recipient = self.party2
        self.text = data[9]

response = requests.post('https://content.googleapis.com/batch', headers=headers, data=data)
# Use the decoded text (not raw bytes) so the str pattern also works on Python 3.
data = re.split(r'\bcontent-length.*\b', response.text, maxsplit=1, flags=re.IGNORECASE)[-1]
data = data[:data.rindex('--batch')]
json_data = json.loads(data)
pp = pprint.PrettyPrinter(indent=2)

batch_data = json_data[0]
message_data = batch_data[2]
oldest_timestamp = batch_data[3]

messages = []
for data in message_data:
    message = Message(data)
    messages.append(message)

for msg in messages:
    print('[' + msg.id + '] (' + str(msg.timestamp) + ') ' + msg.sender + '->' + msg.recipient + ': ' + msg.text)

Hope this helps
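For anyone who wants to try the response-parsing step without a valid token, here is a self-contained sketch of the same idea: cut everything up to the inner Content-Length header, drop the closing boundary, and parse what remains as JSON. The function name `extract_batch_json` and the sample body are made up, and the sample only mimics the general multipart shape, not a real Google Voice reply.

```python
import json
import re


def extract_batch_json(body):
    """Pull the JSON payload out of a multipart batch response body (text)."""
    # Discard everything up to and including the inner Content-Length header.
    payload = re.split(r"\bcontent-length.*\b", body, maxsplit=1,
                       flags=re.IGNORECASE)[-1]
    # Discard the closing multipart boundary.
    payload = payload[:payload.rindex("--batch")]
    return json.loads(payload)


# Hypothetical response body, invented for illustration.
sample_body = (
    "--batch_abc\r\n"
    "Content-Type: application/http\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json+protobuf; charset=UTF-8\r\n"
    "Content-Length: 31\r\n"
    "\r\n"
    '[[null, null, [], "1560000000"]]\r\n'
    "--batch_abc--"
)
parsed = extract_batch_json(sample_body)
```

This relies on the payload being the last thing before the closing boundary, so a response containing several parts would need a proper multipart parser instead.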

edit: sorry for the broken code block formatting. I guess the "code" button on this wysiwyg editor in github doesn't work for python code or something

edit #2: fixed (kind of) broken code formatting per @jhgorse's suggestion. This text is included in the code block for some reason, but I don't know why

jhgorse commented 5 years ago

edit googlevoice/settings.py set DEBUG to True

...
DEBUG:urllib3.connectionpool:https://accounts.google.com:443 "POST /signin/challenge/sl/password?service=grandcentral&continue=https://www.google.com/voice/redirection/voice&followup=https://www.google.com/voice&ltmpl=open HTTP/1.1" 400 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
/tmp/pygooglevoice/venv/local/lib/python2.7/site-packages/urllib3-1.25.3-py2.7.egg/urllib3/util/ssl_.py:149: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /voice/b/0/ HTTP/1.1" 302 None
DEBUG:urllib3.connectionpool:https://accounts.google.com:443 "GET /ServiceLogin?service=grandcentral&passive=1209600&continue=https://www.google.com/voice/b/0/&followup=https://www.google.com/voice/b/0/&ltmpl=open HTTP/1.1" 200 None
Traceback (most recent call last):
  File "examples/folders.py", line 16, in <module>
    __name__ == '__main__' and run()
  File "examples/folders.py", line 8, in run
    voice.login()
  File "build/bdist.linux-x86_64/egg/googlevoice/voice.py", line 101, in login
googlevoice.util.LoginError
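The DEBUG lines above are emitted through Python's standard logging module under the logger name urllib3.connectionpool. If you are experimenting in your own script rather than through the library's settings file, a sketch like the following should surface the same request lines. The function name is made up for the example.

```python
import logging


def enable_http_debug_logging():
    """Turn on DEBUG output for urllib3, which Requests uses under the hood."""
    # basicConfig's default format is LEVEL:logger-name:message, matching the log above.
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("urllib3")
    logger.setLevel(logging.DEBUG)
    return logger


http_logger = enable_http_debug_logging()
```

Each request then logs its method, path, and status code, which is enough to spot which step returns the 400.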

https://github.com/pettazz/pygooglevoice/blob/master/googlevoice/voice.py#L60

    def login(self, email=None, passwd=None, smsKey=None):
        """
        Login to the service using your Google Voice account
        Credentials will be propmpted for if not given as args or in the
        ``~/.gvoice`` config file
        """
        if hasattr(self, '_special') and getattr(self, '_special'):
            return self

        email = email or config.email or input('Email address: ')
        passwd = passwd or config.password or getpass.getpass()

        content = self.__do_page('login').text
        # holy hackjob
        gxf = re.search(
            r"type=\"hidden\"\s+name=\"gxf\"\s+value=\"(.+)\"",
            content).group(1)
        result = self.__do_page(
            'login_post',
            {'Email': email, 'Passwd': passwd, 'gxf': gxf})

        if result.url.startswith(getattr(settings, "SMSAUTH")):
            content = self.__smsAuth(smsKey)

            try:
                smsToken = re.search(
                    r"name=\"smsToken\"\s+value=\"([^\"]+)\"",
                    content).group(1)
                content = self.__do_page(
                    'login',
                    {'smsToken': smsToken, 'service': "grandcentral"})
            except AttributeError:
                raise util.LoginError

            del smsKey, smsToken, gxf

        del email, passwd

        try:
            assert self.special
        except (AssertionError, AttributeError):
            raise util.LoginError

        return self

Line 72 is the beginning of the issue.

The spec is

inputs:

output:

Google Voice Web Login Sequence

1) https://accounts.google.com/signin/v2/identifier?service=grandcentral&passive=1209600&flowName=GlifWebSignIn&flowEntry=ServiceLogin

<input type="email" ... POST with this set to email.

2) https://accounts.google.com/signin/v2/sl/pwd?service=grandcentral&passive=1209600&flowName=GlifWebSignIn&flowEntry=ServiceLogin&cid=1&navigationDirection=forward

<form action="/signin/v2/challenge/password/empty" method="post" novalidate="" jsaction="submit:JM9m2e;" _lpchecked="1"><span jsslot=""><section class="TgkVnd" jscontroller="KBlqf" jsshadow=""><header class="juTfp" jsname="tJHJj" aria-hidden="true"></header><div class="dMArKd bxPAYd k6Zj8d" jsname="MZArnb"><span jsslot=""><input type="email" name="identifier" class="VwCw" tabindex="-1" aria-hidden="true" spellcheck="false" value="xxxxxx@gmail.com" jsname="KKx9x" autocomplete="off" id="hiddenEmail" 
...
 aria-atomic="true" aria-live="assertive"></div></div></div></div><input jsname="SBlSod" type="hidden" name="ct" id="ct"></div></span></div></section></span></form>

3) recovery options... optionally handle this https://myaccount.google.com/signinoptions/recovery-options-collection?utm_source=Web&utm_medium=Web&utm_campaign=interstitial&oev=lytf%3D7%26wvtx%3D2%26trs%3Dli%26stel%3D1&hl=en&service=grandcentral&continue=https://accounts.google.com/ServiceLogin?continue%3Dhttps%253A%252F%252Faccounts.google.com%252FManageAccount%26service%3Dgrandcentral%26hl%3Den%26authuser%3D0%26passive%3Dtrue%26sarp%3D1%26aodrpl%3D1%26checkedDomains%3Dyoutube%26checkConnection%3Dyoutube%253A336%253A1%26pstMsg%3D1&rapt= ... special... &pli=1

<span class="RveJvd snByac">Done</span>

Click DONE

We should have the special now. Not sure how this is intended to work:

        if getattr(self, '_special', None):
            return self._special
        pattern = re.compile(r"('_rnr_se':) '(.+)'")
        resp = self.session.get(settings.INBOX).text
        try:
            sp = pattern.search(resp).group(2)
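For reference, here is a standalone sketch of what that fragment appears to be doing: searching the inbox page text for the `_rnr_se` value and returning it. The function name `find_special` and the sample page text are invented; the pattern is the one from the fragment above. Whether the current Google Voice pages still embed this value is not something the sketch can confirm.

```python
import re

# Same pattern as in the fragment above; group 2 holds the token value.
RNR_SE_PATTERN = re.compile(r"('_rnr_se':) '(.+)'")


def find_special(page_text):
    """Return the _rnr_se value found in the page text, or None if it is missing."""
    match = RNR_SE_PATTERN.search(page_text)
    return match.group(2) if match else None


# Hypothetical page fragment standing in for the inbox HTML.
sample_page = "var _gcData = {\n'_rnr_se': 'abc123=',\n};"
special = find_special(sample_page)
not_logged_in = find_special("<html>Sign in to continue</html>")
```

A None result here corresponds to the case where the login did not really succeed and the inbox request was redirected to the sign-in page.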

It would be nice if requests module could be developed alongside an actual web browser or in a pseudo web browser mode. I am unfamiliar with the library. Though perhaps there is something already out there in python which will do this login session?

Cheers, Joe

jhgorse commented 5 years ago

> edit: sorry for the broken code block formatting. I guess the "code" button on this wysiwyg editor in github doesn't work for python code or something

@ag415 try ```python as the opening line for blocks of python code.

Also, see the spec I posted for what we need to do to get this working again. Consider testing this module: https://github.com/duo-labs/py_webauthn

EDIT: The inputs to webauthn may be wrapped outputs of the javascript and server-side queries for the pubkey of the account. Doing this part on our own seems daunting unless there is a standard mechanism for doing this in Google-land. I suspect authentication is well supported for other products. Which should we look at for inspiration?

jhgorse commented 5 years ago

For example: https://developers.google.com/drive/api/v3/quickstart/python

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Hey, would you look at that... Doesn't this look familiar?

{
  "installed": {
    "client_id": "123456789012-r8bjvdm7850l4rqeejqmnv2mi860efq6.apps.googleusercontent.com",
    "project_id": "quickstart-1560826333251",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "...xxx you are blinded by the light...",
    "redirect_uris": [ "urn:ietf:wg:oauth:2.0:oob", "http://localhost" ]
  }
}

Which leads me to google-auth-library-python project. Not linking to it for now.

jhgorse commented 5 years ago

Step 1) OAuth2 with Google to get the credential. https://requests-oauthlib.readthedocs.io/en/latest/examples/google.html

Step 2) Obtain user credentials using the access token and helper functions. https://google-auth.readthedocs.io/en/latest/user-guide.html#user-credentials
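As a rough picture of what Step 1 starts with, here is a standard-library sketch that builds an OAuth2 authorization URL against the auth_uri from the credentials file shown earlier. The client ID is a placeholder and the scopes are generic ones; nothing in this thread identifies a scope that grants access to Google Voice, so treat this purely as the shape of the request. The linked libraries handle the rest of the exchange.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTH_URI = "https://accounts.google.com/o/oauth2/auth"  # auth_uri from the JSON above


def build_auth_url(client_id, redirect_uri, scopes):
    """Build the URL the user visits once to approve access."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(scopes),
        # Ask for a refresh token so later renewals need no user interaction.
        "access_type": "offline",
    })
    return AUTH_URI + "?" + query


# Placeholder client ID and generic scopes, for illustration only.
url = build_auth_url("EXAMPLE_CLIENT_ID.apps.googleusercontent.com",
                     "http://localhost", ["openid", "email"])
params = parse_qs(urlparse(url).query)
```

Requesting offline access is what would address the "sign in once, then run unattended" requirement raised earlier, assuming a usable scope exists at all.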

ag415 commented 5 years ago

What are you guys' thoughts on using the modern, undocumented Google Voice API as covered briefly in my previous comment here? Unlike the method that pygooglevoice uses (scraping HTML data out of an XML document), this gives us a json object that doesn't require nearly as much parsing work.

The flipside is that a lot of it is confusing (e.g. returning us a list of nameless values that we have to figure out for ourselves instead of something more sane like a dictionary - although these values do seem to at least remain in the same indices) and undocumented since Google never seemed to intend for us to use their API.

There's also of course no guarantee that they won't change it and break everything again. I do think it's worth giving some thought. Maybe even have a way to configure this module to use the oldschool method or use their new API? Like maybe as a fallback, in case the old school method fails or vice versa?

My only challenge with this is that I don't know of a way to automatically obtain the authorization token. I've tested this code by copying and pasting my auth token from my browser after logging into Google Voice manually. If we could figure out a way to obtain that header value automatically, I'm fairly confident that we could make the internal Google Voice API work for pygooglevoice.

ag415 commented 5 years ago

Also, if we decide to implement the login process as per the spec that @jhgorse suggested, I would think that Selenium + PhantomJS would be the perfect tool for this job. We could tell it to fill in the username and password fields and click whatever buttons are displayed on the screen, just like a real user would. Once it obtains a valid login session, you can export that session data to Requests or urllib2 directly, shut down Selenium/PhantomJS to save memory, and carry on with everything else using purely Requests/urllib2.

You'll no longer need any of these crazy hackjobs to find hidden values on the page using regular expressions. This will also be a little more resilient to minor changes by Google, as long as we can tell Selenium how to find the elements the user is supposed to interact with to log in. If we can combine that with successfully obtaining an authorization header, we can plug that into my code and actually use Google's real API instead of continuing this ugly business of scraping HTML data out of XML documents. What do you guys think?
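To make the "export that session data" step concrete, here is a minimal sketch of the hand-off. It assumes the cookies arrive as the list of dicts that Selenium's driver.get_cookies() returns (each with at least "name", "value", and usually "domain"); the cookie values below are fake, and whether these cookies alone are enough for Google Voice is untested.

```python
import urllib.request


def cookie_header(selenium_cookies, domain_suffix=None):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    pairs = []
    for cookie in selenium_cookies:
        # Optionally keep only cookies for the domain we care about.
        if domain_suffix and not cookie.get("domain", "").endswith(domain_suffix):
            continue
        pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Fake cookies shaped like the output of driver.get_cookies().
cookies = [
    {"name": "SID", "value": "aaa", "domain": ".google.com"},
    {"name": "HSID", "value": "bbb", "domain": ".google.com"},
    {"name": "tracker", "value": "zzz", "domain": ".example.com"},
]

header = cookie_header(cookies, domain_suffix="google.com")

# Build (but do not send) a request that carries the browser session.
request = urllib.request.Request(
    "https://voice.google.com/", headers={"Cookie": header}
)
print(request.get_header("Cookie"))
```

After this point the browser can be closed and everything else can go through plain HTTP requests.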

jhgorse commented 5 years ago

@ag415 I don't have a strong opinion either way. If you can show a working example for each of the functions I am sure we can port it. This module's interface is clean and would be easy to fill in if we had examples of the current api.

In either case, I would like a test suite which verifies each function (login, call, sms, folders, and so on) and reports failures in a way that makes it obvious what broke and where to learn more.
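Something along these lines might do as a starting point: a tiny runner that calls one check per function and records what failed and why. The two checks here are stand-ins that do not touch the network; real ones would call the module's login, sms, and so on.

```python
def run_checks(checks):
    """Run each (name, callable) pair; map name -> None on success or error text."""
    results = {}
    for name, check in checks:
        try:
            check()
            results[name] = None
        except Exception as exc:  # report every failure instead of stopping early
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Stand-in checks for illustration only.
def fake_login():
    pass


def fake_sms():
    raise RuntimeError("HTTP Error 400: Bad Request")


results = run_checks([("login", fake_login), ("sms", fake_sms)])
for name, error in results.items():
    print(f"{name}: {'ok' if error is None else 'FAILED - ' + error}")
```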

In terms of selenium + phantomjs, you are proposing using it to do the two-page or two-factor webauthn. This should work. Can you write up a script using it to get the token? A gist python script ought to suffice.

If it is easier than figuring out OAuth2 and the other official login or service-based auth APIs, then so be it. It could also serve as a backup method for authentication if the other methods get figured out and implemented.

Cheers, Joe

jhgorse commented 5 years ago

@ag415 I am having trouble obtaining the AUTHORIZATION = 'Bearer ya29.SOME CRAP' token from your example. How did you get it? I read the dslreports (https://www.dslreports.com/forum/r32046123-pygooglevoice-oauth2~start=30) and did not see anything obvious for where to scrape it from the web browser.

EDIT: Note that the dslreports thread is outdated.

ag415 commented 5 years ago

After logging into your Google account, go to https://voice.google.com/u/1/messages to view the Google Voice messages page. Open the network console in your web browser, and then click on any of the SMS conversations in the list. You should see a request for a long URL at the domain content-people-pa.googleapis.com. View the request headers, you should see the authorization token there.
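If copying the header by hand gets tedious, the same lookup can be scripted against a saved network log. This sketch assumes you export the network console as a HAR file (the standard log/entries/request/headers layout); the sample data and token below are fake.

```python
from urllib.parse import urlparse


def find_bearer_token(har, host):
    """Return the first 'Bearer ...' Authorization value sent to the given host."""
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if urlparse(request.get("url", "")).hostname != host:
            continue
        for header in request.get("headers", []):
            # Header names are case-insensitive, so compare in lowercase.
            if header.get("name", "").lower() == "authorization":
                value = header.get("value", "")
                if value.startswith("Bearer "):
                    return value
    return None


# Fake HAR fragment; a real one would come from json.load() on the exported file.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://voice.google.com/u/1/messages", "headers": []}},
            {
                "request": {
                    "url": "https://content-people-pa.googleapis.com/v2/people",
                    "headers": [
                        {"name": "Authorization", "value": "Bearer ya29.FAKE"}
                    ],
                }
            },
        ]
    }
}

token = find_bearer_token(sample_har, "content-people-pa.googleapis.com")
print(token)
```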

Unfortunately I don't know how to obtain this authorization token automatically. It appears that completing Google's unnecessarily complex login process is required for this. There may be further steps required after logging in (e.g. requesting the Google Voice page and scraping the token from some javascript on it or something).

I do think that using a headless browser with Selenium could allow us to obtain this. The browser would be doing the login process with a working javascript engine that'd run whatever crap Google needs us to run in order to get the token, saving us from the headache of having to deduce the token by writing code that digs through the javascript, searches for regex patterns, etc.

Also note it has admittedly been a few months since I've tested that code. I don't know for sure if it still works or not. Try it with a valid authorization token, message id, batch number and contact phone number and let me know.

Hope this helps.

jhgorse commented 5 years ago

@ag415 I can see it now. I've played around with requests and bs4 and urllib3 a bit and they do not look promising at the moment. My request-foo could use some practice, admittedly. I am sure it is possible to reverse engineer the GETs and POSTs to make this work.

I have two questions:

1) Can we use this Bearer authorization to authenticate the pygooglevoice module directly?
2) Can we use this Bearer authorization to get a refresh token, which can then get one-time authorization codes, which can then be used in the pygooglevoice module?

Refresh tokens: https://martinfowler.com/articles/command-line-google.html

Unfortunately, Martin Fowler's approach requires the use of the dev console to create the API key, which we cannot do for Google Voice, as there is no public API mechanism for provisioning.

jhgorse commented 5 years ago

I am wondering if the authentication scope is the same as google plus. I see it when I search for scopes.

yummypurplestuf commented 5 years ago

Is there a reason we're not authenticating like gspread does?

gspread documentation

jhgorse commented 5 years ago

@yummypurplestuf Login to Google Developers Console and see. Per gspread documentation: https://gspread.readthedocs.io/en/latest/oauth2.html#using-signed-credentials

Try it.

tsdg112 commented 5 years ago

I have uploaded a Selenium with Python replacement for gvoice at: https://github.com/tsdg112/selgooglevoice .

It runs much slower than the old gvoice, and I hope that people make progress getting gvoice to work without selenium.

jhgorse commented 5 years ago

@tsdg112 very good!

Now if the credentials can be extracted from the selenium authentication step, then @ag415's proposal using the new API could work. Also, we could plug it into the existing API to see if it works as well.

We would plug it into the "special" variable to test: https://github.com/pettazz/pygooglevoice/blob/master/googlevoice/voice.py#L57

@all do we know what used to go into this variable in order for it to work? Which auth token?

jaraco commented 1 year ago

FYI, this project is abandoned and superseded by googlevoice. The login issue is still present, but contributions are welcome at jaraco/googlevoice.