Support longer passwords in Windows

jaraco commented 2 years ago

@jaraco I know this issue is closed, but the Windows limit is getting in our way, too, and I have a possible idea for how to fix it.

Here's the version of Windows I'm on

Edition Windows 10 Pro
Version 21H1
Installed on    ‎5/‎1/‎2021
OS build    19043.1320

The limit documented by @cvubrugier is the same limit I am experiencing: 1280 max for the password.

Yet some packages use keyring to store tokens (JWT) in keyring, and these tokens can easily exceed 1280 bytes.

So my idea is to edit https://github.com/jaraco/keyring/blob/main/keyring/backends/Windows.py, in particular _set_password and _get_password, notice the "too big" password, and split it up into as many pieces as necessary, giving each one a unique (and numbered) TargetName.

Since _set_password and _get_password are the callers of win32.CredWrite and win32.CredRead, I can transparently split the password up in _set_password and then transparently re-join it up in _get_password and no other code will be the wiser for it.

I've prototyped this and it works.

Is this a PR you would potentially welcome (or, minimally, indulge?)

Originally posted by @jrobbins-LiveData in https://github.com/jaraco/keyring/issues/355#issuecomment-963661996

jaraco commented 2 years ago

Yes, I'd welcome such a change. In addition to devising a protocol for splitting and restoring the password, you'll also need a scheme for naming such passwords... and you'll notice that the Windows Keyring already has a hack to support multiple passwords for the same system with different users, so you'll want to be careful these two naming schemes don't interact badly. You'll also want to consider compatibility - what happens when a user stores a long password and then downgrades keyring... or stores a short one and then upgrades keyring? The upgrade path is more important to consider than the downgrade path.

jrobbins-LiveData commented 2 years ago

I hope to have time this weekend to work on this. I will pay attention to the multiple passwords / different users issue.

Also, I realized I need to pay attention to _delete_password and any other interactions that might need to know about the "sharded" entries.

For the upgrade keyring case, I was thinking that the first "shard" would maintain the TargetName provided by the user. If that is the only "shard", then everything should work the same, and a subsequent downgrade of keyring won't be impacted.

The main breakage I can picture is if one upgraded, stored a password bigger than 1280 (i.e. more than one shard), and then downgraded, older keyring wouldn't know about the subsequent shards and would return a 1280 byte broken password, silently, without any indication that something was wrong.

jrobbins-LiveData commented 2 years ago

@jaraco I tried submitting a Pull Request, but I don't see where it went. I'm inexperienced with git, so likely something I did wrong. In any event, I have a working patch to Windows.py that solved my partifact problem on Windows.

If you can't see my Pull Request, perhaps provide some tips so that I can send it in properly.

jrobbins-LiveData commented 2 years ago

@jaraco, I did some reading and thinking about my sharding strategy, and I have a concern.

First off, as background, I wanted to understand why we see the apparent max password size of 1280 characters.

As an older keyring issue 355 asserted, the Microsoft doc of the "CREDENTIAL" structure says that

CredentialBlobSize

The size, in bytes, of the CredentialBlob member. This member cannot be larger than CRED_MAX_CREDENTIAL_BLOB_SIZE (512) bytes.

However, when I inspect the wincred.h file in a recent Windows SDK, it appears that the Microsoft documentation is wrong.

//
// Maximum size of the CredBlob field (in bytes)
//

#define CRED_MAX_CREDENTIAL_BLOB_SIZE   (5*512)

    DWORD CredentialBlobSize;
    _Field_size_bytes_(CredentialBlobSize) LPBYTE CredentialBlob;

Since 5 * 512 = 2560 bytes, that's room for 1280 2-byte wchar_ts

>>> ffi.sizeof('wchar_t')
2

That's the background for why the byte limit is 2560 bytes.

My concern is that a single unicode character does not have to convert to a single wchar_t. For example, an emoji (using emoji) is a single character,

>>> import emoji
>>> s = emoji.emojize(':thumbs_up:')
>>> s
'👍'
>>> len(s)
1
>>> import cffi
>>> ffi =  cffi.FFI()
>>> ffi.new('wchar_t[]', s)
<cdata 'wchar_t[]' owning 6 bytes>

but (not including the NULL added by ffi.new), maps to two wchar_ts (4 bytes).

The sharding, to be done correctly, must be done on the encoded byte count, not on the character count. In other words, the real limit is 2560 bytes, not 1280 characters.

The obstacle for readily sharding on the byte count is that the API we are using fromdict does the conversion to bytes for us.

Of course for things like JSON Web Tokens (which is the use-case that gave me the issue to begin with), the text in question is actually BASE64 text, and so the ASCII characters "by luck/design" all map one-to-one to a single wchar_t, making my code work correctly. We could call it "good enough", but I had a pang of conscience about that.

Alternatively, lower-level API that let us encode in Windows.py would let us shard correctly.

Please let me know what you think.

jrobbins-LiveData commented 2 years ago

@jaraco, on a separate topic from the sharding...a comment in Windows.py says

Passwords are stored under the service name unless there is a collision
(another password with the same service name but different user name),
in which case the previous password is moved into a compound name:
{username}@{service}

The way I understood that comment, I expected that the check for service name collision would also check for a different user name, but in set_password, the code only checks for service name collision via if existing_pw:

    def set_password(self, service, username, password):
        existing_pw = self._get_password(service)
        if existing_pw:
            # resave the existing password using a compound target
            existing_username = existing_pw['UserName']
            target = self._compound_name(existing_username, service)
            self._set_password(
                target,
                existing_username,
                existing_pw.value,
            )
        self._set_password(service, username, str(password))

and doesn't additionally check if username == existing_username.

Since get_password first checks for the "non-compound service name", I am thinking the code is nevertheless correct. But since it doesn't seem to align 100% with the comment, I wanted to check in about this.

By way of an example

>>> import keyring
>>> keyring.set_password('a_svc', 'a_user', 'aaaa')
>>> keyring.set_password('a_svc', 'a_user', 'bbbb')

produces these two entries in Windows Credentials Manager:

jrobbins-LiveData commented 2 years ago

@jaraco I submitted a new pull request #544. It addresses the Partifact issue that was blocking me. It has a cleaner approach to sharding passwords across multiple entries, and support UTF-8 encoding in, I think, a sustainable way. I apologize in advance for my lack of knowledge about setuptools, packaging, etc. Let me know what you think!

jaraco / keyring

Support longer passwords in Windows #540