PasswordExporter is cleaning paths without respecting cmdclean

tepozoa commented 2 years ago

I don't quite know why this is happening, spent awhile in the pdb debugger and am still scratching my head for the proper fix. I'm not a programmer by trade so kinda hit a wall on this one getting towards what the real fix is, but I can hack it to work.

Problem: I do not want dpaths() or duplicate() to activate; I am converting a Bitwarden to KeepassXC database which is a flat Root (no groups at all for Logins); the source and destination fully support entries in the Root which have the same title. However, the PasswordExporter use of dpaths() is triggering a clean to activate somehow, and duplicate() is just missing the code to respect the flag at all.

manager.py -> PasswordExporter.clean() this logic is fine and works as expected:

        for entry in self.data:
            entry = clean.unused(entry)
            path = clean.group(clean.protocol(entry.pop('group', '')))
            entry['path'] = clean.cpath(entry, path, cmdclean, convert)

However, as soon as we get to the next 3 it starts breaking:

        clean.dpaths(self.data, cmdclean, convert)
        clean.dpaths(self.data, cmdclean, convert)
        clean.duplicate(self.data)

Step 1: changes the path element when there are matching double-entries when i don't want it to Step 2: makes it even worse by taking the step (1) items and double-nesting them Step 3: just ignores my wishes (does not respect cmdclean = False)

Before we get to step 1, I might have 2 entries for github.com that I want in the Root (no group) with different login names; I do not want these "cleaned" into groups (paths) at all, just add two entries to my Root (fully supported by the KDBX format):

title=github.com, user=bob, url=github.com, group=""
title=github.com, user=alice, url=www.github.com, group=""

After step 1 dpaths() I get grouped paths:

path: github.com/bob
path: github.com/alice

Root
  github.com
    bob
    alice

After step 2 dpaths() I get grouped subpaths:

path: github.com/github.com/bob
path: github.com/www.github.com/alice

Root
  github.com
    github.com
      bob
    www.github.com
      alice

If I "fix" step 1 and/or 2 to stop this behaviour (it's as easy as commenting out those two lines of code), then step 3 duplicate() creates the -x index to the title when i don't want it:

github.com bob
github.com-1 alice

What I've done that shows how to fix it is the most basic python in the world - if cmdclean is not set, return and skip the methods.

manager.py change:

        clean.duplicate(self.data, cmdclean)

clean.py changes:

def dpaths(data, cmdclean, conv):
    """Create subfolders for duplicated paths."""
    if not cmdclean:
        return

def duplicate(data, cmdclean):
    """Add number to the remaining duplicated path."""
    if not cmdclean:
        return

I'm sure there's a more correct way to fix this, however this hack works to give me the expected result. I realize that I have to use the --force CLI param to get kdbx.py:insert() to stop complaining about duplicate entries (which in this case is correct, because I'm converting a database into a fresh, empty KDBX file and each duplicate is actually unique in the username field, it's just the same title).

roddhjav commented 2 years ago

The solution is easier that your think: the cmdclean option is not doing what you think it does. It is off by default and only makes the paths more command line friendly by replacing some characters in the path. It has no control over the de-duplication of duplicated path names. This is why duplicate() does not take cmdclean as argument and dpaths() only pass it to cpaths() that needs it.

So adding a new option like --no-deduplication that prevent the run of dpaths() and duplicate() should do the trick (PR are welcome).

tepozoa commented 2 years ago

I follow 100%, it would also be useful then over in kdbx.py KDBX.insert() which I was using --force for, but let me ask you first before I attempt a PR, would it be better if instead KDBX.insert was updated to add the username key for better matching? Maybe:

KDBX.insert():
    ....
    username=entry.pop('login', '')
    if not self.force:
            pkentry = self.keepass.find_entries(title=title, username=username,
                                                group=kpgroup, recursive=False, first=True)

This would cover multi-login needs in multiple ways to avoid false-positives:

title=github.com, login=bob, url=https://github.com
title=github.com, login=alice, url=https://github.com

We could use the new feature flag (if not (self.force OR self.nodedupe)), but maybe it's better to find more specific key combo matches?

roddhjav commented 2 years ago

You are right playing around KDBX.insert(): could be interesting too.

tepozoa commented 2 years ago

Cleaning up/closing this issue as it's a mistaken bug report but not a bug.

roddhjav / pass-import

PasswordExporter is cleaning paths without respecting cmdclean #172