keepassium / KeePassium

KeePass-compatible password manager for iOS
https://keepassium.com

Similarity calculation issue #201

Closed waynezhang closed 2 years ago

waynezhang commented 3 years ago

Description

KeePassium cannot find a perfect match for both of the sites below at the same time.

How to reproduce

This also happens vice versa.

Expected behavior

KeePassium can find the right entry for both sites.

Screenshots

N/A

Environment:

Additional context

N/A

keepassium commented 3 years ago

KeePassium can not find the right entry.

Hmm, I cannot reproduce this… What am I missing?

[Screenshots: IMG_1146, IMG_1145]

waynezhang commented 3 years ago

I think it should be a perfect match instead of Related Entries?

I have an entry with https://www.rakuten-sec.co.jp/smartphone/login.html like this:

When I access https://www.rakuten-sec.co.jp/smartphone/login.html, it has a perfect match.

But for https://www.rakuten-sec.co.jp it matches all the entries (not sure if it's exactly all of them, but it's most of them for sure) without a perfect match.

keepassium commented 3 years ago

I think it should be a perfect match instead of Related Entries?

Unfortunately, no, because some websites need very different credentials on the same domain. For example, the Apple Store checkout page:

The difference is just a small fr in the path, but this is a different country where you would probably want a dedicated Apple ID with a different credit card and different address.

That's why https://www.rakuten-sec.co.jp/smartphone/login.html and https://www.rakuten-sec.co.jp are considered related, but not a perfect match. It is simply impossible to predict whether these URLs are equivalent or require two separate entries.
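To make the distinction concrete, here is a minimal sketch of the rule described above: same host and same path is a perfect match, same host with a different path is only related. It is not KeePassium's actual code. The function name `classify_match` and the simplification of ignoring query strings and fragments are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def classify_match(page_url: str, entry_url: str) -> str:
    """Return 'perfect', 'related', or 'none' (hypothetical helper)."""
    page, entry = urlsplit(page_url), urlsplit(entry_url)

    # Different hosts are not considered related in this sketch.
    if page.hostname != entry.hostname:
        return "none"

    # Treat an empty path and "/" as the same location.
    page_path = page.path or "/"
    entry_path = entry.path or "/"

    # Same scheme, host, and path: the URLs point at the same page.
    # Query strings and fragments are ignored (an assumption).
    if page.scheme == entry.scheme and page_path == entry_path:
        return "perfect"

    # Same host but a different path: cannot tell whether the
    # credentials are shared, so report it only as related.
    return "related"


entry = "https://www.rakuten-sec.co.jp/smartphone/login.html"
print(classify_match(entry, entry))
print(classify_match("https://www.rakuten-sec.co.jp", entry))
```

With the two URLs from this thread, the first call reports a perfect match and the second reports a related one.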

But for https://www.rakuten-sec.co.jp it matches all the entries (not sure if it's exactly all of them, but it's most of them for sure) without a perfect match.

"Without a perfect match" is explained by the above (the page URL does not exactly match the entry's URL). The relevant entry is listed somewhere among the others.

However, none of the other entries seem relevant. I guess all of them include some part of the original URL? Maybe an email that ends with .co.jp? Or the word login in the notes? (They seem to be in a group named Login, but the group name is not used for search.)

waynezhang commented 3 years ago

The difference is just a small fr in the path, but this is a different country where you would probably want a dedicated Apple ID with a different credit card and different address.

I see. This makes sense. But if we cannot treat entries with the same domain as an exact match, is there any way to reduce the noise?

I guess all of them include some part of the original URL? Maybe an email that ends with .co.jp? Or word login in the notes?

I tried to debug this and found it was caused by these cases:

There are too many noisy matches here. Is there any way or plan to improve this?

keepassium commented 3 years ago

Sorry for the delay.

I agree there are too many results in your screenshot that appear irrelevant. After looking into the ranking algorithm, I have found the problem. One of the secondary search criteria derived from the page's URL is the second-level domain name (e.g. for google.com, that would be google). This way, KeePassium can find entries even if they only mention "google" or "amazon" anywhere at all.

This works well with most TLDs like .com, .org, .it, etc.: the second-level domain is the service name. However, some countries use ccSLDs (country-code second-level domains) like .co.jp, .co.uk, etc. There, the second-level domain is just co, which is easy to find in many entries.
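The effect is easy to reproduce with a rough sketch. This is an illustration only, not the real ranking code: the helper names `naive_second_level_label` and `matches_anywhere`, and the sample entry fields, are made up for the example.

```python
from typing import Sequence
from urllib.parse import urlsplit


def naive_second_level_label(url: str) -> str:
    """Return the label immediately left of the TLD (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def matches_anywhere(keyword: str, entry_fields: Sequence[str]) -> bool:
    """Case-insensitive substring search across all fields of an entry."""
    needle = keyword.lower()
    return any(needle in field.lower() for field in entry_fields)


print(naive_second_level_label("https://www.google.com"))
print(naive_second_level_label("https://www.rakuten-sec.co.jp"))

# An unrelated entry (made-up data) still matches the keyword "co",
# because ".com" contains it as a substring.
unrelated_entry = ["Some forum", "someone@example.com"]
print(matches_anywhere("co", unrelated_entry))
```

For the .com URL the keyword is google, which is distinctive. For the .co.jp URL it collapses to co, which matches almost any entry that contains an email address or a .com URL.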

So it looks like I will need to account for ccSLDs and use the third-level domain name for them. The full list is quite extensive, but perhaps even a handful of exceptions would significantly reduce the noise. I will look into it.
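A minimal sketch of that idea, assuming a small hand-picked exception list, could look like the following. The set `CC_SLDS` is deliberately incomplete and the function name `service_label` is hypothetical. Only .co.jp and .co.uk come from this thread; the other suffixes are added as assumptions for the example.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately incomplete list of ccSLD suffixes.
# Only "co.jp" and "co.uk" are mentioned in this thread.
CC_SLDS = {"co.jp", "co.uk", "com.au", "co.nz", "com.br", "co.kr"}


def service_label(url: str) -> str:
    """Return the label that most likely names the service."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")

    # For known ccSLDs, skip both suffix labels and use the
    # third-level domain name instead.
    if len(labels) >= 3 and ".".join(labels[-2:]) in CC_SLDS:
        return labels[-3]

    # Otherwise fall back to the ordinary second-level label.
    return labels[-2] if len(labels) >= 2 else host


print(service_label("https://www.rakuten-sec.co.jp"))
print(service_label("https://www.google.com"))
```

With this exception in place, the keyword for the site in this issue becomes rakuten-sec instead of co, while ordinary domains behave as before.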

waynezhang commented 3 years ago

Thanks! Great to hear this!

So it looks like I will need to account for ccSLDs and use the third-level domain name for them. The full list is quite extensive, but perhaps even a handful of exceptions would significantly reduce the noise. I will look into it.

Yeah, even supporting only some of the common ones would be great!