Term checking does not factor in leading 'k' - parity with Kristvanity

hugeblank commented 1 year ago

Unlike KristVanity, pmkam seems to skip out on factoring in the leading k. If I were searching for the term kawaii, I'd have to pass in awaii which increases the amount of junk in the solutions by a factor of 36.

KristVanity: pmkam:

hugeblank commented 1 year ago

oh this problem goes further! addresses are only compared directly after the k rather than anywhere in the address string. Fortunately that's not much of a problem for me, but it is a little bit goofy, and definitely does not reflect KristVanity's approach.

migeyel commented 1 year ago

This is deliberate:

A miner for finding Krist addresses with a desired prefix (or group of prefixes).

That mainly boils down to the fact that pmkam was built around my search for the kpg231000 prefix, so searching for anything that wasn't a prefix didn't make sense at the time. Support for multiple terms was secondary back when it was first implemented.

The project in general hasn't seen much activity in the last year, and given the effort needed to bring these changes to effect, I wouldn't bet on anyone actually getting to do it. That being said, I can't see a reason why it should be ruled out as a possibility. Here are some things to take into consideration should that happen:

Character Generation Complexity

Here is the relative complexity of generating a character based on its relative position along the address.	Character Position	Average Complexity
1	1.0
2	1.125
3	1.285
4	1.5
5	1.8
6	2.25
7	3.0
8	4.5
9	9.0

As you can see, higher positions are more expensive to check. But this does not take into account the complexity of advancing the hash chain, which could be several times more expensive. A more careful analysis would be needed to see if there is a break-even point where checking more characters is worse than shifting and starting a new search. This analysis could also take into account that prefixes are generally more desirable to find than general substrings.

Data Structure

The current implementation represents the term set using a trie in order to check for matches in constant time per output character. This isn't enough for substring search. Instead, the term set would need to be packed into an automaton, which gets stepped at each generated character.

Disfavoring Longer Terms

Let's say that you search for substrings "hello" and "pg231000". Let's say that the trial address gets generated to the point where it is "ka0???????". The long term can clearly no longer be matched, there aren't enough characters left. But the short term can. If you keep the search going, then, from the long term's perspective, you are throwing computation away since the search won't yield it. Should you stop now or should you keep going?

migeyel / pmkam