keis / base58

Base58 and Base58Check implementation compatible with what is used by the bitcoin network.
MIT License
180 stars 59 forks source link

performance issue with str.index() inside of the loop #52

Closed kolomenkin closed 4 years ago

kolomenkin commented 4 years ago

The following code has very bad performance for long encoded strings:

It is better to use pre-calculated array to map any possible character to its index in original alphabet.

Original code:

    for char in v:
        decimal = decimal * 58 + alphabet.index(char)

Possible solution:

    import array
    map = array.array('b', [alphabet.find(chr(x)) for x in range(256)])
    for char in v:
        decimal = decimal * 58 + map[ord(char)]

My solution is not ideal for short strings. It is a good idea to have mapping to be pre-calculated for each alphabet. And original solution was raising exception in case of unexpected character found. New solution will not raise exception but it may be fixed.

keis commented 4 years ago

base58 is not advisable on big strings to begin with but I'm not opposed to performance improvements. PRs are welcome especially if they include benchmark numbers! :)

Regrading pre-calculating I think the easiest solution is to stick this in a function like get_map with @lru_cache because the alphabet can be user specified.

kolomenkin commented 4 years ago

Please look into this draft PR: https://github.com/keis/base58/pull/53