graphsense / graphsense-REST

A REST service for accessing cryptocurrency data stored in Apache Cassandra.
MIT License

Improve speed of bech32 address listing #41

Status: Closed (MatteoRomiti closed this issue 3 years ago)

MatteoRomiti commented 4 years ago

When searching for a bech32 address from the landing page, the suggestions take a long time to load. There is already a TODO for this at https://github.com/graphsense/graphsense-REST/blob/master/gsrest/service/addresses_service.py#L180, but the problem might be better solved in the transformation.

myrho commented 3 years ago

The loop is implemented, but what exactly are bech32 addresses?

MatteoRomiti commented 3 years ago

BTC addresses starting with bc1... https://en.bitcoin.it/wiki/Bech32
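
To illustrate why a shared `bc1` prefix could slow down suggestions, here is a minimal sketch. It assumes (the thread does not state this) that address lookups are bucketed by a fixed-length leading substring of the address. `PREFIX_LEN`, `bucket_key`, and `max_distinct_keys` are hypothetical names, and the value 5 is chosen only for illustration. Under that assumption, addresses starting with `bc1q` can only spread over a few buckets unless the common prefix is stripped before the key is taken.

```python
BECH32_CHARSET_SIZE = 32  # bech32 data characters come from a 32-symbol alphabet
PREFIX_LEN = 5            # hypothetical bucket-key length, not taken from the thread
SHARED = "bc1q"           # leading characters shared by the sample addresses below


def bucket_key(address, strip_prefix=""):
    """Return the lookup bucket for an address.

    If the address starts with strip_prefix (e.g. "bc1"), that shared
    prefix is dropped before the key is taken.
    """
    if strip_prefix and address.startswith(strip_prefix):
        address = address[len(strip_prefix):]
    return address[:PREFIX_LEN]


def max_distinct_keys(strip_prefix=""):
    """Upper bound on distinct bucket keys for addresses starting with SHARED."""
    fixed = SHARED[len(strip_prefix):] if SHARED.startswith(strip_prefix) else SHARED
    free = max(PREFIX_LEN - len(fixed), 0)  # key characters that can still vary
    return BECH32_CHARSET_SIZE ** free


# Sample strings used only for illustration.
samples = [
    "bc1qpcwftvxa9585990xpj93wnmu8x672utr4nsxu3",  # address from this thread
    "bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq",
]

for strip in ("", "bc", "bc1"):
    keys = [bucket_key(a, strip) for a in samples]
    print("strip=%r keys=%s max buckets=%d" % (strip, keys, max_distinct_keys(strip)))
```

With no prefix removed, only one key character can vary, so all such addresses share at most 32 buckets in this model. Removing `bc` or `bc1` frees three or four characters respectively.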

behas commented 3 years ago
defconst commented 3 years ago

Spark transformation runtime measurements:

Keyspace                             Removed prefix  Runtime
btc_transformed_20210202             no prefix       18.4 h
btc_transformed_20210202_prefix_bc   bc              18.1 h
btc_transformed_20210202_prefix_bc1  bc1             19.4 h

defconst commented 3 years ago

Query timings:

prefix: ''
{'currencies': [{'addresses': ['bc1qpcwftvxa9585990xpj93wnmu8x672utr4nsxu3'],
                 'currency': 'btc',
                 'txs': []}],
 'labels': []}
319.84 seconds
prefix: '-bc'
{'currencies': [{'addresses': ['bc1qpcwftvxa9585990xpj93wnmu8x672utr4nsxu3'],
                 'currency': 'btc-bc',
                 'txs': []}],
 'labels': []}
1.25 seconds
prefix: '-bc1'
{'currencies': [{'addresses': ['bc1qpcwftvxa9585990xpj93wnmu8x672utr4nsxu3'],
                 'currency': 'btc-bc1',
                 'txs': []}],
 'labels': []}
1.18 seconds
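
To put the two sets of measurements side by side, the short sketch below computes the query speedup and the change in transformation runtime relative to the unmodified keyspace. The figures are copied from the measurements above; the helper names `speedup` and `relative_change` are only illustrative. These are single measurements, so the ratios should be read as rough indications.

```python
# Figures copied from the runtime and query measurements in this thread,
# keyed by the prefix removed during the transformation.
transform_hours = {"": 18.4, "bc": 18.1, "bc1": 19.4}
query_seconds = {"": 319.84, "bc": 1.25, "bc1": 1.18}


def speedup(baseline, value):
    """How many times faster `value` is than `baseline`."""
    return baseline / value


def relative_change(value, baseline):
    """Signed fractional change of `value` relative to `baseline`."""
    return (value - baseline) / baseline


for prefix in ("bc", "bc1"):
    s = speedup(query_seconds[""], query_seconds[prefix])
    c = relative_change(transform_hours[prefix], transform_hours[""])
    print("prefix %-3s: query %.1fx faster, transformation runtime %+.1f%%"
          % (prefix, s, c * 100))
```

Both prefix variants answer the query more than 250 times faster, while the transformation runtime changes by only a few percent in either direction.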

Code:

import time
from pprint import pprint

import graphsense

# REST service under test (local instance)
configuration = graphsense.Configuration(
    host="http://localhost:9000"
)

with graphsense.ApiClient(configuration) as api_client:
    api_instance = graphsense.GeneralApi(api_client)
    limit = 10
    q = 'bc1qpcwftvx'  # leading part of a bech32 address to search for

    # Each suffix selects one of the currencies defined in the config
    for elem in ['', '-bc', '-bc1']:
        print("prefix: '%s'" % elem)
        start = time.time()
        currency = 'btc' + elem
        api_response = api_instance.search(q, currency=currency, limit=limit)
        pprint(api_response)
        end = time.time()  # note: the measured time includes pprint
        print("%.2f seconds" % (end - start))
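
A possible variation on the script above, as a sketch rather than the code that produced the posted timings: it times only the call itself (not the printing), uses `time.perf_counter`, and repeats the measurement. `time_call` is a hypothetical helper, and `fake_search` is a stand-in for `api_instance.search` so the example runs without a GraphSense server. Repeated queries may hit caches, so later runs can be faster than the first.

```python
import statistics
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns the last result and the list of per-call durations in seconds.
    """
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, durations


def fake_search(q, currency="btc", limit=10):
    """Hypothetical stand-in for api_instance.search, for offline use."""
    time.sleep(0.01)
    return {"q": q, "currency": currency, "limit": limit}


result, durations = time_call(fake_search, "bc1qpcwftvx",
                              currency="btc-bc1", limit=10)
print(result)
print("first %.3f s, median %.3f s over %d runs"
      % (durations[0], statistics.median(durations), len(durations)))
```

To use it against a real service, `fake_search` would be replaced by `api_instance.search` inside the `with graphsense.ApiClient(...)` block.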

Config:

database:
    driver: cassandra
    nodes: ["192.168.243.101"]
    tagpacks: "tagpacks_prod"
    currencies:
        btc:
            raw: "btc_raw_prod"
            transformed: "btc_transformed_20210202"
        btc-bc:
            raw: "btc_raw_prod"
            transformed: "btc_transformed_20210202_prefix_bc"
        btc-bc1:
            raw: "btc_raw_prod"
            transformed: "btc_transformed_20210202_prefix_bc1"

ALLOWED_ORIGINS:
   - "http://localhost(:.+)?"
   - "^https://.+.graphsense.info$"