Support for pandas dataframes

graphsense / graphsense-python

A Python client for the GraphSense REST interface.

MIT License


Support for pandas dataframes #8

Closed behas closed 2 years ago

behas commented 3 years ago

The API currently returns data as arrays of JSON objects. On the client side, people often work with pandas dataframes and must convert these arrays. Since this is repetitive, it would be great if the API could optionally also return the retrieved data as a flattened dataframe. Here is the code I am using for the conversion:

from datetime import datetime

import pandas as pd

# Flatten each address record into one row; monetary values are taken in EUR.
cols = ['address', 'total_received', 'balance', 'first_tx', 'last_tx', 'btc_senders', 'btc_recipients']
fmt = '%Y-%m-%d %H:%M:%S'
lst = []
for a in address_details.values():
    lst.append([a['address'],
                a['total_received']['eur'],
                a['balance']['eur'],
                datetime.utcfromtimestamp(a['first_tx']['timestamp']).strftime(fmt),
                datetime.utcfromtimestamp(a['last_tx']['timestamp']).strftime(fmt),
                a['in_degree'],    # stored in the 'btc_senders' column
                a['out_degree']])  # stored in the 'btc_recipients' column

df1 = pd.DataFrame(lst, columns=cols)
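
For comparison, the same kind of flattening can be sketched with `pandas.json_normalize`, which expands nested objects into separate columns. The sample record below is hypothetical: it only mimics the fields used in the conversion code, and its values are made up for illustration. The resulting column names (for example `balance_eur`) come from `json_normalize` and differ from the hand-picked names used in the manual conversion.

```python
import pandas as pd

# Hypothetical sample shaped like the records in the manual conversion;
# all values are invented for illustration.
address_details = {
    "addr1": {
        "address": "addr1",
        "total_received": {"eur": 10.0, "usd": 11.0},
        "balance": {"eur": 2.5, "usd": 2.75},
        "first_tx": {"timestamp": 0},
        "last_tx": {"timestamp": 86400},
        "in_degree": 3,
        "out_degree": 1,
    },
}

# Nested keys become columns joined with "_", e.g. "balance_eur".
df = pd.json_normalize(list(address_details.values()), sep="_")

# Convert the Unix timestamps (seconds) to pandas datetimes.
for col in ("first_tx_timestamp", "last_tx_timestamp"):
    df[col] = pd.to_datetime(df[col], unit="s")
```

Unlike the manual loop, this keeps every currency as its own column, so selecting only the EUR values would be a separate step.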

myrho commented 3 years ago

You can create dataframes directly from the CSV endpoints. For instance:

df = pandas.read_csv(api_instance.list_entity_addresses_csv(currency, 1, _preload_content=False))

I'll add examples to the docs, ok?
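
The reason this works is that `pandas.read_csv` accepts any file-like object with a `read()` method, so an unread HTTP response body can be passed in directly. Below is a minimal sketch of that mechanism, using an in-memory `io.BytesIO` buffer as a stand-in for the response; the column names and values are hypothetical and do not reflect the actual CSV layout of the endpoint.

```python
import io

import pandas as pd

# Stand-in for the raw response body (hypothetical columns and values).
raw = io.BytesIO(b"address,balance_eur\na1,1.5\na2,2.0\n")

# read_csv consumes the stream and builds a single dataframe.
df = pd.read_csv(raw)

# For very large results, chunksize yields smaller dataframes one at a time,
# so an aggregate can be computed without holding everything in memory.
raw.seek(0)
total = 0.0
for chunk in pd.read_csv(raw, chunksize=1):
    total += chunk["balance_eur"].sum()
```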

behas commented 3 years ago

Thanks for the clarification; adding an example is certainly a first step. However, I'd argue it is a bit counter-intuitive.

Is it, in general, possible to wrap the low-level, automatically generated API in a higher-level API? I'm imagining a single module that mirrors the API and just hides all the details underneath.

behas commented 3 years ago

How would this work if data is retrieved in chunks? This would then yield one df per chunk, right? It is possible to concat dfs, but doing so repeatedly is not encouraged in pandas because it is slow.
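
The slow pattern is calling `pd.concat` inside the loop, which copies the accumulated data on every iteration. A common way around it, sketched below, is to collect the per-chunk dataframes in a list and concatenate once at the end. The `fetch_pages` generator is a hypothetical stand-in for a paginated API call, not part of the GraphSense client.

```python
import pandas as pd


def fetch_pages():
    """Hypothetical stand-in for a paginated API call: yields one list of records per page."""
    yield [{"address": "a", "balance": 1.0}, {"address": "b", "balance": 2.0}]
    yield [{"address": "c", "balance": 3.0}]


# Build one dataframe per page, then concatenate a single time.
frames = [pd.DataFrame(page) for page in fetch_pages()]
combined = pd.concat(frames, ignore_index=True)
```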

myrho commented 3 years ago

I guess by "hiding details underneath" you mean retrieving large data sets transparently, i.e. in a streaming way?

behas commented 3 years ago

Streaming in the sense of iterating under the hood and then returning a final combined dataframe. I will put the procedures I am using into a separate module and then we can discuss API usability issues.

myrho commented 3 years ago

Actually, the CSV endpoints already stream the complete data, hence no pages. So the above code will give you all entity addresses in one DF.

myrho commented 2 years ago

Obsolete due to the new bulk endpoint.