stellar / dashboard

https://dashboard.stellar.org

SDFD-6_dex_data #272

Closed Selhar closed 2 years ago

Selhar commented 2 years ago

Hey, this will be a huge PR, sorry for the massive amount of text. There are a lot of details and I'd rather have it all in one place to make it easier to catch any mistakes.

New endpoints

GET /api/v2/dex/24h-payments: Integer

GET /api/v2/dex/24h-trades: Object:

{
  trades_last_24h: Integer,
  change: String,
  overall: Integer
}

GET /api/v2/dex/unique-assets: Integer

GET /api/v2/dex/active-accounts: Integer

GET /api/v2/dex/volume: Object:

{
  volume: Integer,
  change: String,
  overall: Integer
}

GET /api/v2/dex/all: Object:

{
  volume: volume endpoint object,
  uniqueAssets: Integer,
  tradesLast24h: trades endpoint object,
  activeAccounts: Integer
}
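
For reference, here are the response shapes above as TypeScript types (a sketch for review purposes only; the names mirror the JSON in this description, and the comment on change is my reading of the 24h/48h queries below):

    // Response shapes for the new endpoints, transcribed from the JSON above.
    interface TradesResponse {
      trades_last_24h: number;
      change: string; // percentage change, presumably 24h vs. the previous 24h
      overall: number;
    }

    interface VolumeResponse {
      volume: number;
      change: string;
      overall: number;
    }

    // GET /api/v2/dex/all
    interface DexAllResponse {
      volume: VolumeResponse;
      uniqueAssets: number;
      tradesLast24h: TradesResponse;
      activeAccounts: number;
    }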

Example output (volume is completely wrong because we can't get a price for all assets): [screenshot]

Queries being used

Get asset price

https://api.stellar.expert/explorer/public/asset/${asset}/stats-history
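
A minimal sketch of how this call gets wrapped (hypothetical: the response shape here, an array of daily stat entries with a price field, is an assumption, not taken from stellar.expert docs):

    // Hypothetical wrapper around the stellar.expert stats-history endpoint.
    // The response shape is assumed; adjust to the real payload.
    async function getAssetPrice(asset: string): Promise<number | undefined> {
      const url = `https://api.stellar.expert/explorer/public/asset/${asset}/stats-history`;
      const res = await fetch(url);
      if (!res.ok) return undefined; // throttled, or unknown asset
      const history: Array<{ price?: number }> = await res.json();
      return history[history.length - 1]?.price;
    }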

Get volume

This one is very, very tricky. There are 3 queries here: one is supposed to fetch the results for the last 24h, another for the last 48h, and a third for the overall data. All of these queries return the same number of rows, 27262. I'm not sure if I'm missing something, but it's as if the time filter were being ignored completely. Here are the queries:

24h:

    WITH bought_trades AS (
        SELECT SUM(buying_amount) AS summed_amount,
               CONCAT(buying_asset_code, "-", buying_asset_issuer) AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL OR selling_liquidity_pool_id = ''
            AND ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR)
            AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
        GROUP BY asset_name
        UNION ALL
        SELECT SUM(buying_amount) AS summed_amount, "native" AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL OR selling_liquidity_pool_id = ''
            AND buying_asset_type = "native"
            AND ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR)
            AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
        GROUP BY buying_asset_type
    )
    SELECT * FROM bought_trades;

48h:

    WITH bought_trades AS (
        SELECT SUM(buying_amount) AS summed_amount,
               CONCAT(buying_asset_code, "-", buying_asset_issuer) AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL OR selling_liquidity_pool_id = ''
            AND ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -48 HOUR)
            AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
        GROUP BY asset_name
        UNION ALL
        SELECT SUM(buying_amount) AS summed_amount, "native" AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL OR selling_liquidity_pool_id = ''
            AND buying_asset_type = "native"
            AND ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -48 HOUR)
            AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
        GROUP BY buying_asset_type
    )
    SELECT * FROM bought_trades;

Overall:

    WITH bought_trades AS (
        SELECT SUM(buying_amount) AS summed_amount,
               CONCAT(buying_asset_code, "-", buying_asset_issuer) AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL OR selling_liquidity_pool_id = ''
        GROUP BY asset_name
        UNION ALL
        SELECT SUM(buying_amount) AS summed_amount, "native" AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL OR selling_liquidity_pool_id = ''
            AND buying_asset_type = "native"
        GROUP BY buying_asset_type
    )
    SELECT * FROM bought_trades;

By the way, for each of these queries I have to call the stellar expert endpoint to get the price for each row. So right now that's 27262 * 3 stellar expert requests (after a while I just get blocked from making them, so the code currently caps it at 100 queries max; I left a TODO in there while we figure this out).
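
One way to soften the throttling without a hard 100-query cutoff would be a bounded-concurrency map around the price lookups. A sketch, reusing the hypothetical getAssetPrice wrapper above:

    // Run the price lookups a few at a time instead of all at once.
    async function mapWithConcurrency<T, R>(
      items: T[],
      limit: number,
      fn: (item: T) => Promise<R>,
    ): Promise<R[]> {
      const results: R[] = new Array(items.length);
      let next = 0;
      const worker = async () => {
        while (next < items.length) {
          const i = next++; // safe: single-threaded, no await between read and bump
          results[i] = await fn(items[i]);
        }
      };
      await Promise.all(Array.from({ length: limit }, worker));
      return results;
    }

    // usage: const prices = await mapWithConcurrency(assetNames, 5, getAssetPrice);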

Trades query

24h:

    SELECT count(*) as data 
    FROM crypto-stellar.crypto_stellar_2.history_trades

    WHERE ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR) 
    AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)

48h:

    SELECT count(*) as data 
    FROM crypto-stellar.crypto_stellar_2.history_trades

    WHERE ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -48 HOUR) 
    AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)

Overall:

    SELECT count(*) as data 
    FROM crypto-stellar.crypto_stellar_2.history_trades

Payments

    SELECT count(*) as data 
    FROM crypto-stellar.crypto_stellar_2.enriched_history_operations 
    WHERE type in (1, 2, 13) 
    AND closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR) 
    AND successful is true

Unique assets

    with all_assets as (
      select asset_code, asset_issuer, asset_type
      from crypto-stellar.crypto_stellar_2.history_assets
      group by asset_code, asset_issuer, asset_type
    )
    select count(distinct concat(aa.asset_code, ":", aa.asset_issuer)) as data
    from all_assets aa
    left outer join crypto-stellar.crypto_stellar_2.trust_lines tl
      on tl.asset_code = aa.asset_code
      and tl.asset_issuer = aa.asset_issuer
    where tl.asset_code is not null;

Active accounts

    SELECT COUNT(DISTINCT(op_source_account))
    FROM crypto-stellar.crypto_stellar_2.enriched_history_operations 
    where type in (1, 2, 6, 13)
    AND closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR);

Minor changes

Important points of attention

stellar-jenkins commented 2 years ago

Preview is available here:
https://dashboard-pr272.prototypes.kube001.services.stellar-ops.com/

acharb commented 2 years ago

I'm unsure how redis data is being cleaned out in production

Right now we're not clearing redis but overwriting it on a schedule (see here; this triggers the lumens cache updates every 10 mins).

I think for these 24h redis updates we should clear redis on a schedule, though. We could add a GitHub Actions schedule workflow? That way we can say the data is updated every day at a certain time, as opposed to whenever the servers re-deploy. I'd have to think more about the best way of doing that in the code 🤔, unless you have ideas.

acharb commented 2 years ago

When making dex volume queries, I noticed that the results don't seem to be filtered by 24hrs. Fetching 24h, 48h or overall returns the same amount of data.

Let's remove the selling_liquidity_pool_id = '' clause; it's what's messing up the query. Because the OR isn't parenthesized, the time filters bind only to the = '' branch, so every row with a NULL pool id passes regardless of date. I checked and there aren't any entries w/ selling_liquidity_pool_id = '' anyway.

So the dex volume queries should now look like this (also, let's use batch_run_date for time partitioning):


    ...
    SELECT SUM(buying_amount) AS summed_amount,
           CONCAT(buying_asset_code, ":", buying_asset_issuer) AS asset_name
    FROM `crypto-stellar.crypto_stellar_2.history_trades`
    WHERE selling_liquidity_pool_id IS NULL
        AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    ...
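
Applying that change to the full 24h query above, the corrected version would look roughly like this (a sketch; the 48h and overall variants change the same way):

    WITH bought_trades AS (
        SELECT SUM(buying_amount) AS summed_amount,
               CONCAT(buying_asset_code, ":", buying_asset_issuer) AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL
            AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
            AND ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR)
        GROUP BY asset_name
        UNION ALL
        SELECT SUM(buying_amount) AS summed_amount, "native" AS asset_name
        FROM `crypto-stellar.crypto_stellar_2.history_trades`
        WHERE selling_liquidity_pool_id IS NULL
            AND buying_asset_type = "native"
            AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
            AND ledger_closed_at >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR)
        GROUP BY buying_asset_type
    )
    SELECT * FROM bought_trades;
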
Selhar commented 2 years ago

I think for these 24h redis updates we should clear redis on a schedule, though. We could add a GitHub Actions schedule workflow? That way we can say the data is updated every day at a certain time, as opposed to whenever the servers re-deploy. I'd have to think more about the best way of doing that in the code 🤔, unless you have ideas.

I think the best solution depends on how this backend will be hosted, really. For a 24h cycle I don't think scheduling inside express is the best option for us: it can be unreliable and difficult to track in case of failures such as a server restart/deploy or errors. Using a cron would let us make consistent calls directly to redis to clear the cache on demand, and we'd have logs in case of a failure. But that really depends on how much access we have to the redis service and how it's hosted. If we're running a docker compose somewhere, I might be able to create a service that mimics a cron job, but it won't be as good as the real thing. As for the GitHub Action, I don't think that's ideal unless we have no other means to access the server (I've been in a similar situation where GitHub Actions triggered a cache invalidation and it was very unreliable, but it can work).

By the way, regarding the errors on stellar.prices: is there something I can do on my end to solve the throttling errors? Or is that something that won't happen in production? The new volume queries greatly decreased the runtime, but a cache refresh still takes quite a long time (since we still have to make the big overall query).

acharb commented 2 years ago

Good points. I forgot we already have a couple of cron jobs set up using kubernetes; that's probably the best route here as well. I created a ticket to track it.

Our kubernetes repo is an internal one I believe, so I think this is something an SDF person will have to do: https://github.com/stellar/dashboard/issues/277

acharb commented 2 years ago

By the way, regarding the errors on stellar.prices, is there something i can do on my end to solve the throttling errors?

Once we add the production API key it may resolve. We have other projects using the API key that are making similar calls. We'll have to add that internally.

Created a ticket to track: https://github.com/stellar/dashboard/issues/278

sydneynotthecity commented 2 years ago

One item that you may want to consider for DEX trading volume is narrowing down the list of assets. You could create a heuristic where an asset must have had N trades executed in the last 24 hours, or a particular number of trustlines established, before it becomes part of the pricing pipeline; one possible version is sketched below. There are a lot of junk assets on the network and you don't want to unnecessarily assign them value. We're having discussions with /path-payments about similar ideas to narrow down the list of assets to make these calls more efficient.
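
For instance, a trade-count threshold could be applied inside the volume CTE before any prices are fetched. A sketch (the cutoff of 100 trades is arbitrary, purely for illustration):

    -- Only price assets with a minimum number of trades in the window.
    SELECT CONCAT(buying_asset_code, ":", buying_asset_issuer) AS asset_name,
           SUM(buying_amount) AS summed_amount
    FROM `crypto-stellar.crypto_stellar_2.history_trades`
    WHERE selling_liquidity_pool_id IS NULL
        AND batch_run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    GROUP BY asset_name
    HAVING COUNT(*) >= 100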

Selhar commented 2 years ago

There are a lot of junk assets on the network and you don't want to unnecessarily assign them value.

I'm making optimizations to the prices caching right now to make it easier when redis is cleared. If there are any thresholds I can use to reduce the number of queries to stellar.expert, it'd be greatly appreciated. Even with the improved dex volume queries it takes >30 minutes to make all the requests, and even then more than a third of them fail. I don't really have a parameter for what counts as an acceptable volume threshold, so I think it's best to leave that to you folks.

Selhar commented 2 years ago

Changes committed. Some important points of attention, though.

Our kubernetes repo is an internal one I believe, so I think this is something an SDF person will have to do:

Here are the keys you'll need to clear on kubernetes on a 24h basis. Ideally we should not deploy before this is in place, otherwise we'll always get the cached values.

    dex-volume-sum-48h
    dex-volume-sum-overall
    payments24h
    dex-trades24h
    dex-trades48h
    dex-trades-overall
    dex-uniqueAssets
    dex-activeAccounts
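
For whoever picks up the kubernetes side, the clearing step itself is just a DEL over those keys. A sketch of a script the cron job could run, assuming an ioredis client and a REDIS_URL env var (both assumptions; adapt to however this repo actually connects):

    // Clears the cached DEX values so the next request repopulates them.
    // ioredis and REDIS_URL are assumptions, not this repo's actual setup.
    import Redis from "ioredis";

    const KEYS = [
      "dex-volume-sum-48h",
      "dex-volume-sum-overall",
      "payments24h",
      "dex-trades24h",
      "dex-trades48h",
      "dex-trades-overall",
      "dex-uniqueAssets",
      "dex-activeAccounts",
    ];

    async function clearDexCache(): Promise<void> {
      const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
      await redis.del(...KEYS);
      await redis.quit();
    }

    clearDexCache().catch((err) => {
      console.error("failed to clear dex cache", err);
      process.exit(1);
    });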

stellar-jenkins commented 2 years ago

Preview is available here:
https://dashboard-pr272.prototypes.kube001.services.stellar-ops.com/

Selhar commented 2 years ago

Hey, my tests aren't running; they usually fail whenever I run something that requests data from bigquery. Maybe we're missing the new environment variable on github actions @acharb, BQ_PROJECT_ID more specifically. (Sorry, I left the comment on the issue instead of the PR :facepalm:)

stellar-jenkins commented 2 years ago

Preview is available here:
https://dashboard-pr272.prototypes.kube001.services.stellar-ops.com/


Selhar commented 2 years ago

Hey, just a summary of the last things pending on this task:

Also if there are any changes to the code you folks would like to make before we merge, just let me know. In the meantime i'll be working on the ledgers task.

acharb commented 2 years ago

"native" assets are not included in the volume sum, should they be ignored (price = 0) or is there an asset we can use to calculate their volume?

Don't think I'm understanding 🤔. The native asset (there is only one native asset, XLM) is included in the BQ query, in the subquery that starts with:

SELECT SUM(buying_amount) AS summed_amount, "native" as asset_name

Since the stellar.expert prices are denominated in XLM, we can just add this native sum directly to the other asset sums once those are converted to XLM.
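
In code terms, something like this sketch (the row shape follows the query output above; getAssetPrice is the hypothetical wrapper from earlier in the thread):

    // Total DEX volume in XLM: native rows are already XLM; everything else
    // is converted via its stellar.expert price. Unpriced assets fall back
    // to 0, which is exactly why the displayed volume can come out wrong.
    async function sumVolumeXlm(
      rows: Array<{ asset_name: string; summed_amount: number }>,
    ): Promise<number> {
      let total = 0;
      for (const row of rows) {
        if (row.asset_name === "native") {
          total += row.summed_amount; // already denominated in XLM
        } else {
          const price = (await getAssetPrice(row.asset_name)) ?? 0;
          total += row.summed_amount * price;
        }
      }
      return total;
    }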

Will we add tests that make queries to bigquery? If not, I can remove the ones I added in this task. Otherwise I think we're missing the environment variables.

Ya, I don't think we need to have a test for connecting to BigQuery atm; we can add one later if needed. Though we should keep the tests that exercise the routes, to ensure those don't change accidentally. You can hardcode the data in redis for those endpoints so that the backend calls don't go to BigQuery.
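
A sketch of what that could look like with supertest, seeding redis before hitting the route (app, redisClient, and the cache key are placeholders for whatever this repo actually exports):

    import request from "supertest";
    // hypothetical imports; adjust to the repo's real exports
    import { app } from "../app";
    import { redisClient } from "../redis";

    it("serves 24h trades from the cache, not BigQuery", async () => {
      // Seed the cache so the handler never reaches for BigQuery.
      await redisClient.set(
        "dex-trades24h",
        JSON.stringify({ trades_last_24h: 42, change: "1.5", overall: 1000 }),
      );
      const res = await request(app).get("/api/v2/dex/24h-trades");
      expect(res.status).toBe(200);
      expect(res.body.trades_last_24h).toBe(42);
    });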

acharb commented 2 years ago

@Selhar let's remove the dex volume code (i.e. the sum_volume and get_volume_data methods). See slack for more info: https://stellarfoundation.slack.com/archives/C023RFQKL85/p1654126069606199 (tl;dr: we'd like to do it without using stellar.expert, which we can't do atm).

Selhar commented 2 years ago

Hey, just as an update: I'm finishing up the ledgers script and will close out the final changes we need for dex after that. Since it's a fairly complex script I'd rather get it done asap. Should be done later today.

stellar-jenkins commented 2 years ago

Preview is available here:
https://dashboard-pr272.prototypes.kube001.services.stellar-ops.com/
