nucypher / nucypher-monitor

NuCypher network intelligence crawler and web dashboard

Optimization of work done by the Crawler/Dashboard #84

Open derekpierre opened 3 years ago

derekpierre commented 3 years ago

Functionality like this, which projects stakes and stakers into the future, becomes problematic now that the network has grown so large:

    @collector(label="Projected Stake and Stakers")
    def _measure_future_locked_tokens(self, periods: int = 365):
        period_range = range(1, periods + 1)
        token_counter = dict()
        for day in period_range:
            # One full paginated on-chain scan of the active stakers per projected day
            tokens, stakers = self.staking_agent.get_all_active_stakers(periods=day, pagination_size=200)
            # Record (total locked NU, active staker count) for this future day
            token_counter[day] = (float(NU.from_nunits(tokens).to_tokens()), len(stakers))
        return dict(token_counter)

The code effectively projects the network's stakes and stakers out over the next year: for each day in the coming year, it retrieves information on all of the active stakes. When the network is large, this is exceptionally computationally expensive.
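As a rough back-of-the-envelope estimate (treating the ~1800 nodes reported below as a proxy for active stakers): with pagination_size=200, each daily call needs about 1800 / 200 = 9 paginated contract reads, so a full 365-day projection is on the order of 365 × 9 ≈ 3,300 sequential RPC round trips in every crawl round.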

Two questions arise:

  1. Do we need to show this functionality at all? (It is commented out in the https://github.com/derekpierre/nucypher-monitor/tree/overloaded branch.)

Simply commenting it out improved our data-collection round time to about 1 minute (down from sometimes 15 minutes 😱) and improved stability; the node count was about 1800 nodes.

    crawler_1   | Scraping Round #613 ========================
    crawler_1   | ✓ ... Current Period
    crawler_1   | ✓ ... Date/Time of Next Period [0s]
    crawler_1   | ✓ ... Latest Teacher [0s]
    crawler_1   | ✓ ... Previous Fleet States [0s]
    crawler_1   | ✓ ... Network Event Details [0s]
    crawler_1   | ✓ ... Known Node Details [0s]
    crawler_1   | ✓ ... Known Nodes [31.0s]
    crawler_1   | ✓ ... Staker Confirmation Status [31.0s]
    crawler_1   | ✓ ... Global Network Locked Tokens
    crawler_1   | ✓ ... Top Stakes [0s]
    crawler_1   | Scraping round completed (duration 0:01:02).
  2. If we do want to keep that functionality, how can it be optimized? (One possible direction is sketched below.)
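
One possible direction (a sketch for illustration, not code from the repo): since the projection only changes when stakes change or the current period advances, the crawler could memoize the result and recompute at most once per period. The wrapper name and cache attributes below are hypothetical, and staking_agent.get_current_period() is assumed to be available on the agent:

    # Hypothetical caching wrapper: recompute the expensive projection at most
    # once per period and serve the cached result on every other crawl round.
    def _measure_future_locked_tokens_cached(self, periods: int = 365):
        current_period = self.staking_agent.get_current_period()  # assumed agent API
        if getattr(self, '_projection_cache_period', None) != current_period:
            self._projection_cache = self._measure_future_locked_tokens(periods)
            self._projection_cache_period = current_period
        return self._projection_cache
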
cygnusv commented 3 years ago

I don't think we need this functionality, or more accurately, we don't need it at the same frequency as the main crawler loop (which is mainly discovery-loop work). For this specific case, a secondary loop that runs daily would be more than enough.
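
For illustration, a minimal sketch of that idea, assuming the crawler runs on Twisted (as nucypher's learning loop does) and that a crawler instance exposing the method above is in scope; the variable name and wiring location are hypothetical:

    from twisted.internet import task

    ONE_DAY = 60 * 60 * 24  # seconds

    # Move the expensive projection onto its own daily loop, decoupled from
    # the main crawler round so it no longer inflates the round duration.
    projection_loop = task.LoopingCall(crawler._measure_future_locked_tokens)
    projection_loop.start(ONE_DAY, now=True)  # fire once immediately, then every 24h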