Update indexer environment to support mainnet and redwood

ryanchristo commented 1 year ago

Ref: https://github.com/regen-network/groups-ui/pull/112#issuecomment-1662849871

As a follow-up to #112, we should improve our QA testing setup and deployment pipeline to make indexed proposals on both mainnet and redwood available in both staging and production environments. Currently we have a single environment variable for the graphql endpoint, which we set separately in the staging and production environments.

The groups ui was designed to switch between networks and we should make sure we have full support for indexed proposals (endpoints in place at least) within the groups ui. If we add networks and those networks require a different endpoint (someone else is hosting the indexed proposals), the groups ui should be configurable to support an endpoint specific to the network.

blushi commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @ryanchristo @wgwz

wgwz commented 1 year ago

I was thinking about this some more and I think we can do it in a way where groups-ui can still use the single graphql client connected to the indexer API. The indexer database model was set up so that it can support indexing multiple chains.

I think with some adjustments to the latest code in the indexer, we can have the production indexer index both mainnet and redwood. Important: I want to say it again, we will balloon the size of our database significantly by doing this. But if we are OK with that, then I think this is a good path to go.

We will need to adjust the PollingProcess class's run method to accept arguments for the rpc and api endpoints: https://github.com/regen-network/indexer/blob/main/utils.py#L66

Which in turn would get passed to: https://github.com/regen-network/indexer/blob/main/utils.py#L73C23-L73C34

Currently the indexer only instantiates one PollingProcess per indexing task. I.e.

For the main indexing process which indexes all blocks: https://github.com/regen-network/indexer/blob/main/index_blocks.py#L108-L110
For the retirements indexing process: https://github.com/regen-network/indexer/blob/main/index_retires.py#L77-L80
For the proposals indexing process: https://github.com/regen-network/indexer/blob/main/index_proposals.py#L116-L119

Instead of having one PollingProcess per indexing task, we can have multiple. There would be one PollingProcess per indexing process per chain. We could use a new database table that configures this:

         Table "public.config_chain"
 Column  | Type | Collation | Nullable | Default
---------+------+-----------+----------+---------
 name    | text |           |          |
 rpc_url | text |           |          |
 api_url | text |           |          |

Each row would represent a chain that we want the indexer to run against. Then at each of the call sites for PollingProcess we could instead have a loop that instantiates one per chain.
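A minimal sketch of that loop, under stated assumptions: it uses the standard library's multiprocessing.Process as a stand-in and does not reproduce the indexer's actual PollingProcess class. ChainConfig, build_processes, the placeholder task functions, and the example.invalid URLs are all hypothetical names for illustration; only the three column names come from the table above.

```python
from dataclasses import dataclass
from multiprocessing import Process
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ChainConfig:
    """One row of the proposed public.config_chain table."""
    name: str
    rpc_url: str
    api_url: str


def build_processes(
    chains: Iterable[ChainConfig],
    targets: Iterable[Callable[[str, str], None]],
) -> List[Process]:
    """Create one unstarted process per indexing task per chain."""
    targets = list(targets)  # allow the inner loop to be repeated per chain
    processes = []
    for chain in chains:
        for target in targets:
            processes.append(
                Process(
                    target=target,
                    # Each task receives the endpoints of its own chain.
                    args=(chain.rpc_url, chain.api_url),
                    name=f"{target.__name__}:{chain.name}",
                )
            )
    return processes


def index_blocks(rpc_url: str, api_url: str) -> None:
    """Placeholder for the block indexing task (hypothetical signature)."""


def index_proposals(rpc_url: str, api_url: str) -> None:
    """Placeholder for the proposals indexing task (hypothetical signature)."""


# Placeholder rows; in practice these would be read from the config table.
CHAINS = [
    ChainConfig("mainnet", "https://rpc.mainnet.example.invalid", "https://api.mainnet.example.invalid"),
    ChainConfig("redwood", "https://rpc.redwood.example.invalid", "https://api.redwood.example.invalid"),
]
```

Calling build_processes(CHAINS, [index_blocks, index_proposals]) returns four processes, one per task per chain, and each would still need start() called on it. Adding a chain would then only mean adding a row.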

This could work nicely because the indexer data model uses a unique id called chain_num throughout. So for example, in production, we want to be able to toggle between redwood and mainnet, and as a result see the historical proposals in each network.

[Screenshot: the allProposals query with the chainNum field]

As shown above, the allProposals query has the chainNum field available. So we can just write a query that says, "give me all proposals where the chain number is redwood's chain number". If it's not already indexed, we'll need to add an index on the proposals table's chain_num column.
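To make the filter and the index concrete, here is a small self-contained sketch. It uses an in-memory SQLite database rather than the indexer's real database, and the table layout (proposal_id, status), the index name, and the chain numbers 1 and 2 are assumptions for illustration; only the chain_num column comes from the discussion above.

```python
import sqlite3


def proposals_for_chain(conn: sqlite3.Connection, chain_num: int) -> list:
    """Return (proposal_id, status) rows for a single chain."""
    cursor = conn.execute(
        "SELECT proposal_id, status FROM proposals "
        "WHERE chain_num = ? ORDER BY proposal_id",
        (chain_num,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE proposals ("
    "chain_num INTEGER NOT NULL, "
    "proposal_id INTEGER NOT NULL, "
    "status TEXT NOT NULL)"
)
# The index that keeps per-chain lookups cheap as the table grows.
conn.execute(
    "CREATE INDEX IF NOT EXISTS proposals_chain_num_idx ON proposals (chain_num)"
)
# Hypothetical data: chain 1 stands in for mainnet, chain 2 for redwood.
conn.executemany(
    "INSERT INTO proposals VALUES (?, ?, ?)",
    [(1, 1, "ACCEPTED"), (2, 1, "SUBMITTED"), (2, 2, "REJECTED")],
)
```

With this data, proposals_for_chain(conn, 2) returns only the two rows for the second chain. The graphql layer would express the same condition against chainNum.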

This means that we wouldn't have to instantiate a whole new graphql client each time we toggle the chain in the groups-ui. And also we would not strictly need to keep the staging version of the indexer database up and running for production groups-ui to work correctly. It's a single database this way. And it also scales better if we need to add other chains, since we would basically just need to add a row in the public.config_chain table or whatever configuration method we choose.

Thoughts?

ryanchristo commented 1 year ago

I agree with running a process for redwood and mainnet using the same indexer deployment and database. It was designed for this purpose and we can prune indexed redwood state as needed to avoid unnecessary storage expenses.

I guess we would use the same URL in production for both mainnet and redwood, and therefore may only need one variable for the two. Maybe we should consider adding REGEN to the variable name to make it clearer that this is the endpoint for regen mainnet and redwood testnet, which would leave room for other endpoints if and when another network is added.

I think we still want to prepare for multiple graphql endpoints in the case where a network is added and we are not hosting the data but someone from that network is (and they use our indexer to run a process specifically for their chain, making it available within our deployment of the groups ui).

From the groups-ui perspective, we just need to make sure we maintain support for multiple networks, which is how the groups-ui was designed. Whether we use the same indexer deployment and database for both regen mainnet and redwood testnet is something we can discuss further, but maybe to start we can continue towards updating the configuration.

ryanchristo commented 1 year ago

I'll take an initial crack at what I was originally thinking and then maybe we can further discuss.

ryanchristo commented 1 year ago

Rough idea is there: https://github.com/regen-network/groups-ui/pull/139

We could consolidate the regen environment variable (or maybe better to leave separate so we have options when testing) but the general idea is there. This way we continue leveraging chain information specific to each chain.

ryanchristo commented 1 year ago

We might even want to consider removing the environment variables if they are the same for each environment (i.e. local, staging, and production). In some cases you may still want to swap them out though so maybe we could have defaults in place in the code so we don't have to worry about setting them correctly in staging and production. Thinking out loud a bit.
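A rough sketch of the defaults-with-override idea, written in Python only for illustration; the groups ui would need the equivalent in its own codebase. The network keys, the INDEXER_GRAPHQL_ variable prefix, and the example.invalid URL are all hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical defaults kept in code, one endpoint per network.
DEFAULT_INDEXER_ENDPOINTS = {
    "regen-mainnet": "https://indexer.example.invalid/graphql",
    "regen-redwood": "https://indexer.example.invalid/graphql",
}


def indexer_endpoint(
    network: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the graphql endpoint for a network.

    An environment variable named INDEXER_GRAPHQL_<NETWORK> wins when set;
    otherwise the in-code default is used. Networks without indexed
    proposals return None.
    """
    key = "INDEXER_GRAPHQL_" + network.upper().replace("-", "_")
    return env.get(key) or DEFAULT_INDEXER_ENDPOINTS.get(network)
```

With no variables set, staging and production both resolve to the defaults, and setting INDEXER_GRAPHQL_REGEN_REDWOOD would swap in a different endpoint for that network only.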

ryanchristo commented 1 year ago

@wgwz whatever we decide here, let's open an issue in the indexer to discuss the adjustments you mentioned above. I think this is the right direction for managing indexing for regen mainnet and redwood testnet.

wgwz commented 1 year ago

@ryanchristo the approach in your PR looks good to me so far, conceptually it makes sense how you're going about it and why. no strong opinion about having the environment variables or not since we'll effectively always have these endpoints configured in code with this approach (unless i'm misunderstanding). open to seeing those cleaned up however you think makes sense.

regen-network / groups-ui

Update indexer environment to support mainnet and redwood #131