graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0

Allow subgraphs to be copied/shared between indexers #3730

Open tilacog opened 2 years ago

tilacog commented 2 years ago

At first glance, this feature would operate similarly to graphman copy but subgraphs could be sent to a foreign cluster.

The motivation behind this encompasses both:

Alternatively, an import/export pattern similar to pg_dump would also fit those needs.

chriswessels commented 2 years ago

Worth noting that there are risks here. If the default pattern becomes for indexers to download existing state, then we have far fewer actors actively validating that state's integrity and provenance. This would also impact POI consensus assumptions.

azf20 commented 2 years ago

I think @lutter has looked at the mechanics of copying a given subgraph previously, and if I recall correctly there were some potential challenges.

@chriswessels there are certainly those considerations, but I think the scope of this issue should focus on how such a transfer might be implemented in Graph Node, to transfer to a foreign db - those other things warrant discussion in a GIP / in the forum, and have cascading impact on other components (e.g. indexer components)

lutter commented 2 years ago

The main headache with import/export is defining a good format for transferring the data; it would be desirable to make that format usable for both dump/restore and for substreams. That format should be db agnostic, and live at the level of the subgraph schema, not the current db schema.

In my mind, the first order of business for this though is to address the network concerns around import/export; that might be as simple as "we don't care from the network side, we'll just treat this the same as people sharing SQL dumps" or more involved.
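
To make the format discussion above a little more concrete, here is a minimal sketch of what a db-agnostic record at the level of the subgraph schema might look like, serialized as JSON lines. Everything in it is hypothetical: the `EntityRecord` name, its fields, and the JSON-lines encoding are illustrative assumptions, not an existing Graph Node or substreams format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EntityRecord:
    """One entity version, described in subgraph-schema terms (hypothetical)."""
    entity_type: str  # type name from the subgraph's GraphQL schema
    id: str           # entity id
    block: int        # block at which this version became valid (assumed field)
    data: dict        # attribute name -> value, independent of any db layout


def dump_records(records):
    """Serialize records as JSON lines, one entity version per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def load_records(text):
    """Parse JSON lines back into EntityRecord objects."""
    return [EntityRecord(**json.loads(line)) for line in text.splitlines() if line]
```

Because the record only names schema-level types and attributes, the same dump could in principle be restored into a database with a different internal layout, which is the property the comment asks for.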

github-actions[bot] commented 1 year ago

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.

air3ijai commented 1 year ago

Is there a way to copy a subgraph from one indexer to another to bootstrap the indexing process? Can we do it directly via the DB or using graphman?

Otherwise, it means a long indexing time and a heavy load on the RPC endpoint.

cryptovestor21 commented 2 months ago

Bumping this. There is value in an operation like this, although I understand it creates a shortcut around doing and verifying the sync work. I need this functionality to migrate very large subgraphs to a new database and integrate them correctly into the new primary shard. Below is the content of the feature request I was going to post before finding this existing request.

Context:

I asked this question on Discord:

I am building a new stack, and I want to at least copy the very largest subs I already have into a new database because they would take months to sync again. What is the most effective way to do this? The main restriction is that the old database must die; it's nearly 4 years old and I am starting new. Dunno if it's relevant, but the old database didn't have the right locale, whereas the new one does.

This is a pretty common use case we have discussed a few times on Indexer Office Hours. Vincent from Data Nexus has outlined how you would manually achieve the above. I'm interested in having a feature in graphman that extends the copy operation to achieve this, if folks think it is useful to a wider audience and if it's a realistic ask:

Unfortunately that will require some DB surgery. If you're going to do a sharded DB, you should be able to create a new shard with the correct locale, copy the data to the new shard using graphman copy, then you'll have to input the new shard config into the new stack graph node config, manually insert deployment details into the primary, and then set the counter in the primary to one higher than your largest sgd number. We really need a graphman command for dumping subgraph data so it can be imported into another disconnected DB...
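
The last manual step above (setting the counter to one higher than the largest sgd number) can be sketched as a small helper. This is only an illustration under stated assumptions: it assumes deployment namespaces are named `sgd<N>`, the function name `next_sgd_counter` is made up, and how the list of namespaces is obtained and how the counter is actually written in the primary are left out because the thread does not specify them.

```python
import re

# Assumption: deployment namespaces look like "sgd<N>", as suggested by the
# phrase "largest sgd number". Anything else in the list is ignored.
SGD_NAME = re.compile(r"^sgd(\d+)$")


def next_sgd_counter(namespaces):
    """Return one more than the largest sgd number found.

    Returns 1 when no sgd namespace is present (an assumption made for
    this sketch, not documented behavior).
    """
    matches = (SGD_NAME.match(name) for name in namespaces)
    numbers = [int(m.group(1)) for m in matches if m]
    return max(numbers, default=0) + 1
```

For example, given the hypothetical namespaces `sgd12`, `sgd407`, and `sgd9`, the helper returns 408, which is the value the counter would need so that newly created deployments do not collide with the copied ones.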