tilacog opened 2 years ago
Worth noting that there are risks here. If downloading existing state becomes the default pattern for indexers, then far fewer actors will be actively validating that state's integrity and provenance. This would also impact POI consensus assumptions.
I think @lutter has looked at the mechanics of copying a given subgraph previously, and if I recall correctly there were some potential challenges.
@chriswessels those considerations certainly apply, but I think the scope of this issue should be how Graph Node might implement a transfer to a foreign db. Those other things warrant discussion in a GIP or in the forum, and have cascading impacts on other components (e.g. indexer components).
The main headache with import/export is defining a good format for transferring the data; it would be desirable to make that format usable for both dump/restore and for substreams. That format should be db agnostic, and live at the level of the subgraph schema, not the current db schema.
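One way to picture a format that lives at the subgraph-schema level rather than the db-schema level is a stream of entity versions keyed by entity type and ID. The sketch below is purely hypothetical: `EntityVersion`, `dump_jsonl`, `load_jsonl` and the JSON Lines layout are assumptions for illustration, not something Graph Node defines.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, List, Optional


@dataclass
class EntityVersion:
    """One version of an entity, described only in subgraph-schema terms.

    Hypothetical format: `entity` is the GraphQL entity type name, `id` is
    the entity ID, and the version is valid for blocks in
    [block_start, block_end); block_end=None means it is still current.
    """
    entity: str
    id: str
    block_start: int
    block_end: Optional[int]
    data: Dict[str, Any]


def dump_jsonl(versions: List[EntityVersion]) -> str:
    # One JSON object per line; sorted keys keep the output stable.
    return "\n".join(json.dumps(asdict(v), sort_keys=True) for v in versions)


def load_jsonl(text: str) -> List[EntityVersion]:
    # Rebuild the versions; nothing here depends on a particular database.
    return [EntityVersion(**json.loads(line)) for line in text.splitlines() if line]
```

Keeping block ranges instead of only the latest values is one way such a format could preserve the history needed to answer queries at past blocks; whether that is required is an open design question.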
In my mind, though, the first order of business is to address the network concerns around import/export; that might be as simple as "we don't care from the network side, we'll just treat this the same as people sharing SQL dumps", or something more involved.
Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.
Is there a way to copy a subgraph from one indexer to another to bootstrap the indexing process? Can we do it directly via the DB, or using graphman?
Otherwise it means a long indexing time and heavy load on the RPC endpoint.
Bumping this. There is value in an operation like this, although I understand it creates a shortcut around doing and verifying the sync work. I need this functionality to migrate very large subgraphs to a new database and integrate them correctly into the new primary shard. Below is the content of the feature request I was going to post before finding this existing request.
Context:
I asked this question on Discord:
I am building a new stack, and I want to at least copy the very largest subs I already have into a new database because they would take months to sync again. What is the most effective way to do this? The main restriction is that the old database must die; it's nearly 4 years old and I am starting fresh. Dunno if it's relevant, but the old database didn't have the right locale, whereas the new one does.
This is a pretty common use case we have discussed a few times on Indexer Office Hours. Vincent from Data Nexus has outlined how you would manually achieve the above. I'd be interested in a graphman feature that extends a copy operation to achieve this, if folks think it is useful to a wider audience and if it's a realistic ask:
Unfortunately that will require some DB surgery. If you're going to do a sharded DB, you should be able to create a new shard with the correct locale and copy the data to the new shard using graphman copy. Then you'll have to add the new shard to the graph-node config of the new stack, manually insert the deployment details into the primary, and set the counter in the primary to one higher than your largest sgd number. We really need a graphman command for dumping subgraph data so it can be imported into another disconnected DB...
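The last manual step, setting the counter to one higher than the largest sgd number, is simple arithmetic over the deployment schema names. A minimal sketch, assuming the schema names follow the `sgdN` pattern and have already been fetched (for example from `pg_namespace.nspname` in the Postgres catalog); `next_deployment_number` is a hypothetical helper, and the query that actually updates the counter is not shown because it depends on the primary's internal tables.

```python
import re
from typing import Iterable

# Matches deployment schema names such as "sgd42" and captures the number.
SGD_NAME = re.compile(r"sgd(\d+)")


def next_deployment_number(schema_names: Iterable[str]) -> int:
    """Hypothetical helper: return one more than the largest N in 'sgdN' names."""
    largest = 0
    for name in schema_names:
        match = SGD_NAME.fullmatch(name)
        if match:
            largest = max(largest, int(match.group(1)))
    # Names that are not sgdN (e.g. "public") are ignored.
    return largest + 1
```

For schemas `sgd1`, `sgd7` and `sgd42`, the counter would need to be set to 43 so that the next deployment does not collide with an imported one.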
At first glance, this feature would operate similarly to graphman copy, but subgraphs could be sent to a foreign cluster. The motivation behind this encompasses both:
Alternatively, an import/export pattern similar to pg_dump would also fit those needs.
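For comparison, a pg_dump-style export of a single deployment might start from a schema-scoped dump. A minimal sketch, assuming the deployment's data lives in its own `sgdN` schema; `pg_dump_command` is a hypothetical helper that only builds the argument list, using standard pg_dump options. As the manual procedure above points out, such a dump does not carry the deployment details stored in the primary, which is the gap a dedicated graphman command would need to close.

```python
from typing import List


def pg_dump_command(dbname: str, schema: str, outfile: str) -> List[str]:
    """Hypothetical helper: build a pg_dump invocation for one deployment schema."""
    return [
        "pg_dump",
        f"--dbname={dbname}",    # database name or connection string
        f"--schema={schema}",    # dump only this schema, e.g. "sgd42"
        "--format=custom",       # archive format readable by pg_restore
        f"--file={outfile}",
    ]
```

The list can be passed to `subprocess.run`, and a custom-format archive is restored with `pg_restore`.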