ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

IPNS GetPublished should fail with unexpected error #5816

Open vasco-santos opened 5 years ago

vasco-santos commented 5 years ago

Version information: 0.4.18

Type: Bug

Description:

When a new record is published, we try to get its last published version in order to update the sequence number. Per the GetPublished function, if the record is not found in the local datastore, we use the routing to find it. If we do not find the record there either, all good: we create a new one. However, if we receive any other error, the publish should fail. Otherwise, we may restart the sequence number of a previously published record.

cc @alanshaw @Stebalien

Stebalien commented 5 years ago

Otherwise, we may restart the sequence number of a previously published record.

So, the reasoning here was that we have this problem regardless because we may simply not find the record (partitioned network). At the end of the day, we don't really support multiple nodes publishing with the same key. That's why we only check the DHT if we don't have a local value.

We discussed this the last time we refactored the routing system but the conclusion was that supporting multiple publishers on the same key is going to be race/error-prone regardless of what we do.

So, really, this is primarily useful when migrating from one node to another, although it's error-prone in that case as well. We may just want to consider removing this entirely(?).


I suggested that you open an issue so we could consider all the possible error cases here. I'm pretty sure the only errors we can get are:

  1. Not Found
  2. Timeout/cancel
  3. Fatal network error, publishing is going to fail anyway.

Basically, I'm pretty sure there isn't a bug here but we should nail down the guarantees/semantics.

vasco-santos commented 5 years ago

We discussed this the last time we refactored the routing system but the conclusion was that supporting multiple publishers on the same key is going to be race/error-prone regardless of what we do.

Agree that supporting multiple nodes using the same key is error-prone

So, really, this is primarily useful when migrating from one node to another, although it's error-prone in that case as well. We may just want to consider removing this entirely(?).

Considering that we assume we do not support sharing keys between nodes, which I totally agree with unless we implement a safe way of doing it, I think we should just remove the code. At least in JS land, this code will only run when we are initializing the node (on subsequent publishes, the record will assuredly be in the datastore). When we are initializing the repo, we do not have the DHT running yet, so it will only check the local repo, and unless unexpected behavior happens, this get seems unnecessary.

My proposal would be to remove the routing.get on both the JS and Go sides, and add to the spec that we do not support multiple publishers. Then in the future, if we intend to support this, we can update the spec without breaking anything and support it properly. What do you think?

Stebalien commented 5 years ago

What do you think?

I agree, but I think we need some way to transition publishing from one node to another. The current way usually kind of works, so I'd like to avoid simply removing it until we have a replacement.

alanshaw commented 5 years ago

The current way usually kind of works, so I'd like to avoid simply removing it until we have a replacement.

I agree. I think this part of GetPublished just needs to be tweaked to throw (return/callback/whatever) the error, but the calling code should be able to decide what to do with it. So if we're putting a new record, we can catch, log, and ignore it; and when getting a record, we'll have a better idea of why it could not be retrieved (404/network/other).

Stebalien commented 5 years ago

It looks like someone ran into this (well, a related) issue here: https://discuss.ipfs.io/t/name-publish-error-cant-replace-a-newer-value-with-an-older-value/3823

I agree. I think this part of GetPublished just needs to be tweaked to throw (return/callback/whatever) the error, but the calling code should be able to decide what to do with it. So if we're putting a new record, we can catch, log, and ignore it; and when getting a record, we'll have a better idea of why it could not be retrieved (404/network/other).

That's not the function we use to get records from the network. We just use it to get the last version of a record that we published.

alanshaw commented 5 years ago

That's not the function we use to get records from the network. We just use it to get the last version of a record that we published.

Ah ok, no worries then. I was being asked to look at this as a parallel to what's being proposed in JS so assumed it was being used for get as well as put.

vasco-santos commented 5 years ago

Yes, this is only for getting the last version of a record during a publish.

Accordingly, if we throw the error, we stop the publish. Otherwise, the publish process can continue.

alanshaw commented 5 years ago

🙄 bah, yes you're right ignore me.

anacrolix commented 5 years ago

I'm getting the error Error: can't replace a newer value with an older value. Is it certain that this issue is related to the error produced at the line @rklaehn provided (https://github.com/libp2p/go-libp2p-kad-dht/blob/627b5f96f0d3e42cc9ce73570501be83efbeedf9/routing.go#L68)?

In my case I was trying to republish a key with a longer lifetime.

Stebalien commented 5 years ago

That's the only place that error can be returned.

Given the discussion in https://discuss.ipfs.io/t/name-publish-error-cant-replace-a-newer-value-with-an-older-value/3823, there may be a deeper issue here. Unfortunately, I can't seem to find it.

tamone commented 5 years ago

"We discussed this the last time we refactored the routing system but the conclusion was that supporting multiple publishers on the same key is going to be race/error-prone regardless of what we do."

This is an important matter. IPFS is all about distributing content, and one point is to avoid a single point of failure. To reach content, one needs consistent addressing, exactly what ipns publish delivers, so that the dapp does not have to be rewritten each time the content changes. IPNS name republishing is necessary every 24 hours and should not depend on one single IPFS node that could go down. So the solution is to publish from several nodes with the same secondary key. In my opinion, if this cannot be done on several nodes, a big part of IPFS's advantage is lost. What do you think?

Stebalien commented 5 years ago

IPNS name republishing is necessary every 24 hours and should not depend on one single IPFS node that could go down.

The issue being discussed here doesn't affect re-publishing, only initial publishing. Re-publishing won't increment the sequence number so it isn't an issue in theory. In practice, we just haven't implemented a "republish-only" feature yet.