Open kallisti5 opened 3 years ago
@kallisti5 could you grab a little more debugging info:
ipfs config show
ipfs dag stat --offline QmSSxDK8br4Z5UHQBXZWuXew8uP2uJ98oYT1HQAw7QrwKT to make sure it completes ok
ipfs swarm connect /full/address/of/vultr/node on the client node after the reboot
ipfs bitswap stat should show some of the blocks you are waiting on; if those blocks are part of the QmSS... graph it could be easier to determine what's going on
This might be a duplicate of #7972, which appears to be fixed in master of go-bitswap (the dependency update isn't yet included in go-ipfs master).
Home node (wants)
ipfs@ipfsgw:~$ ipfs bitswap stat
bitswap status
provides buffer: 0 / 256
blocks received: 117310
blocks sent: 12037
data received: 23934805330
data sent: 2561571207
dup blocks received: 4393
dup data received: 691026392
wantlist [1 keys]
Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD
partners [763]
Vultr node (has):
ipfs bitswap stat
bitswap status
provides buffer: 0 / 256
blocks received: 0
blocks sent: 99009
data received: 0
data sent: 22941823079
dup blocks received: 0
dup data received: 0
wantlist [0 keys]
partners [785]
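The key signal in the two outputs above is the wantlist: the home node is waiting on exactly one CID while the Vultr node wants nothing. A minimal sketch for pulling the wanted CIDs out of a saved capture of this output (stat.txt is a hypothetical file holding the text shown above, and the line format is assumed to match it):

```shell
# Save a copy of the bitswap stat output shown above (hypothetical capture file).
cat > stat.txt <<'EOF'
bitswap status
    provides buffer: 0 / 256
    wantlist [1 keys]
        Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD
    partners [763]
EOF
# Print only the CIDs between the wantlist header and the partners line.
sed -n '/wantlist/,/partners/p' stat.txt | grep -o 'Qm[A-Za-z0-9]*'
```

Comparing that list against a `ipfs dag stat` of the pinned root quickly tells you whether the stuck block is even part of the graph you are pinning.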
Vultr node (has): putting the wantlist CID from my home node into dag stat on the Vultr node:
# ipfs dag stat --offline Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD
Size: 0, NumBlocks: 1
Vultr Node (has):
root@ipfs:~# ipfs dag stat --offline QmPZvhACHCf1uXr5nVwvpSJSYjQviShZkoZ4ctx6LQ6q37
Size: 5111615603, NumBlocks: 22289
(still running. I'll update here if it fails)
I pinged your Vultr node and it responds as not having Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD. Running ipfs block stat --offline Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD on the Vultr node would be interesting.
I noticed you updated the CID in your comment that you were trying to pin. Since you have garbage collection enabled, it's possible that Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD has since been cleaned up on the Vultr node, so if the home node is still looking for it, it's not going to find it.
Ooh.. I think this might be the root cause. I posted ipfs dag stat --offline Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD above from the Vultr node showing it did indeed have that block, though oddly empty (Size: 0, NumBlocks: 1).
Check this out (vultr node)
root@ipfs:~# ipfs dag get --offline Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD
{"data":null,"links":[]}
Here's the completed diag stat for the whole thing on the vultr node:
ipfs dag stat --offline QmPZvhACHCf1uXr5nVwvpSJSYjQviShZkoZ4ctx6LQ6q37
Size: 22827570312, NumBlocks: 95529
So... locally it thinks it has that block, but remotely it reports not having it?
root@ipfs:~# ipfs dag get --offline Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD {"data":null,"links":[]}
This seems a bit weird to me, since I'd expect an empty DagPb node to have the hash QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n (per the DagPb spec). I wonder if there is some datastore corruption, which would cause your node to send back data but my node to not recognize it as having the CID I was expecting.
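To make the corruption hypothesis concrete: in a content-addressed store, each block is filed under the hash of its bytes, so verification is just re-hashing and comparing. This is a plain-sha256 sketch of that idea, not real go-ipfs code (IPFS uses multihash-encoded CIDs, not raw hex digests):

```shell
# Simulate a tiny content-addressed blockstore and detect bit rot in it.
store=$(mktemp -d)
printf 'hello' > "$store/tmp"
h=$(sha256sum "$store/tmp" | cut -d' ' -f1)
mv "$store/tmp" "$store/$h"          # block stored under the hash of its bytes
printf 'bit rot' > "$store/$h"       # simulate on-disk corruption
for f in "$store"/*; do
  actual=$(sha256sum "$f" | cut -d' ' -f1)
  # Re-hash on read: a mismatch is exactly "block in storage has different
  # hash than requested".
  [ "$actual" = "$(basename "$f")" ] || echo "block $(basename "$f") was corrupt"
done
```

A corrupt block explains the symptom perfectly: the local node finds *something* under the CID's key and serves it, but the receiving node hashes the bytes, gets a different CID, and discards them, so the want is never satisfied.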
Could you run ipfs dag export Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD > empty.car, then ipfs add the car file and post the CID here so I can take a look? Note: what I'm asking you to do is basically export the DAG underneath that CID (which is just a single node in this case) to a file, so we can investigate it even if it has some corruption that is preventing it from being sent over the network.
It may take a while, but it'd be helpful to run ipfs repo verify to check on repo corruption issues.
# ipfs dag export Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD > empty.car
0s 92 B / ? [====] 3.22 KiB/s 0s
# ipfs add empty.car
added QmdWr79vNxtjzVdQBvK4BVH4K94jD9g54t2BzRAuMzUjVc empty.car
Running ipfs repo verify now.
❯ ipfs dag import .\QmdWr79vNxtjzVdQBvK4BVH4K94jD9g54t2BzRAuMzUjVc
Pinned root Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD FAILED: blockstore: block not found
Error: unable to pin all roots: 1 out of 1 failed
Ya, something weird is going on. I'm guessing a corrupted block. The CAR file contains a single block, which is the empty block with the CID QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n.
Yup.
# ipfs repo verify
block QmTRQhKWeLJUDvLqxDZmBkLwJSSy7B3Er9ytAhQcDiG1Lf was corrupt (block in storage has different hash than requested)
block QmZ7rUYoMm4cpnxCHS9KBJeuUPmcyZsg4Qgi3EAjUq3iTM was corrupt (block in storage has different hash than requested)
block Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD was corrupt (block in storage has different hash than requested)
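When ipfs repo verify reports output like the above, a first triage step is to collect the corrupt CIDs into a list. A small sketch, assuming the output has been captured to a file (verify.log is a hypothetical capture of the text shown above; the "block <CID> was corrupt" line format is taken from it):

```shell
# Hypothetical capture of the `ipfs repo verify` output shown above.
cat > verify.log <<'EOF'
block QmTRQhKWeLJUDvLqxDZmBkLwJSSy7B3Er9ytAhQcDiG1Lf was corrupt (block in storage has different hash than requested)
block QmZ7rUYoMm4cpnxCHS9KBJeuUPmcyZsg4Qgi3EAjUq3iTM was corrupt (block in storage has different hash than requested)
block Qme5ubrNa932DxVxMtTkkjpiaKMLhVPfoFszyGfhS5BctD was corrupt (block in storage has different hash than requested)
EOF
# The CID is the second whitespace-separated field of each corrupt-block line.
grep 'was corrupt' verify.log | awk '{print $2}' > corrupt-cids.txt
cat corrupt-cids.txt
```

With that list you could, on a node that still has a good copy of the data, check which files reference those blocks and re-add them; removing the bad blocks locally would let bitswap fetch clean copies from elsewhere.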
How do you even fix that when it's MFS? Is there a way to identify which files are affected and re-upload them? I don't really understand how they got corrupted; the node is brand new and I just added the files without any real issues.
To wrap this one up: I ended up just deleting the repo and re-adding the data using flatfs instead of badger. Things are smooth now.
Since this is the "seed" node for MFS data, repo stability is important. It sounds like badger is better suited to user nodes, while flatfs is better for "critical data sources".
However, this case might be useful for some enhancement: I had CIDs that nothing saw as invalid (and the MFS filesystem was fully available) until I ran a full repo verification. This seems like a potentially common issue.
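One way to act on that observation, for a seed node, is a periodic verification watchdog. This is only a sketch of the shape such a check could take: on a real node VERIFY_CMD would be `ipfs repo verify` (run from cron or a systemd timer); here it is a stub command so the example is self-contained.

```shell
# Hypothetical corruption watchdog. VERIFY_CMD defaults to a stub that mimics
# one line of `ipfs repo verify` output; override it on a real node.
VERIFY_CMD=${VERIFY_CMD:-"echo block QmXXX was corrupt"}
out=$($VERIFY_CMD)
if printf '%s\n' "$out" | grep -q 'was corrupt'; then
  # In practice this would page an operator or write to a monitored log.
  echo "ALERT: repo corruption detected"
fi
```

Catching corruption proactively matters here because, as this issue shows, a bad block is invisible to normal reads and only surfaces as a remote pin that hangs forever.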
I have had random stuck events, but they resolved after retrying. I am using go-http-api-client. I was stuck for a whole day while pinning a CID.
Is this the same problem? @kallisti5
@godcong
I have had random stuck events, but they resolved after retrying.
There's a mix of issues here:
These are the "biggest" issues we encountered trying to use IPFS at scale. They represent some really frustrating user experiences.
Version information:
I have 22GiB+ of MFS data on a node at Vultr. The latest MFS CID is pointed to via /ipns/hpkg.haiku-os.org
Description:
ipfs pin add --progress /ipns/hpkg.haiku-os.org
ipfs pin add --progress /ipfs/QmPZvhACHCf1uXr5nVwvpSJSYjQviShZkoZ4ctx6LQ6q37
Both have been stuck at "Fetched/Processed 95036 nodes" for like 2 days; the latest attempt has shown "Fetched/Processed 95036 nodes" for ~12 hours now.
sysstat from home node (while stuck pinning)
Debug from home node (while stuck pinning)
homenode-debug.tar.gz