ipfs / ipget

Retrieve files over IPFS and save them locally.
MIT License

Cannot complete download #103

Open: vmx opened this issue 2 years ago

vmx commented 2 years ago

I'm using ipget on a remote node in a data center. On that node, the download of a large file halts partway through the file. I'm using ipget v0.7.0 (the latest release on dist.ipfs.io). I have no idea how to debug this; I suspect the environment, but it's hard to tell. The command I used is:

LIBP2P_TCP_REUSEPORT=false /var/tmp/ipget-v0.6.0/ipget/ipget QmNPc75iEfcahCwNKdqnWLtxnjspUGGR4iscjiz3wP3RtS \
  -o /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.params \
  --progress --node=temp \
  --peers /dns4/collab-cluster-am6-2.cluster.dwebops.net/tcp/4001/p2p/12D3KooWCrBiagtZMzpZePCr1tfBbrZTh4BRQf7JurRqNMRi8YHF \
  --peers /dns4/collab-cluster-am6-3.cluster.dwebops.net/tcp/4001/p2p/12D3KooWDpp7U7W9Q8feMZPPEpPP5FKXTUakLgnVLbavfjb9mzrT \
  --peers /dns4/collab-cluster-ams1-1.cluster.dwebops.net/tcp/4001/p2p/QmNMs4C2taBgMP716bgaN6wyyLRMTzLSVC5aqXrNjHE33Z \
  --peers /dns4/collab-cluster-dc13-1.cluster.dwebops.net/tcp/4001/p2p/12D3KooWHVXoJnv2ifmr9K6LWwJPXxkfvzZRHzjiTZMvybeTnwPy \
  --peers /dns4/collab-cluster-dc13-2.cluster.dwebops.net/tcp/4001/p2p/12D3KooWEDBLgMaCr6ZFwjDXr7eMXzb7s7SnHJHrYRYWYbQSxMif \
  --peers /dns4/collab-cluster-sjc1-1.cluster.dwebops.net/tcp/4001/p2p/Qmde7irdYqkbhfFsu6xKzBgmGWJPnx8bS7TNVdAko4gswW \
  --peers /dns4/collab-cluster-sjc1-2.cluster.dwebops.net/tcp/4001/p2p/12D3KooWKZLdYX8fEqMu5jNKpSKzyXjjNYosJGj5T9uDXKxseAsw

This is blocking some of my work, since I have no other easy way to get the Filecoin proof parameters onto that machine.
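Each --peers value in the command above is a multiaddr that ends in /p2p/<peer ID>, and any logic that checks or restores a connection to one of those peers needs that peer ID. In Go, go-libp2p's `peer.AddrInfoFromString` parses such a string into a peer ID plus addresses. The stdlib-only helper below, `splitPeerAddr` (a hypothetical name, not part of ipget), is a minimal sketch of that split so it runs without libp2p; it only looks for the last /p2p/ component and does not validate the multiaddr or the peer ID encoding.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPeerAddr is a hypothetical helper (not part of ipget). It splits a
// multiaddr such as /dns4/host/tcp/4001/p2p/<id> into the transport part and
// the peer ID after the final /p2p/ component. It does not validate either part.
func splitPeerAddr(addr string) (transport, peerID string, err error) {
	const marker = "/p2p/"
	i := strings.LastIndex(addr, marker)
	if i < 0 {
		return "", "", fmt.Errorf("no /p2p/ component in %q", addr)
	}
	peerID = addr[i+len(marker):]
	if peerID == "" || strings.Contains(peerID, "/") {
		return "", "", fmt.Errorf("malformed peer ID in %q", addr)
	}
	return addr[:i], peerID, nil
}

func main() {
	addr := "/dns4/collab-cluster-am6-2.cluster.dwebops.net/tcp/4001/p2p/12D3KooWCrBiagtZMzpZePCr1tfBbrZTh4BRQf7JurRqNMRi8YHF"
	transport, id, err := splitPeerAddr(addr)
	if err != nil {
		panic(err)
	}
	fmt.Println("transport:", transport)
	fmt.Println("peer ID:  ", id)
}
```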

aschmahmann commented 2 years ago

My suspicion is that if the connection drops for some reason and the data isn't properly advertised in the DHT, ipget will never re-establish the connection, because --peers only connects to the given peers once, up front.

This could be resolved by making https://github.com/ipfs/ipget/blob/5397b0666d7e90d78c1566ecb90f289dad9d9ec1/util.go#L13 connect to the target peers repeatedly. The code in https://github.com/ipfs/go-ipfs/blob/d6de97b417def4feaf1382d0ff423e22fd2ff08b/peering/peering.go can serve as inspiration.

shawnrader commented 1 year ago

I attempted a fix in this branch: https://github.com/shawnrader/ipget/tree/reconnect. However, according to Swarm.Peers(), the swarm is maintaining connections to hundreds of hosts, so re-establishing the connection does not address the issue. My guess at this point is that either (1) the hosts serving the Filecoin parameter files are not doing so reliably, and/or (2) there is an issue deeper in IPFS that causes the download to get stuck. I think we need to make sure (1) is addressed before investigating further. When the issue occurs, the download progress bar drops to 0 bytes/sec and stays there.