hyperledger / firefly

Hyperledger FireFly is the first open source Supernode: a complete stack for enterprises to build and scale secure Web3 applications. The FireFly API for digital assets, data flows, and blockchain transactions makes it radically faster to build production-ready apps on popular chains and protocols.
https://hyperledger.github.io/firefly
Apache License 2.0

Failed to download batch from IPFS #978

Open awrichar opened 2 years ago

awrichar commented 2 years ago

During a recent performance run, the job failed to start because the orgs were not registered properly.

Node 1 shows this upload:

{"log":"[2022-08-17T21:44:05.067Z]  INFO IPFS published QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 Size=1415 d=pinned_broadcast ns=default opcache=juVW47Fq p=did:firefly:org/org_1| pid=1 role=batchmgr\n","stream":"stderr","time":"2022-08-17T21:44:05.068084982Z"}
{"log":"[2022-08-17T21:44:05.067Z]  INFO Published batch 'f4b9cba6-e30b-4cf9-a143-e9ac425dc2d1' to shared storage: 'QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68' d=pinned_broadcast ns=default opcache=juVW47Fq p=did:firefly:org/org_1| pid=1 role=batchmgr\n","stream":"stderr","time":"2022-08-17T21:44:05.068104641Z"}

Node 0 is repeatedly unable to download:

{"log":"[2022-08-17T21:44:07.861Z] DEBUG ==\u003e GET http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 breq=BAGcvc3U pid=1 sharedstorage=ipfs\n","stream":"stderr","time":"2022-08-17T21:44:07.862328371Z"}
{"log":"[2022-08-17T21:44:37.862Z] DEBUG \u003c== GET http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 [0] (30001.16ms) breq=BAGcvc3U pid=1 sharedstorage=ipfs\n","stream":"stderr","time":"2022-08-17T21:44:37.862481893Z"}
{"log":"[2022-08-17T21:44:37.862Z] DEBUG ipfs updating operation default:f9ffe35b-28de-446b-a849-177db05d3134 status=Pending error=FF10376: Error downloading data with reference 'QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68' from shared storage: FF10136: Error from IPFS: : Get \"http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) ns=default pid=1\n","stream":"stderr","time":"2022-08-17T21:44:37.862524061Z"}
{"log":"[2022-08-17T21:44:37.862Z] ERROR Download operation sharedstorage_download_batch/f9ffe35b-28de-446b-a849-177db05d3134 attempt=1/100 failed: FF10376: Error downloading data with reference 'QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68' from shared storage: FF10136: Error from IPFS: : Get \"http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) downloadworker=dw_007 ns=default pid=1\n","stream":"stderr","time":"2022-08-17T21:44:37.862698724Z"}
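
For anyone triaging similar logs, a minimal sketch of pulling the gateway response traces out of the Docker JSON log lines above. The regular expression and the `parse_response_line` helper are illustrative assumptions, not part of FireFly. Reading the `[0]` status as "no HTTP response was received" is inferred from the accompanying timeout error, not confirmed against the FireFly source.

```python
import json
import re

# Matches the response trace seen in the logs above, e.g.
#   "<== GET http://host/ipfs/<cid> [0] (30001.16ms)"
# Hypothetical pattern written for this thread's log format only.
RESPONSE_RE = re.compile(
    r"<== GET \S+/ipfs/(?P<cid>\S+) \[(?P<status>\d+)\] \((?P<ms>[\d.]+)ms\)"
)


def parse_response_line(raw_line):
    """Return cid, status and duration for a response trace line, or None."""
    entry = json.loads(raw_line)  # json.loads decodes \u003c back to "<"
    match = RESPONSE_RE.search(entry["log"])
    if match is None:
        return None
    return {
        "cid": match["cid"],
        "status": int(match["status"]),
        "seconds": float(match["ms"]) / 1000.0,
    }


# The 21:44:37.862Z response line from Node 0, copied from above.
sample = r'''{"log":"[2022-08-17T21:44:37.862Z] DEBUG \u003c== GET http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 [0] (30001.16ms) breq=BAGcvc3U pid=1 sharedstorage=ipfs\n","stream":"stderr","time":"2022-08-17T21:44:37.862481893Z"}'''
result = parse_response_line(sample)
```

Run against that line, the request took just over 30 seconds, which matches the `Client.Timeout exceeded while awaiting headers` error logged in the same millisecond.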

awrichar commented 2 years ago

log_firefly_core_0.log.gz log_firefly_core_1.log.gz

Unfortunately, I did not capture the IPFS logs. However, I'm fairly certain IPFS was up and was not logging any obvious anomalies.

awrichar commented 2 years ago

I've also seen this locally at least once, so it wasn't a totally isolated incident.

peterbroadhurst commented 2 years ago

On the surface, this looks like the IPFS network is not healthy.

Each time a download request is made against Node 0, it should reach out to its peers to find the data. And Node 1 should have knowledge of that data in its DAG.
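
One way to test that expectation is to ask Node 0's IPFS daemon which peers it is currently connected to. The following is a rough sketch, assuming the daemon is Kubo (go-ipfs) with its RPC API on the default port 5001 of the `ipfs_0` host seen in the logs. `API_BASE` and the helper names are hypothetical and would need adjusting to the actual deployment.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Kubo RPC API is reachable on its default port 5001 on the
# same host whose gateway (port 8080) appears in the logs.
API_BASE = "http://ipfs_0:5001/api/v0"


def rpc_url(command, **params):
    """Build the URL for a Kubo RPC command such as "swarm/peers"."""
    query = urllib.parse.urlencode(params)
    return f"{API_BASE}/{command}" + (f"?{query}" if query else "")


def call_rpc(command, timeout=10, **params):
    """POST to the RPC API (Kubo rejects GET there) and return the body."""
    request = urllib.request.Request(rpc_url(command, **params), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def connected_peer_ids(swarm_peers_body):
    """Extract peer IDs from a /api/v0/swarm/peers response body."""
    # Kubo returns {"Peers": null} when no peers are connected.
    peers = json.loads(swarm_peers_body).get("Peers") or []
    return [peer["Peer"] for peer in peers]


def list_connected_peers():
    """Query the daemon and return the IDs of its connected peers."""
    return connected_peer_ids(call_rpc("swarm/peers"))
```

Calling `list_connected_peers()` from a container on the same network as `ipfs_0` should include Node 1's IPFS peer ID. An empty list would suggest the two IPFS nodes never connected, which would be consistent with the gateway request hanging until the client timeout.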