ipfs / go-ds-s3

An s3 datastore implementation
MIT License
239 stars 66 forks source link

500 server error when using HTTP API cat and stat to node using s3 datastore #217

Open shc261392 opened 2 years ago

shc261392 commented 2 years ago

Hi, I've met an issue using IPFS node built with the go-ds-s3 plugin. For now I can't be very sure about what the root cause is, but I'll provided as many information as possible, would be very appreciated if anyone could help.

Problem

When using the built IPFS node's HTTP API api/v0/cat and api/v0/stat, sometimes a 500 error (Internal Server Error) will be returned. So far we have tried but can't find a stable way to reproduce this issue. From what we've observed, calling the exactly same API endpoint will same headers and query parameters does not always meet this problem.

The same problem never happens on another IPFS node using the default datastore without this plugin, so we think the might be an issue related to the plugin.

Making API calls like api/v0/add or api/v0/get seems always works however.

EDIT: the problem also happens for api/v0/add

Logs that might be related to the problem

We have turned the IPFS daemon's debug flag --debug on, but it produces very verbose logs. Since I'm not so familiar with how to debug IPFS daemon, I can only list some error messages that I think might be related the problem. I can provide more detailed logs or look for some certain debugging messages if I know where to look at, please let me know if there's something else that might help.

Logs that we think might be related to the problem:

2022-03-10T10:39:43.034Z    ERROR   engine  decision/blockstoremanager.go:92    blockstore.GetSize(bafybeied5upazh2w4hwhmaoduhd7c2n37q2mil4afzp3ycyycxhjszo2oe) error: RequestError: send request failed
 caused by: Head "https://ipfs-s3-datastore.s3-accelerate.amazonaws.com/ipfs-s3-datastore/blocks/CIQIH3I6BSPVNYPMOYA4HIOH6FU3X7BUYQXYALS7XQFRQFOOTFS5U4I": dial tcp 99.84.182.227:443: connect: cannot       assign requested address
2022-03-10T10:39:43.036Z    ERROR   engine  decision/blockstoremanager.go:92    blockstore.GetSize(QmTVebXpZShgdBRtP7j8VnvX3HGbjjuUZotssXs39HnAyj) error: RequestError: send request failed
 caused by: Head "https://ipfs-s3-datastore.s3-accelerate.amazonaws.com/ipfs-s3-datastore/blocks/CIQEZGEIUOETNBK34TFYDMESR7IUMNH6Z2J6WSBE4GYJ4W7N26T6LJQ": dial tcp 99.84.182.227:443: connect: cannot       assign requested address
2022-03-10T10:39:43.350Z    DEBUG   dht go-libp2p-kad-dht@v0.15.0/query.go:513  error connecting: failed to dial 12D3KooWPwYnvy9zdTnBj31zAVrrdvLWNmBW91vnUhf3Ku2GCPhf:
   * [/ip6/2a02:908:2211:d100:ccbc:611c:3deb:a84d/tcp/54806] dial tcp6 [2a02:908:2211:d100:ccbc:611c:3deb:a84d]:54806: connect: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:ccbc:611c:3deb:a84d/tcp/4001] dial tcp6 [2a02:908:2211:d100:ccbc:611c:3deb:a84d]:4001: connect: cannot assign requested address
   * [/ip6/::1/tcp/4001] dial tcp6 [::1]:4001: connect: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:4b7:766b:dfe8:9037/tcp/4001] dial tcp6 [2a02:908:2211:d100:4b7:766b:dfe8:9037]:4001: connect: cannot assign requested address
   * [/ip6/::1/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[::1]:4001: sendto: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:4b7:766b:dfe8:9037/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[2a02:908:2211:d100:4b7:766b:dfe8:9037]:4001: sendto: cannot assign requested address
   * [/ip6/2a02:908:2211:d100:ccbc:611c:3deb:a84d/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[2a02:908:2211:d100:ccbc:611c:3deb:a84d]:4001: sendto: cannot assign requested address
   * [/ip4/192.168.0.59/udp/4001/quic] context deadline exceeded
   * [/ip4/192.168.0.59/tcp/4001] dial tcp4 0.0.0.0:4001->192.168.0.59:4001: i/o timeout
   * [/ip4/37.201.146.171/tcp/54806] dial tcp4 0.0.0.0:4001->37.201.146.171:54806: i/o timeout
   * [/ip4/37.201.146.171/udp/45813/quic] timeout: no recent network activity
2022-03-10T10:39:43.701Z    DEBUG   dht net/message_manager.go:82   request failed to open message sender   {"error": "failed to dial 12D3KooWB3L8vLPpG6XHwPw3aFjpTYrKCNTniMw97sx8FU5MHqtM:\n  * [/ip6/::1/ tcp/4001] dial backoff\n  * [/ip6/2002:b6a2:68fa::b6a2:68fa/tcp/4001] dial backoff\n  * [/ip6/::1/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[::1]:4001: sendto: cannot assign requested          address\n  * [/ip6/2002:b6a2:68fa::b6a2:68fa/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[2002:b6a2:68fa::b6a2:68fa]:4001: sendto: cannot assign requested address\n  * [/ip4/182.162.104.250/udp/ 4001/quic] timeout: no recent network activity\n  * [/ip4/182.162.104.250/tcp/4001] dial tcp4 0.0.0.0:4001->182.162.104.250:4001: i/o timeout", "to":                                                       "12D3KooWB3L8vLPpG6XHwPw3aFjpTYrKCNTniMw97sx8FU5MHqtM"}
2022-03-10T10:39:45.818Z    DEBUG   dht go-libp2p-kad-dht@v0.15.0/query.go:513  error connecting: failed to dial 12D3KooWBCHsSNQhhBwp96vxYJX6HdYAtyFwAXiE2b4NoAx9Y43e:
   * [/ip6/::1/tcp/4001] dial tcp6 [::1]:4001: connect: cannot assign requested address
   * [/ip6/::1/udp/4001/quic] INTERNAL_ERROR: write udp6 [::]:4001->[::1]:4001: sendto: cannot assign requested address
   * [/ip4/61.4.101.79/udp/4001/quic] timeout: no recent network activity
   * [/ip4/61.4.101.79/tcp/4001] failed to negotiate security protocol: read tcp4 172.20.0.2:4001->61.4.101.79:4001: read: connection reset by peer

Environment

Environment Version
OS Ubuntu 20.04
Golang 1.17.6
go-ipfs 0.11.0
go-ds-s3 1ad440b
S3 Region us-east-1
S3 Region Endpoint S3 Acceleration Endpoint

The IPFS daemon is built from source with the plugin and run in a Docker container, Dockerfile link here.

Only one IPFS daemon is running, and the S3-bucket only serves the daemon, not being used by anything else.

welcome[bot] commented 2 years ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

aeiou335 commented 2 years ago

api/v0/add also will return 500 internal error sometimes.

guseggert commented 2 years ago

Is there a body to the response? If so it might have an error message in it. Also the error is logged if you turn on debug logging for the "cmds" system ipfs log level cmds debug.

shc261392 commented 2 years ago

There is only Internal Server Error message, no additional info.

After setting the cmds subsystem log to debug level, here's what I've found that could be relevant:

2022-03-18T04:24:18.494Z    DEBUG   cmds    go-ipfs-cmds@v0.6.0/command.go:145  error occured in call, closing with error: failed to get block for bafybeiaz5zrdvjtoqmibprkk6g7oxbdo2zmdxuu3j5agbaaczph7ylckjq: RequestError: send request failed
caused by: Get "https://ipfs-s3-datastore.s3-accelerate.amazonaws.com/ipfs-s3-datastore/blocks/AFYBEIAZ5ZRDVJTOQMIBPRKK6G7OXBDO2ZMDXUU3J5AGBAACZPH7YLCKJQ": dial tcp 13.32.196.208:443: connect: cannot assign requested address
IAWYXOAXCSAW5P2MA": dial tcp 13.32.196.208:443: connect: cannot assign requested address
guseggert commented 2 years ago

Awesome thanks. This smells like ephemeral port exhaustion, could you check the number of TCP conns in TIME-WAIT state? ss -at | tail -n +2 | awk '{print $1}' | sort | uniq -c | sort -n

Although the fact that this error is also happening on both TCP and UDP swarm sockets suggests there may be something deeper wrong with your host, like misconfigured networking.

shc261392 commented 2 years ago

Here's what I get for running the above command:

$ ss -at | tail -n +2 | awk '{print $1}' | sort | uniq -c | sort -n
     16 LISTEN
    454 ESTAB
    503 TIME-WAIT

The node is running on an EC2 instance in the same region as the S3 bucket, the only network-related config I've added is allowing HTTP and HTTPS inbound traffic in security group which is probably not related to the issue. I'm not sure what should I check next.