
Regression: Adding a lot of files to MFS will slow ipfs down significantly #8694

Open RubenKelevra opened 2 years ago

RubenKelevra commented 2 years ago

Checklist

Installation method

built from source

Version

go-ipfs version: 0.13.0-dev-2a871ef01
Repo version: 12
System version: amd64/linux
Golang version: go1.17.6

Config

```json { "API": { "HTTPHeaders": {} }, "Addresses": { "API": "/ip4/127.0.0.1/tcp/5001", "Announce": [], "AppendAnnounce": null, "Gateway": "/ip4/127.0.0.1/tcp/80", "NoAnnounce": [ "/ip4/10.0.0.0/ipcidr/8", "/ip4/100.64.0.0/ipcidr/10", "/ip4/169.254.0.0/ipcidr/16", "/ip4/172.16.0.0/ipcidr/12", "/ip4/192.0.0.0/ipcidr/24", "/ip4/192.0.0.0/ipcidr/29", "/ip4/192.0.0.8/ipcidr/32", "/ip4/192.0.0.170/ipcidr/32", "/ip4/192.0.0.171/ipcidr/32", "/ip4/192.0.2.0/ipcidr/24", "/ip4/192.168.0.0/ipcidr/16", "/ip4/198.18.0.0/ipcidr/15", "/ip4/198.51.100.0/ipcidr/24", "/ip4/203.0.113.0/ipcidr/24", "/ip4/240.0.0.0/ipcidr/4", "/ip6/100::/ipcidr/64", "/ip6/2001:2::/ipcidr/48", "/ip6/2001:db8::/ipcidr/32", "/ip6/fc00::/ipcidr/7", "/ip6/fe80::/ipcidr/10" ], "Swarm": [ "/ip4/0.0.0.0/tcp/443", "/ip6/::/tcp/443", "/ip4/0.0.0.0/udp/443/quic", "/ip6/::/udp/443/quic" ] }, "AutoNAT": {}, "Bootstrap": [ "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN", "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa", "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb", "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt", "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ", "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ" ], "DNS": { "Resolvers": null }, "Datastore": { "BloomFilterSize": 0, "GCPeriod": "1h", "HashOnRead": false, "Spec": { "mounts": [ { "child": { "path": "blocks", "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2", "sync": false, "type": "flatfs" }, "mountpoint": "/blocks", "prefix": "flatfs.datastore", "type": "measure" }, { "child": { "compression": "none", "path": "datastore", "type": "levelds" }, "mountpoint": "/", "prefix": "leveldb.datastore", "type": "measure" } ], "type": "mount" }, "StorageGCWatermark": 90, "StorageMax": "500GB" }, "Discovery": { "MDNS": { "Enabled": false, "Interval": 10 } }, "Experimental": { "AcceleratedDHTClient": false, "FilestoreEnabled": false, "GraphsyncEnabled": false, "Libp2pStreamMounting": false, "P2pHttpProxy": false, "StrategicProviding": false, "UrlstoreEnabled": false }, "Gateway": { "APICommands": [], "HTTPHeaders": { "Access-Control-Allow-Headers": [ "X-Requested-With", "Range", "User-Agent" ], "Access-Control-Allow-Methods": [ "GET" ], "Access-Control-Allow-Origin": [ "*" ] }, "NoDNSLink": false, "NoFetch": false, "PathPrefixes": [], "PublicGateways": null, "RootRedirect": "", "Writable": false }, "Identity": { "PeerID": "xxx" }, "Internal": {}, "Ipns": { "RecordLifetime": "96h", "RepublishPeriod": "", "ResolveCacheSize": 2048 }, "Migration": { "DownloadSources": null, "Keep": "" }, "Mounts": { "FuseAllowOther": false, "IPFS": "/ipfs", "IPNS": "/ipns" }, "Peering": { "Peers": null }, "Pinning": {}, "Plugins": { "Plugins": null }, "Provider": { "Strategy": "" }, "Pubsub": { "DisableSigning": false, "Router": "gossipsub" }, "Reprovider": { "Interval": "12h", "Strategy": "all" }, "Routing": { "Type": "dhtserver" }, "Swarm": { "AddrFilters": [ "/ip4/10.0.0.0/ipcidr/8", "/ip4/100.64.0.0/ipcidr/10", "/ip4/169.254.0.0/ipcidr/16", "/ip4/172.16.0.0/ipcidr/12", "/ip4/192.0.0.0/ipcidr/24", "/ip4/192.0.0.0/ipcidr/29", "/ip4/192.0.0.8/ipcidr/32", "/ip4/192.0.0.170/ipcidr/32", "/ip4/192.0.0.171/ipcidr/32", "/ip4/192.0.2.0/ipcidr/24", "/ip4/192.168.0.0/ipcidr/16", "/ip4/198.18.0.0/ipcidr/15", "/ip4/198.51.100.0/ipcidr/24", "/ip4/203.0.113.0/ipcidr/24", "/ip4/240.0.0.0/ipcidr/4", 
"/ip6/100::/ipcidr/64", "/ip6/2001:2::/ipcidr/48", "/ip6/2001:db8::/ipcidr/32", "/ip6/fc00::/ipcidr/7", "/ip6/fe80::/ipcidr/10" ], "ConnMgr": { "GracePeriod": "3m", "HighWater": 700, "LowWater": 500, "Type": "basic" }, "DisableBandwidthMetrics": false, "DisableNatPortMap": true, "RelayClient": {}, "RelayService": {}, "Transports": { "Multiplexers": {}, "Network": { "QUIC": false }, "Security": {} } } } ```

Description

I've been running 2a871ef01, compiled with Go 1.17.6, on Arch Linux for a few days on one of my servers.

I had trouble with my MFS datastore after updating (I couldn't delete a file). So I reset my datastore and started importing the data again.

I'm using a shell script that adds the files and folders individually. Because of #7532, I can't use ipfs files write but instead use ipfs add, followed by an ipfs files cp /ipfs/$cid /path/to/file and an ipfs pin rm $cid.

For the ipfs add I set size-65536 as the chunker and blake2b-256 as the hashing algorithm, and use raw leaves.
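
For reference, one iteration of that per-file sequence looks roughly like the following. This is a minimal sketch, not the actual script; the file variables and the MFS destination path are illustrative.

```bash
# Add one file with the chunker/hash settings mentioned above, link it into
# MFS, then drop the direct pin again ($src_file and $rel_path are placeholders).
cid=$(ipfs add --quieter --chunker=size-65536 --hash=blake2b-256 --raw-leaves "$src_file")
ipfs files cp "/ipfs/$cid" "/x86-64.archlinux.pkg.pacman.store/$rel_path"
ipfs pin rm "$cid"
```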


After three days there was basically no I/O on the machine, and ipfs was consistently using around 1.6 cores without any real progress. At that time only this one script was running against the API, with no concurrency. The automatic garbage collector of ipfs is off.

There are no experimental settings activated and I'm using flatfs.

I did some debugging; all operations were still working, just extremely slow:

$ time /usr/sbin/ipfs --api=/ip4/127.0.0.1/tcp/5001 files stat --hash --offline /x86-64.archlinux.pkg.pacman.store/community
bafybeianfwoujqfauris6eci6nclgng72jttdp5xtyeygmkivzyss4xhum

real    0m59.164s
user    0m0.299s
sys 0m0.042s

and

$ time /usr/sbin/ipfs --api=/ip4/127.0.0.1/tcp/5001 files stat --hash --offline --with-local /x86-64.archlinux.pkg.pacman.store/community
bafybeie5kkzcg6ftmppbuauy3tgtx2f4gyp7nhfdfsveca7loopufbijxu
Local: 20 GB of 20 GB (100.00%)

real    4m55.298s
user    0m0.378s
sys 0m0.031s

This was while my script was still running against the API, waiting minutes for each response.

Here's my memory dump etc. while the issue occurred: /ipfs/QmPJ1ec2CywWLFeaHFaTeo6g56S5Bqi3g3MEF1a3JrL8zk

Here's a dump after I stopped the import of files and the CPU usage dropped down to like 0.3 cores: /ipfs/QmbotJhgzc2SBxuvGA9dsCFLbxd836QBNFYkLhdqTCZwrP
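
The dumps can be fetched straight from the CIDs above and opened with the usual Go tooling; something like this should work (the profile file name inside the dump is a placeholder and may differ):

```bash
# Fetch the debug dump taken while the issue occurred and summarize one of
# the contained pprof profiles.
ipfs get QmPJ1ec2CywWLFeaHFaTeo6g56S5Bqi3g3MEF1a3JrL8zk -o issue-dump
go tool pprof -top issue-dump/goroutine.pprof
```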

Here's what the memory looked like as the issue occurred (according to atop 1):

```
MEM
tot    31.4G
free    6.6G
cache   1.1G
dirty   0.1M
buff   48.9M
slab    7.1G
slrec   3.7G
shmem   2.0M
shrss   0.0M
vmbal   0.0M
zfarc  15.6G
hptot   0.0M
hpuse   0.0M
```

The machine has 10 dedicated cores from an AMD EPYC 7702 and 1 TB of SSD storage via NAS.

RubenKelevra commented 2 years ago

The shell script I'm using is open source, so you should be able to reproduce this:

git clone https://github.com/RubenKelevra/rsync2ipfs-cluster.git rsync2ipfs-cluster
cd rsync2ipfs-cluster
git reset --hard 1fd9712371f0315a35a80e9680340655ba751d7a
bash bin/rsync2cluster.sh --create --arch-config

This will rsync the Arch package mirror, loop over the files, and import them into the local MFS.

Just make sure you have enough space in ~ for the download (69 GB) and enough space on the IPFS node to write it into the storage.

aschmahmann commented 2 years ago

@RubenKelevra do you know which version caused the regression? Have you tried v0.11.0? v0.12.0 is a very targeted release which should not have disturbed much, so understanding when this issue emerged would be very helpful.

RubenKelevra commented 2 years ago

Hey @aschmahmann, I started the import on 0.11 yesterday. As soon as I'm home I can report if this is happening there too.

While an offline import works without a slowdown, I still sometimes get errors back that look like ipfs add returns too early and the following ipfs files cp command can't access the CID yet.

This seems to be a separate issue and probably not a regression, as I never tried importing offline before.

RubenKelevra commented 2 years ago

I can confirm this issue for 0.11 as well, so it's not a new thing.

$ ipfs version --all
go-ipfs version: 0.11.0-67220edaa
Repo version: 11
System version: amd64/linux
Golang version: go1.17.6

The next step for me is to try the binary from dist.ipfs.io to rule out any build issues.
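
That would look roughly like this (the archive name follows the usual dist.ipfs.io layout; double-check it against the site):

```bash
# Download the prebuilt v0.11.0 release, unpack it, and check the version.
wget https://dist.ipfs.io/go-ipfs/v0.11.0/go-ipfs_v0.11.0_linux-amd64.tar.gz
tar -xzf go-ipfs_v0.11.0_linux-amd64.tar.gz
./go-ipfs/ipfs version --all
```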

RubenKelevra commented 2 years ago

The next step for me is to try the binary from dist.ipfs.io to rule out any build issues.

I can confirm the issue for the binary from dist.ipfs.io as well.

aschmahmann commented 2 years ago

I can confirm this issue for 0.11 as well, so it's not a new thing.

Thanks that's very helpful. Is this a v0.10.0 -> v0.11.0 thing? When was the last known version before the behavior started changing? In any event, having a more minimal reproduction would help (e.g. making a version of the script that works from a local folder rather than relying on rsync).

If this is v0.11.0 related then my suspicion is that you have directories that were small enough you could transfer them through go-ipfs previously, but large enough that MFS will now automatically shard them (could be confirmed by looking at your MFS via ipfs dag get and seeing if you have links like FF in your directories). IIRC I saw some HAMT checks in your profile dump which would support this.
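
A quick way to check would be something like the sketch below, using one of your MFS paths as an example:

```bash
# Inspect the directory node behind an MFS path. A HAMT-sharded directory
# shows link names prefixed with two hex characters (e.g. "FF..."), while a
# plain UnixFS directory just lists the file names.
cid=$(ipfs files stat --hash /x86-64.archlinux.pkg.pacman.store/community)
ipfs dag get "$cid" | head -c 2000
```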

If so then what exactly about sharded directories + MFS is causing the slow down should be looked at. Some things I'd start with investigating are:

  • The modifications of the sharded directories are more expensive for repeated MFS updates

    • Since you have to modify multiple blocks at a time
    • The limit checks for automatic sharding/unsharding are too expensive for repeated MFS modifications
    • Bulking up writes and flushing would likely help here, although if going down this road I'd be careful. My suspicion is that MFS flush has not been extensively tested and probably even more so with sharded directories

RubenKelevra commented 2 years ago

Thanks that's very helpful. Is this a v0.10.0 -> v0.11.0 thing? When was the last known version before the behavior started changing?

I think the last time I ran a full import I was on 0.9.1.

I just started the import to make sure that's correct.

In any event, having a more minimal reproduction would help (e.g. making a version of the script that works from a local folder rather than relying on rsync).

Sure, if you want to avoid any rsync, just comment out L87. I think that should work.

The script will still expect a repository directory like the one from Manjaro or Arch to work properly, but you can just reuse the same repository without having to update it between tries.

If so then what exactly about sharded directories + MFS is causing the slow down should be looked at. Some things I'd start with investigating are:

  • The modifications of the sharded directories are more expensive for repeated MFS updates

    • Since you have to modify multiple blocks at a time
    • The limit checks for automatic sharding/unsharding are too expensive for repeated MFS modifications
    • Bulking up writes and flushing would likely help here, although if going down this road I'd be careful. My suspicion is that MFS flush has not been extensively tested and probably even more so with sharded directories

Sounds like a reasonable suspicion, but on the other hand, this shouldn't lead to minutes of response time for simple operations.

I feel like we're dealing with some kind of locked operation that gets "overwritten" by new data fed into ipfs while it's running, so tasks pile up in front of a lock.

This would explain why it starts fast and gets slower and slower until it's basically down to a crawl.

RubenKelevra commented 2 years ago

Ah, and additionally: I used sharding previously just for testing, but decided against it. So the import was running fine with sharding before (with 0.4 or something).

Previously, there was no need for sharding, which makes me wonder why IPFS would do sharding if it's not necessary.

RubenKelevra commented 2 years ago

@aschmahmann I've installed 0.9.1 from dist.ipfs.io and I can confirm, the bug is not present in this version.

aschmahmann commented 2 years ago

Ok, so to clarify, your performance/testing so far looks like:

  • v0.9.1 ✔️
  • v0.10.0 not yet tested
  • v0.11.0 (67220edaa) ❌
  • 0.13.0-dev (2a871ef01) ❌

Previously there was no need for sharding, which makes me wonder why IPFS would do sharding if it's not necessary.

TLDR: Two reasons. 1) Serializing the block to check if it exceeds the limit before re-encoding it is expensive, so having some conservative estimate is reasonable. 2) Maxing out the block size isn't necessarily optimal. For example, if you keep writing blocks up to 1MB in size, then every time you add an entry you create a duplicate block of similar size, which can lead to a whole bunch of wasted space that you may or may not want to GC depending on how accessible you want your history to be. https://github.com/ipfs/go-ipfs/issues/7022#issuecomment-832178501


Thanks for your testing work so far. If you're able to keep going here, understanding if v0.10.0 is ✔️ or ❌ would be helpful. Additionally/alternatively, you could try v0.11.0 and jack up the internal variable controlling the auto-sharding threshold to effectively turn it off by doing ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1GB\"" (1GB is obviously huge and will create blocks too big to transfer, but will make it easy to identify if this is what's causing the performance issue).
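
Concretely, that diagnostic run would be something like (restart the daemon afterwards so the new threshold is picked up):

```bash
# Effectively disable auto-sharding for diagnosis (1GB is deliberately
# oversized and not meant for production), then verify the setting.
ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1GB\""
ipfs config Internal.UnixFSShardingSizeThreshold
ipfs shutdown   # then start the daemon again so the change takes effect
```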

I also realized this internal flag was missing from the docs 🤦 so I put up https://github.com/ipfs/go-ipfs/pull/8723

BigLep commented 2 years ago

We're going to close this because we don't have the additional info needed to dig in further. Feel free to reopen with the requested info if this is still an issue. Thanks.

RubenKelevra commented 1 year ago

@aschmahmann was this fixed? I updated to 0.13.0-rc1 and ran into serious performance issues again.

Have you tried adding many files to the MFS with a simple ipfs add / ipfs files cp / ipfs pin rm loop, or tried my script yet?

RubenKelevra commented 1 year ago

@aschmahmann I set ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1MB\"", so 1 MB, not 1 GB, since that should work in theory.

But I still see 30-second delays for removing a single file in the MFS.

RubenKelevra commented 1 year ago

IPFS crashed a couple of times on the server in question, with messages like this:

```log
20:32:49 ipfs[605040]: panic: runtime error: index out of range [57] with length 57
20:32:49 ipfs[605040]: goroutine 311760245 [running]:
20:32:49 ipfs[605040]: github.com/ipfs/go-unixfs/hamt.(*Shard).walkChildren(0xc00f181e30, 0xc03e235280)
20:32:49 ipfs[605040]:         github.com/ipfs/go-unixfs@v0.3.1/hamt/hamt.go:399 +0x3d6
20:32:49 ipfs[605040]: github.com/ipfs/go-unixfs/hamt.parallelShardWalk.func1()
20:32:49 ipfs[605040]:         github.com/ipfs/go-unixfs@v0.3.1/hamt/hamt.go:467 +0x25a
20:32:49 ipfs[605040]: golang.org/x/sync/errgroup.(*Group).Go.func1()
20:32:49 ipfs[605040]:         golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x67
20:32:49 ipfs[605040]: created by golang.org/x/sync/errgroup.(*Group).Go
20:32:49 ipfs[605040]:         golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:54 +0x8d
```

This is the master @ a72753bad

Here's the full log since I've installed the master:

full_log_ipfs.txt

RubenKelevra commented 1 year ago

@aschmahmann I set ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1MB\"", so 1 MB, not 1 GB, since that should work in theory.

But I still see 30-second delays for removing a single file in the MFS.

I think this was more due to large repinning operations by the cluster daemon, as the MFS folders need to be pinned locally on every change.

I created a ticket on the cluster project for this.

Furthermore, I see (at least with a few file changes) no large hangs when using the 1 MB sharding threshold.

But I haven't yet tested the full import I originally had trouble with, which is what this ticket is about.

RubenKelevra commented 1 year ago

@aschmahmann I can confirm this issue with the suggested ipfs config --json Internal.UnixFSShardingSizeThreshold "\"1MB\"" on a current master (a72753bad) as well as the current stable (0.12.2) from the Arch Linux repo.

(1 MB should never be exceeded on my datasets, as sharding wasn't necessary before to store the folders.)

The changes to the MFS grind to a halt after a lot of consecutive operations, where single ipfs files cp /ipfs/$CID /path/to/file commands take 1-2 minutes while the IPFS daemon uses 4-6 cores worth of CPU power.

All other MFS operations are blocked as well, so you get response times of minutes for simple ls operations.

@BigLep please reopen, as this isn't fixed and can be reproduced.

RubenKelevra commented 1 year ago

I'll take my project pacman.store, with the package mirrors for Manjaro, Arch Linux, etc., down until this is solved. I don't want to run 0.9 anymore due to its age, and I would need to downgrade the whole server again.

I just cannot share packages that are days or even weeks old, due to safety concerns, so I don't want to do any harm here.

The URLs will just return empty directories for now.

BigLep commented 1 year ago

@schomatis : are you able to take a look here and identify next steps?

RubenKelevra commented 1 year ago

I'm able to run anything you like on the machine to help, as it's basically out of commission now. :)

schomatis commented 1 year ago

I'll take a look tomorrow. I'm confused by this long thread, which I couldn't parse in a first read, and will need the OP to be updated if possible to reflect our current status. AFAICT the panic reported is not the problem, sharding also is not the problem, we still can't pinpoint this to any specific version, and triggering the performance bottleneck requires GBs to be downloaded (please correct if necessary, but in the OP, not in the thread). Maybe someone running the GWs could share their experience running MFS at these sizes.

schomatis commented 1 year ago

@BigLep I can't add any value here, sorry.

The use case is pretty involved, and unless we can reproduce this without the 500-line bash script (which is very different from "I've run this ipfs command and it's taking too long"), I don't see how we can make any progress without sinking too much time here.

RubenKelevra commented 1 year ago

@schomatis I'm fine with helping out. Would a git bisect run help to see which commit introduced the issue?

RubenKelevra commented 1 year ago

@schomatis I'm fine with helping out. Would a git bisect run help to see which commit introduced the issue?

Btw: the bash script basically boils down to a tight loop around the ipfs add / ipfs files cp / ipfs pin rm sequence described above.

From what I can see from the outside (however this is possible), the IPFS daemon seems to not complete each operation fully and returns too early. The tight loop running operations against the daemon thus piles up incomplete operations inside the daemon, which do a lot of concurrent I/O until everything is slow.
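
One mitigation I haven't properly tested yet, along the lines of the "bulking up writes and flushing" idea above, would be to disable the per-operation flush and flush once per batch. A rough sketch, assuming the --flush option of ipfs files behaves as documented (the batch file and paths are placeholders):

```bash
# Link a batch of already-added files into MFS without flushing after each
# copy, then flush the tree once at the end of the batch.
while read -r cid dst; do
    ipfs files --flush=false cp "/ipfs/$cid" "$dst"
done < batch.txt
ipfs files flush /
```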

Jorropo commented 1 year ago

Would a git bisect run help to see which commit introduced the issue?

Yes, certainly. There are two features we suspect to be the culprit, but we don't know which one precisely (an MFS update or an ipld-prime one).
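
If it helps, the run can be automated with git bisect run; a sketch, where the reproduction script and the boundary commits are placeholders you'd fill in:

```bash
# Automate the bisect: build the checked-out commit, run the reproduction,
# and let git mark it good (exit 0) or bad (non-zero exit).
git bisect start
git bisect bad <first-known-bad-commit>
git bisect good <last-known-good-commit>
git bisect run sh -c 'make build && ./reproduce.sh'
```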

RubenKelevra commented 1 year ago

@Jorropo okay, I'll report back when I find it. :)

Git bisect log so far:

```log
git bisect start
# bad: [67220edaaef4a938fe5fba85d793bfee59db3256] Merge pull request #8597 from ipfs/release-v0.11.0
git bisect bad 67220edaaef4a938fe5fba85d793bfee59db3256
# bad: [a72753bade90c4a48c29aba6c0dc81c44785e9d2] docs: fix abstractions typo
git bisect bad a72753bade90c4a48c29aba6c0dc81c44785e9d2
# bad: [2a871ef0184da3b021209881dfa48e3cfdaa9d26] docs: add Snap note about customizing IPFS_PATH (#8584)
git bisect bad 2a871ef0184da3b021209881dfa48e3cfdaa9d26
# bad: [2a871ef0184da3b021209881dfa48e3cfdaa9d26] docs: add Snap note about customizing IPFS_PATH (#8584)
git bisect bad 2a871ef0184da3b021209881dfa48e3cfdaa9d26
# good: [0cdde038244ae344866d8b13d1678db56e56a87c] Merge pull request #7389 from ipfs/fix/refs-sessions
git bisect good 0cdde038244ae344866d8b13d1678db56e56a87c
# good: [7ce1d751f808d6a990dc496a81f2477742f9e640] Merge pull request #7501 from rafaelramalho19/chore/bump-webui-version
git bisect good 7ce1d751f808d6a990dc496a81f2477742f9e640
# good: [b3e5ffc41ae4ef46402ff38be21c66912b59bc42] feat: add flag to ipfs key and list to output keys in b36/CIDv1 (#7531)
git bisect good b3e5ffc41ae4ef46402ff38be21c66912b59bc42
# good: [2ed9254426e900cf00a9b35304dc5b5de8173208] Merge pull request #7817 from ipfs/chore/update-version
git bisect good 2ed9254426e900cf00a9b35304dc5b5de8173208
# good: [7588a6a52a789fa951e1c4916cee5c7a304912c2] Merge pull request #7829 from ipfs/fix/pin-remote-service-ls-json
git bisect good 7588a6a52a789fa951e1c4916cee5c7a304912c2
# good: [4d262b1f7325321a706451384dbeb87fd33d7b77] Merge pull request #7946 from ipfs/test/fixup-tests
git bisect good 4d262b1f7325321a706451384dbeb87fd33d7b77
# good: [65d9507c3dcdf0ea0f95b5771eb8286f6a6e8879] Merge pull request #7953 from marten-seemann/fix-reqlog-race
git bisect good 65d9507c3dcdf0ea0f95b5771eb8286f6a6e8879
# good: [c0ce56fa482c47856811656c003f7875e4a1fec2] Merge pull request #8001 from ipfs/fix/files-cp-docs
git bisect good c0ce56fa482c47856811656c003f7875e4a1fec2
# good: [3f9c3f4557d326340a111a7c5fec3db345d175d1] Merge pull request #8021 from ipfs/dependabot/go_modules/github.com/ipfs/go-log-1.0.5
git bisect good 3f9c3f4557d326340a111a7c5fec3db345d175d1
# good: [041de2aed1a3a55d2897b02fea3bdc823b394cb1] fix: typo in migration error
git bisect good 041de2aed1a3a55d2897b02fea3bdc823b394cb1
# bad: [a72753bade90c4a48c29aba6c0dc81c44785e9d2] docs: fix abstractions typo
git bisect bad a72753bade90c4a48c29aba6c0dc81c44785e9d2
# bad: [67220edaaef4a938fe5fba85d793bfee59db3256] Merge pull request #8597 from ipfs/release-v0.11.0
git bisect bad 67220edaaef4a938fe5fba85d793bfee59db3256
# bad: [2a871ef0184da3b021209881dfa48e3cfdaa9d26] docs: add Snap note about customizing IPFS_PATH (#8584)
git bisect bad 2a871ef0184da3b021209881dfa48e3cfdaa9d26
# bad: [deb79a258755b3623623ac62561d44451b9da472] chore: add release template snippet for fetching artifact tarball
git bisect bad deb79a258755b3623623ac62561d44451b9da472
```

github-actions[bot] commented 1 year ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

aschmahmann commented 1 year ago

@RubenKelevra might be worth checking if https://github.com/ipfs/go-ipfs/pull/9042 fixes the problem (it should be in master later today). There was a performance fix in go-unixfs regarding the automatic sharding (https://github.com/ipfs/go-unixfs/pull/120)

RubenKelevra commented 1 year ago

@RubenKelevra might be worth checking if https://github.com/ipfs/go-ipfs/pull/9042 fixes the problem (it should be in master later today). There was a performance fix in go-unixfs regarding the automatic sharding (https://github.com/ipfs/go-unixfs/pull/120)

Great, thanks. Will check it out as soon as I'm back at home :)

github-actions[bot] commented 1 year ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

RubenKelevra commented 1 year ago

@aschmahmann I can confirm this bug for 88d88158c

lidel commented 1 year ago

@RubenKelevra you mean it is still broken, even after the switch to go-unixfs v0.4.0?

I got the same panic report of this bug with v0.12.2 from a project in our community; they use MFS for adding data to an existing dataset and once in a while get:

```log
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x18688c6]

goroutine 441888916 [running]:
github.com/ipfs/go-unixfs/hamt.(*Shard).childLinkType(0xc04e3ff420, 0x0, 0x40, 0x40, 0x41)
    github.com/ipfs/go-unixfs@v0.3.1/hamt/hamt.go:293 +0x26
github.com/ipfs/go-unixfs/hamt.(*Shard).walkChildren(0xc04e3ff420, 0xc037be7880, 0x0, 0x1, 0x1)
    github.com/ipfs/go-unixfs@v0.3.1/hamt/hamt.go:400 +0x27d
github.com/ipfs/go-unixfs/hamt.parallelShardWalk.func1(0x1, 0xc03f03fc00)
    github.com/ipfs/go-unixfs@v0.3.1/hamt/hamt.go:467 +0x107
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc014813b30, 0xc0ee853f20)
    golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x59
created by golang.org/x/sync/errgroup.(*Group).Go
    golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:54 +0x66
```

Asked them for specific commands that are executed just before the crash, but suspect it is in the ballpark of https://github.com/ipfs/go-ipfs/issues/8694#issuecomment-1151083203. I've bumped this in priority and added it to 0.14.

RubenKelevra commented 1 year ago

@RubenKelevra you mean it is still broken, even after the switch to go-unixfs v0.4.0?

Yeah. CPU load piles up and a simple ipfs files cp /ipfs/$CID /path/in/mfs takes minutes to complete.

I think there's just something running concurrently, and somehow work needs to be done again and again to apply the change to the MFS, as other parts of it are still changing. But that's just a guess; it could be anything else, really.

schomatis commented 1 year ago

Asked them for specific commands that are executed just before the crash

@lidel We might be conflating different topics in the same issue here, let's have a new issue for that report when it comes and please ping me.

lidel commented 1 year ago

@schomatis ack, moved panic investigation to https://github.com/ipfs/go-ipfs/issues/9063

dhyaniarun1993 commented 1 year ago

Hey Guys,

We are also facing the same issue here. Apart from uploads, we also see this while doing ls on the folder structure. It looks like MFS isn't able to handle a lot of small files and folders (we have around 40,000 files and folders). After doing ls operations at different paths in the filesystem, MFS became so slow that it was taking more than 5 minutes to respond to a simple ls query. After restarting the IPFS service the issue is fixed. It looks like there are performance issues with MFS. We have tested with IPFS versions v0.11.0, v0.12.0, and v0.13.0. We are running IPFS on flatfs storage.

Jorropo commented 1 year ago

@dhyaniarun1993 I am confident that this is another issue (I couldn't find the existing issue, so if you want, open a new one even though we know what this is). ipfs ls fetches blocks one by one, so it's a sequential process to read all 40k files.

dhyaniarun1993 commented 1 year ago

Thanks @Jorropo for the explanation. So let's say I have the following directory structure: /abc -> /pqr -> /xyz

So when doing ls on /xyz, does IPFS have to walk to it sequentially?

Jorropo commented 1 year ago

@dhyaniarun1993 I don't want to spam this issue, so I'll mark our conversation off-topic. FYI, please open a new issue.

The resolution (walking from / to /xyz) is fine. The issue is:

/xyz/0
/xyz/1
/xyz/2
/xyz/3
/xyz/4
/xyz/5

Kubo will fetch 0, 1, 2, 3, 4 and 5 one by one (instead of, for example, 32 at a time in parallel).

EDIT: GitHub won't let me hide my own messages ... :'(

CMB commented 1 year ago

I have this same issue. I'm maintaining a package mirror with approximately 400,000 files. ipfs files cp gets progressively slower as files are added to MFS.

Here's the question: I already have a table of names and CIDs for all of the files; I track them in a database. Is there a way I could create the directory structure in one fell swoop from my list, rather than adding links to MFS one at a time?
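
The closest thing I've come up with so far is assembling the directory node outside of MFS with ipfs object patch and only copying the finished root in once. A rough, untested sketch (the table file and MFS path are placeholders); note that a plain, unsharded directory with hundreds of thousands of links would exceed the block size limit anyway:

```bash
# Build a UnixFS directory from a "cid name" table without touching MFS,
# then link the finished root into MFS with a single operation.
dir=$(ipfs object new unixfs-dir)
while read -r cid name; do
    dir=$(ipfs object patch add-link "$dir" "$name" "$cid")
done < cid-name-table.txt
ipfs files cp "/ipfs/$dir" /mirror
```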

BigLep commented 1 year ago

@Jorropo : what are the next steps here?