Open jdgcs opened 7 years ago
```
% ./ipfs repo stat
NumObjects  9996740
RepoSize    381559053765
RepoPath    /home/liu/.ipfs
Version     fs-repo@4
```
I encounter this too.
Hey @FortisFortuna, yes, this is a common issue with the default `flatfs` datastore (which basically stores each 256K chunk of each added file as a separate file in the repository and ends up overwhelming the filesystem). Could you try the `badgerds` datastore and see if it helps? (Initialize the repository with the `--profile=badgerds` option.)
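For a sense of scale: with the default 256 KiB chunker, flatfs ends up with roughly one on-disk block file per chunk. A back-of-the-envelope sketch (the helper name and example sizes are made up for illustration; this counts leaf blocks only):

```python
import math

CHUNK = 256 * 1024  # go-ipfs default chunk size (256 KiB)

def estimated_block_files(file_sizes):
    """Rough count of on-disk block files flatfs creates (leaf blocks only)."""
    return sum(max(1, math.ceil(size / CHUNK)) for size in file_sizes)

# 100 files of 1 MiB each -> 4 chunks apiece -> ~400 little files on disk
print(estimated_block_files([1 * 1024 * 1024] * 100))  # 400
```

Multiply that out to a ~380 GB repo and you get the ~10M objects shown in `ipfs repo stat` above, which is exactly the regime where one-file-per-block hurts.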
thank you
```
ipfs config profile apply badgerds
ipfs-ds-convert convert
```
I have about 18 GB and 500K files (Everipedia) on the default flatfs. Do these commands convert the blocks from flatfs to badgerds so I don't have to do everything over again?
Yes. However, it may be faster to do it over as this will still have to extract the blocks from flatfs and move them into badgerds.
Yes, keep in mind the conversion tool will require twice the size of the repo being converted.
Also, I'd be interested in how big your datastore gets with badgerds.
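Since the conversion temporarily needs about twice the repo's size, a quick pre-flight check before running `ipfs-ds-convert` might look like this (a sketch; the function name is made up, and the example uses the RepoSize reported earlier in this thread):

```python
import shutil

def enough_space_to_convert(repo_size_bytes, path="/"):
    """True if the filesystem at `path` has ~2x the repo size free."""
    free = shutil.disk_usage(path).free
    return free >= 2 * repo_size_bytes

# RepoSize from `ipfs repo stat` above (~381 GB); result depends on your disk
print(enough_space_to_convert(381559053765))
```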
I am unable to build the conversion tool. It stalls for me on the `make` inside `ipfs-ds-convert` at `[0 / 22]`.
Looks like you're having trouble fetching the dependencies. Try building https://github.com/ipfs/ipfs-ds-convert/pull/11.
Ok thanks! The pull request you did let me build it. I will follow the instructions in this thread and #5013 now and try to convert the db (I backed up the flatfs version just in case). Thanks for the quick reply.
Works, but still the same slow speed
@FortisFortuna That's strange, I would definitely expect a speedup when using Badger instead of the flat datastore when adding files. I mean, I can't say that it would be fast, but it should be noticeably faster than your previous setup.
Could you, as a test, initialize a new repo with the `--profile=badgerds` option and add a small sample of your data set (say 30GB) to check whether you see different write speeds than with flatfs? (Badger's performance may degrade with bigger data sets, but not to the point of being comparable with flatfs, so this test should be representative enough to check that you're setting everything up properly on your end; if you are, we should investigate further on our side, or Badger's.)
Hm. Actually, this may be pins. Are you adding one large directory or a bunch of individual files? Our pin logic is really unoptimized at the moment, so if you add all the files individually, you'll end up with many pins and performance will be terrible.
Everipedia has around 6 million pages, and I have IPFS'd about 710K of them in the past week on a 32-core, 252G-RAM machine. Something is bottlenecking, because I am only getting about 5-10 hashes a second. I know for a fact the bottleneck is the `ipfs add` call in the code. The machine isn't even running near full capacity.
I am using this:
https://github.com/ipfs/py-ipfs-api
```python
import ipfsapi

api = ipfsapi.connect('127.0.0.1', 5001)
res = api.add('test.txt')
```
Specifically, a gzipped html file of average size ~15 kB is being added each loop.
Ah. Yeah, that'd do it. We're trying to redesign how we do pins but that's currently under discussion.
So, the best way to deal with this is to just add the files all at once with `ipfs add -r`. Alternatively, you can disable garbage collection (don't run the daemon with the `--enable-gc` flag) and just add the files without pinning them (use `pin=False`).
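Under the hood, a `pin=False` keyword just becomes a `pin=false` query option on the HTTP API's `/api/v0/add` endpoint, which client libraries pass through for you. A small sketch of the URL such a request would use (the helper name is made up; the default daemon address is assumed):

```python
from urllib.parse import urlencode

def add_request_url(host="127.0.0.1", port=5001, pin=False):
    """Build the /api/v0/add URL with pinning controlled by `pin`."""
    query = urlencode({"pin": "true" if pin else "false"})
    return f"http://{host}:{port}/api/v0/add?{query}"

print(add_request_url())  # http://127.0.0.1:5001/api/v0/add?pin=false
```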
I will try `pin=False`. I need to keep track of which files get which hashes, though, so I don't think I can simply pre-generate the html files and then add them, unless you know a way.
If I skip the pinning, will I still be able to `ipfs cat` them?
Once you've added a directory, you can get the hashes of the files in the directory by running either:

- `ipfs files stat --hash /ipfs/DIR_HASH/path/to/file` to get the hash of an individual file.
- `ipfs ls /ipfs/DIR_HASH` to list the hashes/names of all the files in a directory.

Note: If you're adding a massive directory, you'll need to enable [directory sharding](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#directory-sharding--hamt) (which is an experimental feature).
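If you need the file-to-hash mapping back in your script, one option is to parse the plain-text output of `ipfs ls`, which prints one `<hash> <size> <name>` line per entry. A sketch under that assumption (the sample hashes are made up):

```python
def parse_ipfs_ls(output):
    """Map file names to their hashes from `ipfs ls /ipfs/DIR_HASH` output."""
    mapping = {}
    for line in output.strip().splitlines():
        parts = line.split(None, 2)  # hash, size, name (name may contain spaces)
        if len(parts) == 3:
            cid, _size, name = parts
            mapping[name] = cid
    return mapping

sample = "QmAAA 15360 page1.html.gz\nQmBBB 14200 page2.html.gz"
print(parse_ipfs_ls(sample))  # {'page1.html.gz': 'QmAAA', 'page2.html.gz': 'QmBBB'}
```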
Thanks
So to clarify: if I set `pin=False`, I can still retrieve / `cat` the files, right, as long as I keep garbage collection off? I noticed a gradual degradation in file-addition speed as more files were added.
You are a god among men @Stebalien. Setting `pin=False` in the Python script did it! To summarize:

1) Using `badgerds`
2) Using `--offline`
3) `ipfs config Reprovider.Strategy roots`
4) `ipfs config profile apply server`
5) Setting `pin=False` when `ipfs add`-ing in my Python script.

Getting like 25 hashes a second now vs 3-5 before. `ipfs cat` works too.
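To compare before/after numbers like the 3-5 vs ~25 hashes/second above, a tiny timing wrapper around whatever add function you use is enough. A sketch (`add_fn` is any callable; a no-op stands in for the real `api.add`):

```python
import time

def adds_per_second(add_fn, items):
    """Time `add_fn` over `items` and return the rate in items/second."""
    start = time.monotonic()
    for item in items:
        add_fn(item)
    elapsed = time.monotonic() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")

# Example with a no-op stand-in for the real api.add call
print(adds_per_second(lambda path: None, ["test.txt"] * 1000))
```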
Hey guys. The IPFS server was working fine with the above helper options, until I needed to restart. When I did, the daemon tries to initialize but freezes. I am attempting to update from 0.4.15 to 0.4.17 to see if that helps, but now it stalls on "applying 6-to-7 repo migration". I have over 1 million IPFS hashes (everipedia.org). Is there anything I am doing wrong?
I see this in the process list: `/tmp/ipfs-update-migrate590452412/fs-repo-migrations -to 7 -y`. Could it be I/O limitations?
Ok, so the migration did eventually finish, but it took a while (~ 1 hr). Once the update went through, the daemon started fast. It is working now.
So, that migration should have been blazing fast. It may have been that "daemon freeze". That is, the migration literally uses the 0.4.15 repo code to load the repo for the migration.
It may also have been the initial repo size computation. We've switched to memoizing the repo size as it's expensive to compute for large repos but we have to compute it up-front so that might have delayed your startup.
`ipfs add -r` of 289GB (average file size < 10MB): after adding 70GB, the speed noticeably slowed down; it took 2 days to reach 200GB.
Do you mean to speed it up with (go-ipfs v0.4.18) `ipfs add --pin=false -r xxxxxxxxx`?
Is this right?
@hoogw please report a new issue.
`ipfs add` (by default `--pin=true`). To turn off pinning to speed things up:

```
D:\test>ipfs add --pin=false IMG_1427.jpg
 4.18 MiB / 4.18 MiB [========================================================================================] 100.00%
added QmekTFtiQqrhiqms8FXZqPD1TfMc9kQUoNF8WVUNBGJF8h IMG_1427.jpg
```
Version information:
```
% ./ipfs version --all
go-ipfs version: 0.4.4-
Repo version: 4
System version: amd64/freebsd
Golang version: go1.7
```
Type:
`./ipfs add` became very slow when handling a large number of files.
Priority: P1
Description:
`./ipfs add` became very slow when handling about 45K files (~300GB); it took about 3+ seconds after the progress bar finished.
But we can run several IPFS instances on the same machine to work around this issue.
About the machine: CPU: E3-1230V2, RAM: 16G, storage: 8T with 240G SSD cache @ ZFS.
Thanks for the amazing project!