docmeth02 closed this issue 10 years ago
I will be testing out what is outlined in this comment on SO to fix this. I am not sure using a larger chunk_size is the best way to go.
I pushed a good number of fixes, including the one from the comment I linked to. I have thoroughly tested this code on one large file and a bunch of smaller files around 40 MB, never exceeding 13% CPU, so I believe this should be fixed by 686bfb0.
Actually it's even worse in the latest update. Now, on top of calling write() and flush() for every kilobyte, you also call fsync(), which explicitly tells the OS to write every single kilobyte to disk. The increased I/O activity adds even more to the resource hogging. The task this application tries to achieve is trivial for a fairly recent computer; it should at no point use more than one percent of CPU per thread.
That being said, I've been running the old version with the flush() call removed and chunk_size increased to 512 KB for 24 hours. Five threads pulled 800 GB from Bitcasa using 40 MB of RAM and never exceeded 1.5% CPU usage.
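For anyone following along, here is a minimal sketch of the kind of loop I'm describing, assuming a requests-style streaming download (the actual download code in this project may look different):

```python
import requests

CHUNK_SIZE = 512 * 1024  # 512 KB, the value from my 24-hour test; tunable

def download(url, dest_path):
    """Stream a remote file to disk in large chunks.

    No flush()/fsync() per chunk: the OS page cache batches the
    writes, which keeps syscall and I/O overhead low.
    """
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
                if chunk:  # skip keep-alive chunks
                    fh.write(chunk)
```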
Have you checked the file integrity? Have you had any issues with corruption? That was the biggest thing I was worried about and why I never "released" this in the first place: when downloading in chunks through Bitcasa's API, my files would get corrupted.
I tested version c663c0733ca63e835507f7a6a3305de30dd9efdf and there is a substantial improvement in CPU usage, as you mentioned. The fallback for when the size doesn't match should take care of the integrity issues I was worried about. If you find that it continues to have high CPU usage, please let me know.
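In case it's useful, a rough sketch of what a size-based fallback could look like; the function name and retry count here are made up for illustration, not the project's actual code:

```python
import os
import requests

def download_with_size_check(url, dest_path, retries=3):
    """Download and verify the on-disk size against Content-Length.

    Retries the whole download on a mismatch, which is the kind of
    fallback described above. A hash check would be stronger, but a
    size mismatch already catches truncated transfers.
    """
    for attempt in range(retries):
        with requests.get(url, stream=True) as resp:
            resp.raise_for_status()
            expected = int(resp.headers.get("Content-Length", -1))
            with open(dest_path, "wb") as fh:
                for chunk in resp.iter_content(chunk_size=512 * 1024):
                    fh.write(chunk)
        if expected < 0 or os.path.getsize(dest_path) == expected:
            return
    raise IOError("size mismatch after %d attempts: %s" % (retries, url))
```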
The code calls write() and flush() for every kilobyte downloaded. I did some testing, and by raising the chunk_size to a megabyte (maybe a little too high, but I'm sure you'll figure out a good value) the CPU usage was reduced to a tenth of what it originally was.
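Concretely, the change I'm suggesting is just along these lines (a sketch under the same requests assumption as above, not a patch against the actual code):

```python
import requests

def save(url, dest, chunk_size=1024 * 1024):
    """Same loop, bigger chunks.

    At chunk_size=1024 this does a write()/flush() pair per KB
    downloaded; at 1 MB it makes roughly 1000x fewer syscalls for
    the same data, which is where the CPU savings come from.
    """
    with requests.get(url, stream=True) as resp, open(dest, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            fh.write(chunk)  # no per-chunk flush(); let the OS batch writes
```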