hjmangalam / parsyncfp

A follow-on to parsync (parallel rsync) with better startup performance.

Replace calls to ls and rm with perl functions. #42

Open

rapier1 opened this pull request 3 years ago

rapier1 commented 3 years ago

This avoids issues when there are too many cache files for ls to process with a wildcard. While this doesn't happen all that often, I've run into problems when moving very large data sets, especially when they had files with widely varying sizes. I think this means some of the checks for too many cache files can be removed as well, but I just wanted to submit the basics at this point. I haven't seen any notable performance issues even when processing 50,000 cache files.

I also removed trailing whitespace from some of the lines (M-x delete-trailing-whitespace in emacs).
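A minimal sketch of the general approach (not the actual patch; the cache directory and file-name pattern are assumptions):

```perl
#!/usr/bin/env perl
# Sketch: enumerate and delete fpart cache files with Perl builtins instead of
# shelling out to `ls`/`rm` with a wildcard, which can fail with
# "argument list too long" when there are huge numbers of cache files.
use strict;
use warnings;

# assumed cache location and naming; parsyncfp's real layout may differ
my $cache_dir = "$ENV{HOME}/.parsyncfp/fpcache";

opendir(my $dh, $cache_dir) or die "Can't open $cache_dir: $!\n";
my @cache_files = grep { /^f\.\d+$/ && -f "$cache_dir/$_" } readdir($dh);
closedir($dh);

printf "found %d cache files\n", scalar @cache_files;

# unlink() gets the names directly from Perl, so no shell ARG_MAX limit applies
for my $f (@cache_files) {
    unlink("$cache_dir/$f") or warn "Could not remove $cache_dir/$f: $!\n";
}
```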

hjmangalam commented 3 years ago

Thanks very much Chris. Good points - a lot of the code was pulled together quite haphazardly (as you might have noticed). I'm ambivalent about getting rid of the fpart file-number limits, since too many fpart files have impacts on other parts of the filesystem, plus the churn from starting too many rsyncs. I'm finishing up some other work, but I'll try to merge these over the weekend. Harry

rapier1 commented 3 years ago

Glad to be of help.

Just so you have an example: we used parsyncfp to move our primary data storage system to a new system, something like 6 to 7 PB of data. Both filesystems were using Lustre (which is its own issue). I wrote a wrapper so users could fire off their own runs of parsyncfp as Slurm jobs. We ended up using an NP of 16 and a chunksize of -4G to override the cache limit. This was necessary as we had some users with more than 1 PB of data. We dedicated 4 Slurm nodes to these jobs and usually had 2 or 3 people per node. These are beefy nodes with 64 cores and 128 threads, fully dedicated to parsyncfp tasks. We were seeing throughput peaking at 2500 MB/s and averaging 853 MB/s (though the media was probably closer to 1200). So I don't think we were seeing that much in the way of thrashing, but we did have really good equipment for this. That said, the switch to disable the MAX_FPART limit works fine.
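For illustration, a minimal sketch of what such a Slurm wrapper might look like (the flag values, paths, and submission details are assumptions, not the actual script):

```perl
#!/usr/bin/env perl
# Illustrative wrapper: submit one parsyncfp run per user as a Slurm batch job.
# Flag values and directory layout are assumptions, not the real wrapper.
use strict;
use warnings;
use File::Basename qw(dirname basename);

my ($src, $dest) = @ARGV;
die "usage: $0 <source_dir> <user\@host:/dest_dir>\n" unless $src && $dest;

# parsyncfp syncs directories relative to --startdir, so split the source path;
# a high NP and a large chunksize keep the fpart cache-file count manageable
# even for multi-PB trees (paths with spaces are not handled in this sketch)
my $parent = dirname($src);
my $leaf   = basename($src);
my $cmd = "parsyncfp --NP=16 --chunksize=4G --startdir='$parent' '$leaf' '$dest'";

# hand the whole command string to Slurm as a wrapped job
system('sbatch', '--wrap', $cmd) == 0
    or die "sbatch submission failed: $?\n";
```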

Chris

hjmangalam commented 3 years ago

Hi Chris, Your patches are lighter weight than the system calls I had, so I'll include them going forward, but I'm still concerned about eliminating the $MAX_FPART_FILES check, simply because a novice will use a value that generates tens of thousands of chunk files, and unless I'm missing something, that's not something you want, since starting up a bazillion rsyncs takes time as well. There's a tradeoff between early starts (lots of tiny fpart chunks) and late starts (smaller numbers of larger fpart chunks). In fact this is something I'll mention to Ganael (fpart's author): can fpart be told to chunk some number of small files first (for startup) and then switch to larger chunks for the main run?
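A rough sketch of the kind of guard at issue here (not parsyncfp's actual code; the limit and the interface are made up for illustration): estimate how many chunk files a given chunksize implies and warn before the run starts.

```perl
#!/usr/bin/env perl
# Sketch: estimate the number of fpart chunk files a chunksize will produce
# and warn before a run that would spawn tens of thousands of rsync startups.
use strict;
use warnings;
use File::Find;
use POSIX qw(ceil);

my ($start_dir, $chunk_bytes) = @ARGV;
die "usage: $0 <start_dir> <chunksize_in_bytes>\n"
    unless $start_dir && $chunk_bytes && $chunk_bytes > 0;

my $MAX_FPART_FILES = 5000;    # illustrative limit, not pfp's real value

# total up the data under the start directory
my $total_bytes = 0;
find(sub { return unless -f $_; $total_bytes += (-s _) || 0; }, $start_dir);

my $est_chunks = ceil($total_bytes / $chunk_bytes);
printf "%d bytes under %s => about %d fpart chunk files\n",
       $total_bytes, $start_dir, $est_chunks;

warn "That many chunk files means a lot of rsync startup churn; "
   . "consider a larger chunksize.\n" if $est_chunks > $MAX_FPART_FILES;
```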

Also, I've gotten the multihost version running and after some testing on a fast net, I'll probably be releasing it within a week - it's a major reworking of the code so I'll include your code, but not as a simple pull.

Also, did you get a chance to look at the RoundRobin changes I suggested? Did any of them work the way you wanted?

Best wishes and thanks for your contribution to pfp. Harry
