grenaud / gargammel

gargammel is an ancient DNA simulator
GNU General Public License v3.0

gzip in gargammel.pl causing filesystem issues #12

Open cmeesters opened 3 years ago

cmeesters commented 3 years ago

Hi,

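gargammel.pl pipes several of its intermediate outputs through | gzip. This can be problematic, and on our cluster it caused very low overall throughput:

  • Filesystems do not cope well with repeated small requests. This may go unnoticed on a "small" filesystem attached to a single machine (even a "big" server), but on a large parallel filesystem (e.g. that of a cluster) it can be a performance killer, both for gargammel users and for everyone else sharing the filesystem. The impact may only become noticeable when many such processes run concurrently.
  • In addition, compression is usually slow, and piping to gzip within the same cgroup limits the scalability of the parent process.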

Generally, it is a good idea to compress final results, but not intermediate files (or to do so independently of an actual calculation).
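
To illustrate what compressing "independently of an actual calculation" could look like, here is a minimal Python sketch. It is not taken from gargammel.pl; the helper name `compress_file` and the 8 MiB chunk size are assumptions chosen for illustration. The idea is that the simulation writes a plain file first, and a separate step compresses it afterwards using large sequential reads and writes.

```python
import gzip
import shutil
from pathlib import Path

# Assumed chunk size: large sequential transfers instead of many small ones.
CHUNK_BYTES = 8 * 1024 * 1024


def compress_file(src, keep_original=False):
    """Gzip-compress `src` into `src + '.gz'` as a separate, later step.

    Hypothetical helper: it runs after the calculation has finished, so the
    compression does not compete with the simulation for CPU time.
    """
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # Copy in large chunks to keep the number of I/O requests low.
        shutil.copyfileobj(fin, fout, CHUNK_BYTES)
    if not keep_original:
        src.unlink()
    return dst
```

Such a step could be run once on the final result, or as its own cluster job, while intermediate files stay uncompressed on node-local scratch space.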

Would you consider changing the wrapper script accordingly?

Best regards, Chris

grenaud commented 3 years ago

Usually, users report disk capacity issues, which is why intermediate files are gzip-compressed. I am not sure, but I think gzip is almost never the bottleneck.

However, if you want to add a flag and write a pull request, I would gladly accept it :-) Sorry that I cannot do much work on this right now; I have 1-2 h per day to answer emails, mostly from my students.
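
As a rough idea of what such a flag could do, here is a small Python sketch. gargammel.pl itself is a Perl script, so this only shows the logic, not a patch. The flag name `--no-compress-intermediate`, the function names, and the placeholder command `some_tool` are all hypothetical and do not exist in gargammel.

```python
import argparse
import shlex


def make_parser():
    """Parser with one hypothetical switch that disables the gzip stage."""
    parser = argparse.ArgumentParser(description="wrapper sketch")
    parser.add_argument(
        "--no-compress-intermediate",
        dest="compress_intermediate",
        action="store_false",  # defaults to True, i.e. current behaviour
        help="write intermediate files uncompressed (hypothetical flag)",
    )
    return parser


def build_stage_command(tool_args, out_path, compress=True):
    """Return the shell command line for one pipeline stage.

    compress=True  -> pipe stdout through gzip, as described in this thread.
    compress=False -> redirect stdout to a plain file instead.
    """
    cmd = " ".join(shlex.quote(arg) for arg in tool_args)
    if compress:
        return f"{cmd} | gzip > {shlex.quote(out_path + '.gz')}"
    return f"{cmd} > {shlex.quote(out_path)}"
```

With the default, the behaviour stays as it is today, so users with tight disk quotas are not affected; only users who pass the switch get uncompressed intermediates. Later stages would also need to read the plain files instead of the .gz ones.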

On Thu, Apr 15, 2021 at 11:09 AM Christian Meesters < @.***> wrote:

Hi,

gargammel.pl pipes several of its intermediate outputs through | gzip. This can be problematic, and on our cluster it caused very low overall throughput:

  • Filesystems do not cope well with repeated small requests. This may go unnoticed on a "small" filesystem attached to a single machine (even a "big" server), but on a large parallel filesystem (e.g. that of a cluster) it can be a performance killer, both for gargammel users and for everyone else sharing the filesystem. The impact may only become noticeable when many such processes run concurrently.
  • In addition, compression is usually slow, and piping to gzip within the same cgroup limits the scalability of the parent process.

Generally, it is a good idea to compress final results, but not intermediate files (or to do so independently of an actual calculation).

Would you consider changing the wrapper script accordingly?

Best regards, Chris
