Add ability to specify a file with CVARs

wesbland commented 6 years ago

As the number of CVARs has started ballooning, particularly with the new collectives features, managing these with environment variables on the command line is getting more and more difficult. It'd be great to be able to pass in a file with all of the CVARs so that a given configuration is easy to replicate later.

hzhou commented 5 years ago

Can we simply use a shell script?

Or we could also consider creating a wrapper script that reads a config file setting the CVARs and then calls mpiexec.

Either way is a self-contained, user-side solution that does not require infrastructure changes.
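A minimal sketch of the wrapper-script idea in the previous comment, assuming a hypothetical plain-text file with one `NAME=value` per line plus `#` comments. Neither the file format nor the helper names `load_cvar_file` and `run_with_cvars` come from MPICH; they are illustrative only.

```bash
#!/bin/bash
# Sketch of a wrapper that reads a CVAR file and then launches a command.
# Hypothetical file format (not defined by MPICH):
#   NAME=value   one assignment per line, taken literally (no quoting)
#   # comment    lines starting with '#' and blank lines are skipped

# load_cvar_file FILE: export every NAME=value line found in FILE.
# Returns 1 on the first line that is not a valid assignment.
load_cvar_file() {
    local file=$1 line
    # "|| [ -n "$line" ]" also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;              # blank line or comment
            *=*) export "$line" || return 1 ;; # export rejects invalid names
            *) echo "$file: not a NAME=value line: $line" >&2; return 1 ;;
        esac
    done < "$file"
}

# run_with_cvars FILE COMMAND [ARGS...]: load FILE, then replace this shell
# with COMMAND (e.g. mpiexec) so that it inherits the exported variables.
run_with_cvars() {
    local file=$1
    shift
    load_cvar_file "$file" || return 1
    exec "$@"
}
```

For example, after sourcing the script: `run_with_cvars coll.cvars mpiexec -n 1000 ./my_app arg1 arg2 arg3` (the file name is made up). As written, values in the file overwrite anything already in the environment, and they only reach the ranks if the launcher forwards mpiexec's environment to them.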

wesbland commented 5 years ago

I was thinking something along the lines of what Open MPI calls an MCA parameters file. Basically just a place where you can put a bunch of CVARs, environment variables, etc. that can be set up at mpiexec time.

You're right that this can be managed by a shell script that wraps mpiexec, but I was thinking that this is the kind of feature that would be nice for setting up easy "default configurations" for a machine.

pavanbalaji commented 5 years ago

No, I don't think using a shell script is a good idea, as it ties the solution to the process manager. It won't work if someone chooses to use a different process manager (e.g., SLURM or Cray aprun). Having a config file like what @wesbland suggested, which can be read at MPI_Init time, is better.

hzhou commented 5 years ago

@pavanbalaji I see your point.

If the goal is to have per-system parameters, the system administrator will be involved anyway, and I think integrating the shell script with their system -- including how it works with their process manager -- is the lesser barrier.

To implement this at MPI_Init, we need to implement config-parsing code in C and publish and maintain a config DSL. While it can be very basic at the beginning -- just VAR = value -- the experience with JSON and YAML shows it is likely to grow complex very quickly. If we adopt JSON or YAML instead, first, we start from a complex foundation, and second, we add a dependency to MPICH. I am not voting no, but it is something we need to consider.

I believe @wesbland is thinking of an implementation at the mpiexec layer. It does not automatically transfer to other process managers, but at least I like the separation from MPI.

wesbland commented 5 years ago

> If the goal is to have per-system parameters, the system administrator will be involved anyway, and I think integrating the shell script with their system -- including how it works with their process manager -- is the lesser barrier.

I think if you ask the system administrators, they might say otherwise. 😁

My point was that there might be multiple valid configurations that will change depending on your application behavior. For instance (since this is in the context of collectives), if you know your application is heavy on a particular message size/node configuration/whatever, you might want to tell it to use a particular algorithm all the time. That's not the kind of thing that can be set up once and left alone.

> To implement this at MPI_Init, we need to implement config-parsing code in C and publish and maintain a config DSL. While it can be very basic at the beginning -- just VAR = value -- the experience with JSON and YAML shows it is likely to grow complex very quickly. If we adopt JSON or YAML instead, first, we start from a complex foundation, and second, we add a dependency to MPICH. I am not voting no, but it is something we need to consider.

We can probably use a simplified version of what they have in Open MPI:

https://github.com/open-mpi/ompi/blob/4944508603f26c7881696315bd35eda64a86130c/opal/mca/base/mca_base_open.c

> I believe @wesbland is thinking of an implementation at the mpiexec layer. It does not automatically transfer to other process managers, but at least I like the separation from MPI.

I was specifically suggesting an mpiexec flag, yes, but @pavanbalaji makes a valid point that there might be a different process manager, so there needs to be another way for the user to pass the file to MPICH.

hzhou commented 5 years ago

> My point was that there might be multiple valid configurations that will change depending on your application behavior.

It is easy to create shell scripts that wrap the actual application program, right? With a custom per-app config, one still needs to do extra work to activate or deactivate the config when switching apps. The way to manage that is probably to use shell scripts.

pavanbalaji commented 5 years ago

> If the goal is to have per-system parameters, the system administrator will be involved anyway, and I think integrating the shell script with their system -- including how it works with their process manager -- is the lesser barrier.

> I think if you ask the system administrators, they might say otherwise. 😁

I agree with Wesley. No system administrator that I know of is going to agree to do this. Plus, some process managers are closed source (e.g., Cray).

pavanbalaji commented 5 years ago

> My point was that there might be multiple valid configurations that will change depending on your application behavior.

> It is easy to create shell scripts that wrap the actual application program, right? With a custom per-app config, one still needs to do extra work to activate or deactivate the config when switching apps. The way to manage that is probably to use shell scripts.

Getting user buy-in to yet another launch environment is going to be a nightmare. Plus every system will need to document its usage the way they do with mpiexec or aprun or whatever they use on the system. What you are suggesting is not practically usable.

Now that we have argued against your point, let's get back to what your objections are to reading the config file at INIT time. It's fairly trivial code, amounting to fewer than 20-30 lines. Are we really calling that complex? I'll be happy to write that code up if that's the concern.

wesbland commented 5 years ago

> Getting user buy-in to yet another launch environment is going to be a nightmare. Plus every system will need to document its usage the way they do with mpiexec or aprun or whatever they use on the system. What you are suggesting is not practically usable.

I think @hzhou's point was that you could do something like this:

mpiexec -n 1000 ./my_cvar_wrapper.sh my_app arg1 arg2 arg3

Where my_cvar_wrapper.sh could be

#!/bin/bash
# Run the wrapped command with these CVARs; "$@" keeps its arguments intact.
MPIR_CVAR_FOO=bar MPIR_CVAR_BAZ=fizzbuzz "$@"

I guess that's also a solution. 🤷‍♂️

pavanbalaji commented 5 years ago

What happens when I do this:

mpiexec -n 1000 -genv MPIR_CVAR_FOO=blob ./my_cvar_wrapper.sh my_app arg1 arg2 arg3

I don't think that's a good solution.
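The problem with the wrapper above is that it unconditionally assigns MPIR_CVAR_FOO=bar, so the value passed with -genv is silently replaced. One way to keep the wrapper idea without that problem is to have it supply only defaults. The sketch below assumes that any variable already present in the environment should win; `cvar_default` is a hypothetical helper name, and the two CVARs are the placeholders from the example above.

```bash
#!/bin/bash
# Variant of my_cvar_wrapper.sh that only supplies defaults:
# a CVAR that is already set in the environment (e.g. via -genv) is left
# untouched, and only unset CVARs receive the wrapper's value.

# cvar_default NAME VALUE: export NAME=VALUE unless NAME is already set
# (a variable that is set but empty also counts as set).
cvar_default() {
    local name=$1 value=$2
    # ${!name+set} uses bash indirect expansion: it is "set" iff $name is set.
    if [ -z "${!name+set}" ]; then
        export "$name=$value"
    fi
}

cvar_default MPIR_CVAR_FOO bar
cvar_default MPIR_CVAR_BAZ fizzbuzz

# Replace the wrapper with the real program, keeping its arguments intact.
exec "$@"
```

With this version, the -genv command shown above should leave MPIR_CVAR_FOO=blob in place, provided the launcher has put that variable in each process's environment before the wrapper starts.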

hzhou commented 5 years ago

There seems to be some confusion. Is the config file we are talking about merely a set of environment variables? If so, that is what I meant: the system administrator should have no problem figuring out how to enable a set of default environment variables for their system, probably without even touching mpiexec or SLURM directly. Similarly, users already write shell scripts around their apps everywhere -- isn't that how they submit jobs? Adding environment variables to that script seems like a no-brainer to me. Did I miss anything?

hzhou commented 5 years ago

> What happens when I do this:
> mpiexec -n 1000 -genv MPIR_CVAR_FOO=blob ./my_cvar_wrapper.sh my_app arg1 arg2 arg3
> I don't think that's a good solution.

Is the concern the overhead of shell scripts?

pavanbalaji commented 5 years ago

> There seems to be some confusion. Is the config file we are talking about merely a set of environment variables? If so, that is what I meant: the system administrator should have no problem figuring out how to enable a set of default environment variables for their system, probably without even touching mpiexec or SLURM directly. Similarly, users already write shell scripts around their apps everywhere -- isn't that how they submit jobs? Adding environment variables to that script seems like a no-brainer to me. Did I miss anything?

Users don't wrap their executables with shell scripts. That's a major assumption.

> Is the concern the overhead of shell scripts?

The concern is that the wrapper silently overrides the environment I set explicitly.

Once again, please let's get back to the config file. We have argued about your solution for a while now, and we need to get back to what's wrong with an init-time readable config file. Why is that not a solution?

hzhou commented 5 years ago

> Once again, please let's get back to the config file. We have argued about your solution for a while now, and we need to get back to what's wrong with an init-time readable config file. Why is that not a solution?

I have not really understood your reasons against shell scripts yet. As for the second part: a config file is a DSL. I assume you are thinking of a basic 'VAR=value' syntax? What about quoting? What about invalid characters? What about error behavior? Then, once a config file is in place, it is natural to want additional functionality, such as default info hints. A DSL always grows as long as it faces users, so I think there is potential complexity.

pavanbalaji commented 5 years ago

> Once again, please let's get back to the config file. We have argued about your solution for a while now, and we need to get back to what's wrong with an init-time readable config file. Why is that not a solution?

> I have not really understood your reasons against shell scripts yet. As for the second part: a config file is a DSL. I assume you are thinking of a basic 'VAR=value' syntax? What about quoting? What about invalid characters? What about error behavior? Then, once a config file is in place, it is natural to want additional functionality, such as default info hints. A DSL always grows as long as it faces users, so I think there is potential complexity.

We are talking about CVARs, not generic user environment variables. We don't have as much complexity as you make it sound.

We already explained our reasons against shell scripts in the previous messages.
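To make the complexity question concrete, here is a sketch of what a strict checker for a CVAR-only file might look like, under one hypothetical set of answers to the questions above: one `MPIR_CVAR_NAME=value` per line, values taken literally with no quoting, `#` comments, and every bad line reported as `FILE:LINE`. It is written in bash only for illustration; `check_cvar_file` is a made-up name, and an MPI_Init-time reader would have to make the same decisions in C.

```bash
#!/bin/bash
# Sketch of a strict CVAR-file checker (hypothetical format):
#   - blank lines and lines whose first character is '#' are ignored
#   - every other line must be MPIR_CVAR_<NAME>=<value>
#   - <NAME> uses only A-Z, 0-9 and '_'; the value is taken literally
# check_cvar_file FILE: print each valid assignment on stdout, report every
# bad line on stderr as FILE:LINE: message, and return 1 if any line was bad.
check_cvar_file() {
    local file=$1 line lineno=0 status=0
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        case $line in
            ''|'#'*) continue ;;   # blank line or comment
        esac
        if [[ $line =~ ^MPIR_CVAR_[A-Z0-9_]+= ]]; then
            printf '%s\n' "$line"
        else
            printf '%s:%d: expected MPIR_CVAR_NAME=value: %s\n' \
                "$file" "$lineno" "$line" >&2
            status=1
        fi
    done < "$file"
    return "$status"
}
```

Restricting names to the MPIR_CVAR_ prefix keeps the grammar small, but even this version had to settle comments, a final line without a newline, and whether to stop or continue after an error.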

hzhou commented 5 years ago

> Getting user buy-in to yet another launch environment is going to be a nightmare. Plus every system will need to document its usage the way they do with mpiexec or aprun or whatever they use on the system. What you are suggesting is not practically usable.

> I think @hzhou's point was that you could do something like this:
> mpiexec -n 1000 ./my_cvar_wrapper.sh my_app arg1 arg2 arg3
> Where my_cvar_wrapper.sh could be
> #!/bin/bash
>
> MPIR_CVAR_FOO=bar MPIR_CVAR_BAZ=fizzbuzz "$@"
> I guess that's also a solution. 🤷‍♂️

Since it is a per-app configuration, I was thinking more of an app.sh like this:

#!/bin/bash
export MPIR_CVAR_FOO=bar
export MPIR_CVAR_BAZ=fizzbuzz
exec /path/to/app "$@"

Will this work?

hzhou commented 3 years ago

@wesbland Can this issue be closed?

wesbland commented 3 years ago

If you like. I still think it would be a valuable feature.

hzhou commented 3 years ago

https://github.com/pmodels/mpich/issues/2878#issuecomment-478625939 What would be the difference between what is proposed here and what is described in this comment?

wesbland commented 3 years ago

That's fine too. Pavan raised a good point that whatever we do won't be portable in the places where it matters most. I guess I would be more likely to advocate for centers to set some default environment variables that could be overridden (e.g., ALCF setting some default QMPI tools or having a default set of collective algorithms).

hzhou commented 3 years ago

I guess I am not quite clear on which cases are the ones "where it matters most".

CVARs should mainly be used for debugging purposes, or at least always be set explicitly by users. I imagine CVARs set by system admins can be very frustrating for users, as they will see different behaviors from system to system, and even compared with their own testing, with no clue what is going on.

That said, there is nothing preventing system admins from preloading environment variables in their module files. In fact, they do that all the time. Users are always able to override any of the environment variables set by admins. In reality, though, it is tricky for users to know what, when, and why to override.
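A user can at least see which CVARs are already in their environment, for example ones loaded by a site module file, before launching a job, and then unset or re-export the ones they want to change. A minimal sketch; `show_cvars` is a hypothetical function name.

```bash
#!/bin/bash
# show_cvars: print the MPIR_CVAR_* variables currently exported in this
# shell (e.g. defaults loaded by a site module file), sorted by name.
# These are the values a job launched from this shell would start from,
# unless the launcher or the user changes them.
show_cvars() {
    env | grep '^MPIR_CVAR_' | sort
}
```

After inspecting the list, `unset MPIR_CVAR_FOO` drops a preset value and `export MPIR_CVAR_FOO=...` replaces it for subsequent launches from that shell.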

wesbland commented 3 years ago

> CVARs should mainly be used for debugging purposes, or at least always be set explicitly by users. I imagine CVARs set by system admins can be very frustrating for users, as they will see different behaviors from system to system, and even compared with their own testing, with no clue what is going on.

Maybe, but not all of the time. For instance, QMPI will rely on environment variables to specify which tools to load. Some of the multi-NIC features can be enabled or disabled by environment variables. Collective algorithms can be selected via environment variables (though they also have a separate JSON file mechanism, so it's not a perfect example).

> That said, there is nothing preventing system admins from preloading environment variables in their module files. In fact, they do that all the time. Users are always able to override any of the environment variables set by admins. In reality, though, it is tricky for users to know what, when, and why to override.

I agree. This is a power-user feature. Anyway, I think we've agreed that we're not going to do this so I'm going to close it.

pmodels/mpich issue #2878: Add ability to specify a file with CVARs