saga-project / bliss

ATTENTION: bliss is now saga-python. Please check out the new project website for the latest version: http://saga-project.github.com/saga-python/

SGE Plugin hardcodes to TACC parallel environments #87

Open drelu opened 11 years ago

drelu commented 11 years ago

Make the parallel environment for SGE configurable so that machines other than Lonestar can use the SGE plugin. On Morar, for example, the following parallel environments are supported:

make, mpi, mpi-128, omp, omp-128, smp
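For context: in plain SGE, a parallel environment is requested with the `-pe <pe_name> <slots>` option, either on the `qsub` command line or as a `#$` directive inside the job script. The minimal sketch below shows where a configurable PE name would end up. The helper names (`sge_pe_directive`, `sge_script_header`) and the default job name are illustrative assumptions, not part of the Bliss SGE plugin.

```python
def sge_pe_directive(pe_name: str, slots: int) -> str:
    """Return the SGE directive requesting `slots` slots in parallel environment `pe_name`."""
    # PE names are admin-defined strings; only reject values qsub could not parse.
    if not pe_name or any(ch.isspace() for ch in pe_name):
        raise ValueError(f"invalid parallel environment name: {pe_name!r}")
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    return f"#$ -pe {pe_name} {slots}"


def sge_script_header(pe_name: str, slots: int, job_name: str = "bliss_job") -> str:
    """Build a minimal job-script header (hypothetical helper for illustration)."""
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",  # job name
        "#$ -cwd",            # run in the submission directory
        sge_pe_directive(pe_name, slots),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The same code works for Morar ("mpi") and for PEs named like "12way".
    print(sge_script_header("mpi", 16), end="")
```

Only the PE string changes between machines; the slot count and the rest of the script stay the same.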

oleweidner commented 11 years ago

Hi Andre,

Can you please elaborate on how these environments are configured if you use them via plain SGE (i.e., without Bliss)? Is that something that goes into the job script? Do you suggest that 'mpi, mpi-128, omp,…' should be options for the saga 'jd.spmd_variation' field?

I'm working on the SGE adaptor at the moment, so the timing is perfect ;-)

Thanks, Ole

drelu commented 11 years ago

Hi Ole, the available parallel environments (PEs) can be queried via:

qconf -spl

which returns, e.g.:

10way, 11way, 12way, 1way, 24way, 2way, 4way, 6way, 8way

Since a PE name is just an arbitrary string, it should be provided by the user.
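`qconf -spl` prints one PE name per line, so an adaptor could offer that list to the user or check input against it. A minimal Python sketch, assuming `qconf` is on the `PATH` of the submitting host; the function names `parse_pe_list` and `available_pes` are hypothetical:

```python
import subprocess
from typing import List


def parse_pe_list(output: str) -> List[str]:
    """Split `qconf -spl` output (one PE name per line) into a list of names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def available_pes() -> List[str]:
    """Query SGE for the configured parallel environments (needs `qconf` on PATH)."""
    result = subprocess.run(
        ["qconf", "-spl"], capture_output=True, text=True, check=True
    )
    return parse_pe_list(result.stdout)


if __name__ == "__main__":
    sample = "10way\n11way\n12way\n1way\n24way\n"
    print(parse_pe_list(sample))  # ['10way', '11way', '12way', '1way', '24way']
```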

Thanks!

Best, Andre

oleweidner commented 11 years ago

Hi Andre,

how's 'pe' related to the previously mentioned 'mpi, mpi-128, omp,…'?

How do you think this should be mapped to saga?

Thanks, Ole

drelu commented 11 years ago

Hi, there is no right way to map this to SAGA. AndreM will hit me for this, but I think an SGE-specific extension attribute is the right place for specifying a parallel environment this way.
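A minimal sketch of what such an extension attribute could look like, assuming a job description that carries a per-backend dictionary of extra keys. `JobDescription`, the `extensions` field, and the `sge`/`parallel_environment` keys are hypothetical stand-ins, not the Bliss or SAGA API; `total_cpu_count` mirrors SAGA's `TotalCPUCount` attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobDescription:
    executable: str
    total_cpu_count: int = 1
    # Hypothetical backend-specific extension attributes, keyed by adaptor name.
    extensions: Dict[str, Dict[str, str]] = field(default_factory=dict)


def sge_qsub_args(jd: JobDescription) -> List[str]:
    """Translate a job description into qsub arguments (illustrative SGE adaptor)."""
    args: List[str] = []
    pe: Optional[str] = jd.extensions.get("sge", {}).get("parallel_environment")
    if pe is not None:
        args += ["-pe", pe, str(jd.total_cpu_count)]
    elif jd.total_cpu_count > 1:
        # Without a PE, SGE cannot allocate more than one slot for the job.
        raise ValueError("multi-slot SGE jobs need an explicit parallel environment")
    args.append(jd.executable)
    return args


if __name__ == "__main__":
    jd = JobDescription(
        "/bin/date",
        total_cpu_count=24,
        extensions={"sge": {"parallel_environment": "12way"}},
    )
    print(sge_qsub_args(jd))  # ['-pe', '12way', '24', '/bin/date']
```

Other adaptors would simply ignore the `sge` entry, which keeps the backend-specific string out of the portable attributes.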

Best, Andre

andre-merzky commented 11 years ago

Hi Andre,

On Thu, Feb 21, 2013 at 1:34 AM, Andre Luckow wrote:

> Hi, there is no right way to map this to SAGA. AndreM will hit me for this,

Hmm, tempting... ;-)

> but I think an SGE specific extension attribute is the right place for this way of specifying a parallel environment.

Well, if it is needed, it's needed... -- but is there a way to simply encode this in an existing attribute, like

NUMBER_OF_PROCESSES = "24@4way"
SPMD_VARIATION = "MPI+OMP-128"

I don't really yet understand what those attributes are supposed to specify, so the above examples are probably stupid -- but I think you see what I am asking?
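To make the proposed overloading concrete: a hypothetical parser for a `NUMBER_OF_PROCESSES` value of the form `"<count>@<pe>"`. This only illustrates the idea under discussion and is not an implemented Bliss feature; `split_process_spec` is an invented name.

```python
from typing import Optional, Tuple


def split_process_spec(value: str) -> Tuple[int, Optional[str]]:
    """Split a hypothetical "<count>@<pe>" value into (count, pe_name).

    A plain number such as "24" yields (24, None), so existing values keep working.
    """
    count_str, sep, pe_name = value.partition("@")
    count = int(count_str)  # raises ValueError for non-numeric counts
    if count < 1:
        raise ValueError(f"process count must be positive: {value!r}")
    if sep and not pe_name:
        raise ValueError(f"missing parallel environment after '@': {value!r}")
    return count, (pe_name if sep else None)


if __name__ == "__main__":
    print(split_process_spec("24@4way"))  # (24, '4way')
    print(split_process_spec("24"))       # (24, None)
```

The sketch also makes the cost visible: every consumer of the attribute would have to parse it this way instead of reading a plain number.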

Best, Andre.

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

drelu commented 11 years ago

> > I think an SGE specific extension attribute is the right place for this way of specifying a parallel environment.
>
> Well, if it is needed, it's needed... -- but is there a way to simply encode this in an existing attribute, like
>
> NUMBER_OF_PROCESSES = "24@4way"
> SPMD_VARIATION = "MPI+OMP-128"
>
> I don't really yet understand what those attributes are supposed to specify, so the above examples are probably stupid -- but I think you see what I am asking?

That would introduce an awkward way of overloading existing attributes just to avoid touching the SAGA spec (which is very inflexible and does not allow any extension!!). Job description attributes should be extensible!

The focus should be on the user and on how to make things simple and straightforward for them! It is difficult enough to mentally map resource-specific commands to SAGA (see Melissa's question regarding ppn!). Hell does not freeze over just because of a resource-specific attribute. Encoding resource specifics into a string like number_of_processes (which is usually a number) is just a big hack, and certainly not what SAGA envisioned when providing a unified abstraction.

Best, Andre

andre-merzky commented 11 years ago

On Thu, Feb 21, 2013 at 1:52 AM, Andre Luckow wrote:

> That would introduce an awkward way of overloading existing attributes just to avoid touching the SAGA spec (which is very inflexible and does not allow any extension!!). Job description attributes should be extensible!

Ah, please read again -- I was asking whether there is a simple way to encode that information... The reason this time is not so much the spec, but that things are already confusing enough with the number of processor-assignment attributes we have -- adding yet another one that will only be usable for one specific backend won't make usage any simpler...

> The focus should be on the user and on how to make things simple and straightforward for them! It is difficult enough to mentally map resource-specific commands to SAGA (see Melissa's question regarding ppn!). Hell does not freeze over just because of a resource-specific attribute. Encoding resource specifics into a string like number_of_processes (which is usually a number) is just a big hack, and certainly not what SAGA envisioned when providing a unified abstraction.

Yes, the goal is simple usage. Funny that we have the same intentions, and yet disagree so much on the means, isn't it :-)

Thanks, Andre.

drelu commented 11 years ago

Hi Andre,

> > That would introduce an awkward way of overloading existing attributes just to avoid touching the SAGA spec (which is very inflexible and does not allow any extension!!). Job description attributes should be extensible!
>
> Ah, please read again -- I was asking whether there is a simple way to encode that information... The reason this time is not so much the spec, but that things are already confusing enough with the number of processor-assignment attributes we have -- adding yet another one that will only be usable for one specific backend won't make usage any simpler...

No, there is no simple way. The admin can define any arbitrary string: on one machine a parallel environment might be called "mpi", on another "16way".
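Since the names cannot be inferred, the most an adaptor can do is fail early when a user-supplied name is not configured on the target machine. A sketch, assuming the list of names has already been obtained (e.g. from `qconf -spl`); `check_pe` is a hypothetical helper:

```python
from typing import Iterable


def check_pe(requested: str, available: Iterable[str]) -> str:
    """Return `requested` if it is a configured PE name, else raise ValueError."""
    names = list(available)
    if requested not in names:
        raise ValueError(
            f"parallel environment {requested!r} is not configured here; "
            f"available: {', '.join(names) or '(none)'}"
        )
    return requested


if __name__ == "__main__":
    morar = ["make", "mpi", "mpi-128", "omp", "omp-128", "smp"]
    print(check_pe("mpi", morar))  # mpi
    try:
        check_pe("16way", morar)
    except ValueError as err:
        print(err)
```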

Best, Andre

npch commented 11 years ago

Given that there is a bigger issue at hand here ("how do you keep things simple when an admin can define any arbitrary set of strings for parallel environments?"), what would a stopgap fix look like that could be implemented in a fork of the Bliss codebase for now?
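One possible stopgap for such a fork, sketched under assumptions: let the caller pass the PE name explicitly and otherwise read it from an environment variable, failing loudly instead of falling back to a Lonestar-specific default. The variable name `BLISS_SGE_PE` and the function `resolve_pe` are hypothetical and not taken from the Bliss codebase.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name chosen for this sketch.
PE_ENV_VAR = "BLISS_SGE_PE"


def resolve_pe(explicit: Optional[str] = None,
               environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the SGE parallel environment: explicit value first, then the environment."""
    if environ is None:
        environ = os.environ
    pe = (explicit or environ.get(PE_ENV_VAR, "")).strip()
    if not pe:
        raise RuntimeError(
            f"no SGE parallel environment configured; pass one explicitly or set "
            f"{PE_ENV_VAR} (see `qconf -spl` for valid names)"
        )
    return pe


if __name__ == "__main__":
    print(resolve_pe(environ={PE_ENV_VAR: "mpi"}))  # mpi
    print(resolve_pe("12way", environ={}))          # 12way
```

Checking the explicit argument first means a later SGE-specific job description attribute could feed straight into `resolve_pe` without changing the fallback.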