I really like the addition of default.py, this addresses a lot of issues I've had in the past (woooo, dependency injection!).
I'm addressing several (minor) concerns below. Here's a super-short summary:
While 99% of pbs usage will be in conjunction with the CASM codebase, since pbs isn't actually part of CASM, we should not call things CASM-etc (e.g., the environment variables). One of two approaches here should be taken:
In both cases, a rename is required: the name pbs seems to be a holdover from when we only supported PBS (the scheduler).
Suggested names in both cases:
Can we switch to a config file (probably in ~/.pbs) instead of a (growing) list of environment variables? We're already touching files on the user's system during exec, and there's no reason we can't add a config.json to ~/.pbs. I've yet to encounter a build/deployment system dependent on multiple environment variables that hasn't given me issues.
Specifically, there would be exactly one environment variable: PBS_HOME, which if set would override $HOME/.pbs as the location of all the PBS-related things.
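For concreteness, here's a rough sketch of the lookup I have in mind; nothing like this exists yet, and the read_config helper and config.json contents are hypothetical:

```python
# Hypothetical config lookup: PBS_HOME, if set, overrides $HOME/.pbs
# as the location of all the PBS-related files, including config.json.
import json
import os

def pbs_dir():
    return os.environ.get('PBS_HOME',
                          os.path.join(os.path.expanduser('~'), '.pbs'))

def read_config():
    path = os.path.join(pbs_dir(), 'config.json')
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}  # no config file: fall back to built-in defaults
```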
Some of the query operations get very slow for large databases. Scraping and removing "incomplete" jobs from abandoned or modified projects is really hard, and at least on a series of connected clusters where I'm running several hundred pbs jobs a day, the database quickly grows so large as to take several seconds to query. This is made worse by the fact that individual CASM projects all point to one database (I'll raise a separate issue about my suggestion for that).
I suggest we add an index on jobid, which is the most-commonly-queried thing. An overview on indices is here: https://www.tutorialspoint.com/sqlite/sqlite_indexes.htm
tl;dr, indices slow down insertion into the db, but speed up retrieval (e.g. for pstat's updates). Inserting new jobs, at least during a CASM project, is done as part of a larger loop/task, so the additional overhead is small compared to... everything else. But querying (esp. to check whether a job is still running) is really slow for large databases and is often the 'slowest' step IMO.
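Adding the index is basically a one-liner; in this sketch the jobs table and database file names are guesses, not the actual pbs schema:

```python
# Sketch: index the jobid column so lookups like "is this job still
# running?" don't scan the whole table. Table/file names are placeholders.
import sqlite3

conn = sqlite3.connect('jobs.db')
conn.execute('CREATE INDEX IF NOT EXISTS ind_jobid ON jobs (jobid)')
conn.commit()
conn.close()
```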
@tallakahath Thanks so much for the feedback.
On naming: If doing casm-something, what did you mean by "explicitly reference CASM at every step"? Which steps are you referring to? I've been leaning towards casm-something, making it a sub-package like casm.jobs, but don't feel very strongly one way or the other.
On config file: +1
On operation speed: What specific commands/functions are you concerned about being slow? The parts that have been slow for me I've always believed to be due to qstat calls to update job status, not SQLite operations. For my usage, it's sometimes been kind of annoying, but never bad enough for me to spend much time on optimization, so it'd be useful to have specifics.
For naming: if we went with casm-something, all the environment variables would always be CASM_foo, we might relabel some of the class names, and we could either build in CASM-dependence where it makes life easier or at least drop efforts to be ultra-flexible in application. Probably just a naming thing when all is said and done.
That said, when you say "sub-package" do you mean adding it to the CASM repo explicitly? I'm heavily against that: having PBS/casm.jobs/whatever as its own "thing" makes deployment significantly easier. Really, installing any of the python parts of CASM always seems to be more difficult than necessary. I do like the approach you've taken here with adding to PyPI, though (and this would be made harder if casm.jobs were rolled into the CASM repo...).
On ops speed: I'll go run some tests on the Sandia clusters (where I have enough jobs that this is a significant problem) and get back to you.
Not adding it to the CASM repo. I was thinking namespace packages, though I haven't tried doing this before.
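Roughly what I have in mind, untested; a hypothetical setup.py using the pre-PEP-420 setuptools namespace_packages style:

```python
# Hypothetical setup.py for distributing casm.jobs on its own, with 'casm'
# as a shared setuptools namespace package (pre-PEP 420 style).
# casm/__init__.py would contain only:
#   __import__('pkg_resources').declare_namespace(__name__)
from setuptools import setup

setup(
    name='casm-jobs',
    version='0.1.0',
    packages=['casm', 'casm.jobs'],
    namespace_packages=['casm'],
)
```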
Let's keep things simple for now...
OK, finally moving on this. The 3.X branch has been updated to change the name to prisms-jobs. To make it clear, I will be pushing this branch to a new repository, prisms_jobs. For now, I'm not adding a config file, just documenting the environment variables. There are only 3 now, and most users will not find them necessary anyway.
I've generated Sphinx documentation which will be used to create a gh-pages website, and I'll shortly be creating prisms-jobs distributions for pip and conda.
Changes:
- pbs.interface.torque and pbs.interface.slurm: the appropriate interface is chosen and set in pbs.software. The choice can be customized, or a completely custom interface imported, via the CASM_PBS_SOFTWARE environment variable (see the sketch after this list). See README for details.
- pstat, psub, and taskmaster
- pip & PyPI: pip install -i https://testpypi.python.org/pypi casm-pbs
- CASM_PBS_JOB_UPDATE. See README for details.
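As mentioned above, the interface choice is env-var driven. A simplified sketch of that mechanism, illustrative only and not the actual prisms-jobs code; the 'torque' default and dispatch details are assumptions:

```python
# Illustration of env-var-driven interface selection (not the real code).
# CASM_PBS_SOFTWARE names a built-in interface, or a dotted module path
# to a completely custom one.
import importlib
import os

def choose_software():
    name = os.environ.get('CASM_PBS_SOFTWARE', 'torque')
    if name in ('torque', 'slurm'):
        return importlib.import_module('pbs.interface.' + name)
    # Otherwise treat the value as an importable path to a custom interface.
    return importlib.import_module(name)
```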