I really like the addition of default.py, this addresses a lot of issues I've had in the past (woooo, dependency injection!).
I'm addressing several (minor) concerns below. Here's a super-short summary:
While 99% of pbs usage will be in conjunction with the CASM codebase, since pbs isn't actually part of CASM, we should not call things CASM-etc (e.g., the environment variables). One of two approaches here should be taken:
In both cases, a rename is required: the name pbs seems to be a holdover from when we only supported PBS (the scheduler).
Suggested names in both cases:
Can we switch to a config file (probably in ~/.pbs) instead of a (growing) list of environment variables? We're already touching files on the user's system during exec, and there's no reason we can't add a config.json to ~/.pbs. I've yet to encounter a build/deployment system dependent on multiple environment variables that hasn't given me issues.
Specifically, there would be exactly one environment variable: PBS_HOME, which if set would override $HOME/.pbs as the location of all the PBS-related things.
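For concreteness, here's a rough sketch of the lookup I have in mind; nothing like this exists yet, and the read_config helper and config.json contents are hypothetical:

```python
# Hypothetical config lookup: PBS_HOME, if set, overrides $HOME/.pbs
# as the location of all the PBS-related files, including config.json.
import json
import os

def pbs_dir():
    return os.environ.get('PBS_HOME',
                          os.path.join(os.path.expanduser('~'), '.pbs'))

def read_config():
    path = os.path.join(pbs_dir(), 'config.json')
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}  # no config file: fall back to built-in defaults
```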
Some of the query operations get very slow for large databases. Scraping and removing "incomplete" jobs from abandoned or modified projects is really hard, and at least on a series of connected clusters where I'm running several hundred pbs jobs a day, the database quickly grows so large as to take several seconds to query. This is made worse by the fact that individual CASM projects all point to one database (I'll raise a separate issue about my suggestion for that).
I suggest we add an index on jobid, which is the most-commonly-queried thing. An overview on indices is here: https://www.tutorialspoint.com/sqlite/sqlite_indexes.htm
tl;dr, indices slow down insertion into the db, but speed up retrieval (e.g. for pstat's updates). Inserting new jobs, at least during a CASM project, is done as part of a larger loop/task, so the additional overhead is small compared to... everything else. But querying (esp. to check whether a job is still running) is really slow for large databases and is often the 'slowest' step IMO.
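Adding the index is basically a one-liner; in this sketch the jobs table and database file names are guesses, not the actual pbs schema:

```python
# Sketch: index the jobid column so lookups like "is this job still
# running?" don't scan the whole table. Table/file names are placeholders.
import sqlite3

conn = sqlite3.connect('jobs.db')
conn.execute('CREATE INDEX IF NOT EXISTS ind_jobid ON jobs (jobid)')
conn.commit()
conn.close()
```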
@tallakahath Thanks so much for the feedback.
On naming: If doing casm-something, what did you mean by "explicitly reference CASM at every step"? Which steps are you referring to? I've been leaning towards casm-something, making it a sub-package like casm.jobs, but don't feel very strongly one way or the other.
On config file: +1
On operation speed: What specific commands/functions are you concerned about being slow? The parts that have been slow for me I've always believed to be due to qstat calls to update job status, not SQLite operations. For my usage, it's sometimes been kind of annoying, but never bad enough for me to spend much time on optimization, so it'd be useful to have specifics.
For naming: if we went with casm-something, all the environment variables would always be CASM_foo, we might relabel some of the class names, and we could either build in CASM-dependence where it makes life easier or at least drop efforts to be ultra-flexible in application. Probably just a naming thing when all is said and done.
That said, when you say "sub-package" do you mean adding it to the CASM repo explicitly? I'm heavily against that: having PBS/casm.jobs/whatever as its own "thing" makes deployment significantly easier. Really, installing any of the python parts of CASM always seems to be more difficult than necessary. I do like the approach you've taken here with adding to PyPI, though (and this would be made harder if casm.jobs were rolled into the CASM repo...).
On ops speed: I'll go run some tests on the Sandia clusters (where I have enough jobs that this is a significant problem) and get back to you.
Not adding it to the CASM repo. I was thinking namespace packages, though I haven't tried doing this before.
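Roughly what I have in mind, untested; a hypothetical setup.py using the pre-PEP-420 setuptools namespace_packages style:

```python
# Hypothetical setup.py for distributing casm.jobs on its own, with 'casm'
# as a shared setuptools namespace package (pre-PEP 420 style).
# casm/__init__.py would contain only:
#   __import__('pkg_resources').declare_namespace(__name__)
from setuptools import setup

setup(
    name='casm-jobs',
    version='0.1.0',
    packages=['casm', 'casm.jobs'],
    namespace_packages=['casm'],
)
```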
Let's keep things simple for now...
OK, finally moving on this. The 3.X branch has been updated to change the name to prisms-jobs. To make it clear, I will be pushing this branch to a new repository, prisms_jobs. For now, I'm not adding a config file, just documenting the environment variables. There are only 3 now, and most users will not find them necessary anyway.
I've generated Sphinx documentation which will be used to create a gh-pages website, and I'll shortly be creating prisms-jobs distributions for pip and conda.
Changes:
- pbs.interface.torque and pbs.interface.slurm: the appropriate interface is chosen and set in pbs.software. The choice can be customized, or a completely custom interface imported, via the CASM_PBS_SOFTWARE environment variable (see the sketch after this list). See README for details.
- pstat, psub, and taskmaster
- pip & PyPI: pip install -i https://testpypi.python.org/pypi casm-pbs
- CASM_PBS_JOB_UPDATE. See README for details.
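As mentioned above, the interface choice is env-var driven. A simplified sketch of that mechanism, illustrative only and not the actual prisms-jobs code; the 'torque' default and dispatch details are assumptions:

```python
# Illustration of env-var-driven interface selection (not the real code).
# CASM_PBS_SOFTWARE names a built-in interface, or a dotted module path
# to a completely custom one.
import importlib
import os

def choose_software():
    name = os.environ.get('CASM_PBS_SOFTWARE', 'torque')
    if name in ('torque', 'slurm'):
        return importlib.import_module('pbs.interface.' + name)
    # Otherwise treat the value as an importable path to a custom interface.
    return importlib.import_module(name)
```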