spacepy / dbprocessing

Automated processing controller for heliophysics data

Support setting product permissions #33

Open jtniehof opened 3 years ago

jtniehof commented 3 years ago

dbprocessing doesn't currently do any permission management, so the permissions of created files are determined by the umask of the process/shell that calls ProcessQueue, and by anything the file-production code itself does. It might be useful to specify permissions on a per-product basis. E.g. I often want different permissions for level 1 and level 2 files, even on the processing machine.
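To illustrate the umask point, here is a minimal sketch (assuming a POSIX system; the helper name and file name are made up for illustration). The mode a new file ends up with is the mode the creating code requests, with every bit that is set in the umask cleared.

```python
import os
import stat
import tempfile


def mode_of_new_file(umask, requested=0o666):
    """Create a file with ``umask`` in effect and return its permission bits.

    The kernel clears every bit set in the umask from the mode the code
    requests, so the result is ``requested & ~umask``.
    """
    old = os.umask(umask)  # os.umask sets the new mask and returns the old one
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "product.cdf")  # illustrative file name
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # restore the caller's umask
```

With a umask of 0o022 a default file comes out as 0o644, while 0o027 gives 0o640. Since a single umask applies to everything ProcessQueue launches, it can't by itself give level 1 and level 2 files different modes.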

Proposed enhancement

I'm not ready to suggest any single path forward at this point; it might be some combination of documentation and changes. See the possibilities under Alternatives below.

Alternatives

Several possibilities:

  1. Have the user set the umask when calling ProcessQueue. This is what I'm doing, but of course it's not fine-grained enough for per-product permissions. Still, maybe this should be documented.
  2. Use a separate permissions-enforcing script. I'm also doing this, and I'm getting tired of the flood of notifications that permissions have been fixed, which is what brought me to this.
  3. Require the code to set its own permissions, which keeps dbp out of the loop. Consistent with the use of wrapper scripts and the like for weird codes.
  4. Add a "permissions" column to the product table. This starts to get more OS-specific (which is a problem with permission handling in general). Also note this means the permissions wouldn't be set until dbprocessing got to the file after it was produced: on creation the file would have whatever permissions the code asked for, then it would be filled, and only then would dbp get its hands on it, so there would be a window of wrong permissions.
  5. Add some level of "environment" support to codes and/or processes. This could include specifying a umask and maybe environment variables as well. Maybe even a working directory, although runMe is pretty oriented towards running in a temporary directory.
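For option 2, one way to avoid the wall of notifications is to touch only files whose mode is actually wrong and report a single summary. A hypothetical sketch (enforce_mode and the target mode are illustrative, not part of dbprocessing):

```python
import os
import stat


def enforce_mode(root, mode):
    """Walk ``root`` and chmod only regular files whose permission bits
    differ from ``mode``.

    Returns the list of paths that were changed, so a caller can log one
    summary line instead of one line per file.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            if stat.S_IMODE(st.st_mode) != mode:
                os.chmod(path, mode)
                changed.append(path)
    return changed
```

A cron job could then log just the number of files fixed; files that already have the right mode produce no output at all.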

I rather like 5; it's probably the most work but has the potential to solve a lot of problems. We can also do some documentation to suggest 1-3.
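Option 5 could look roughly like the following sketch. ProcessEnvironment is a hypothetical record (nothing like it exists in dbprocessing today) holding a umask, extra environment variables and an optional working directory; the runner passes them to subprocess. The umask argument of subprocess.run/Popen needs Python 3.9 or later on POSIX.

```python
import os
import subprocess
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessEnvironment:
    """Hypothetical per-code/per-process settings, not an existing table."""
    umask: Optional[int] = None  # e.g. 0o027 for level 1, 0o022 for level 2
    extra_env: Dict[str, str] = field(default_factory=dict)
    cwd: Optional[str] = None  # None: let the runner choose (e.g. a temp dir)


def run_with_environment(cmd: List[str],
                         penv: ProcessEnvironment) -> subprocess.CompletedProcess:
    """Run ``cmd`` with the umask, extra environment variables and working
    directory from ``penv``. A umask of -1 tells subprocess to leave the
    child's umask unchanged."""
    env = dict(os.environ)
    env.update(penv.extra_env)
    return subprocess.run(
        cmd,
        env=env,
        cwd=penv.cwd,
        umask=-1 if penv.umask is None else penv.umask,
        check=True,
        capture_output=True,
        text=True,
    )
```

Because the umask is applied only in the child, each code or process could get its own mask without changing the umask of ProcessQueue itself.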

OS, Python version, and dependency version information:

sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
sqlalchemy=1.0.11

Version of dbprocessing

git master (92b3f7ed2881168a1c7651869bf3b96a7e015273)

Closure condition

This issue should be closed when an option is chosen, implemented, and merged, or when appropriate related issues are opened that span the problem (e.g. opening an implementation issue after the design is complete).

balarsen commented 3 years ago

This would be a cool enhancement. I like something like (4) above. Likely not critical but interesting.
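A sketch of how option 4 might be applied, assuming a hypothetical permissions column that stores an octal string such as "0640" (the column and function names are illustrative only). If the temp directory is user-only, doing the chmod there, before moving the file to its final home, would sidestep the window of wrong permissions mentioned under option 4.

```python
import os
import shutil
from typing import Optional


def move_with_permissions(src: str, dest_dir: str,
                          permissions: Optional[str]) -> str:
    """Move a newly made product file into its final directory.

    ``permissions`` stands in for a hypothetical per-product column holding
    an octal string such as "0640"; None keeps whatever mode the code
    produced. The chmod happens before the move, while the file is still in
    the (assumed private) temp directory.
    """
    if permissions is not None:
        os.chmod(src, int(permissions, 8))
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.move(src, dest)  # rename or copy2; either way the mode is kept
    return dest
```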

Note that files move around a bit, from temp space (to incoming?) to their final home, and permissions should be strict everywhere. Maybe temp space should be user-only, with permissions opened up once the file is placed in its correct home. Not sure how tempfile.TemporaryDirectory handles permissions.

jtniehof commented 3 years ago

The permissions should be maintained on a move, but it's a good point: if we make sure the temp directory itself is accessible only by the user, then we're secure. On modern Linux systems the temp directory gets created in a per-user, user-only directory. The mkdtemp documentation states "The directory is readable, writable, and searchable only by the creating user ID", so that's good!
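A quick check of both points, assuming POSIX (the file names are made up). mkdtemp gives a 0o700 directory, and in CPython tempfile.TemporaryDirectory is built on mkdtemp, so it should behave the same way; moving a file with shutil.move keeps its mode.

```python
import os
import shutil
import stat
import tempfile


def private_workdir_demo():
    """Return (temp dir mode, file mode before move, file mode after move)."""
    work = tempfile.mkdtemp()   # only the creating user can read/write/search it
    final = tempfile.mkdtemp()  # stands in for the product's final home
    try:
        path = os.path.join(work, "product.dat")
        with open(path, "w") as f:
            f.write("data")
        os.chmod(path, 0o644)
        before = stat.S_IMODE(os.stat(path).st_mode)
        moved = shutil.move(path, os.path.join(final, "product.dat"))
        after = stat.S_IMODE(os.stat(moved).st_mode)
        return stat.S_IMODE(os.stat(work).st_mode), before, after
    finally:
        shutil.rmtree(work)
        shutil.rmtree(final)
```

With a typical umask such as 0o022 this returns (0o700, 0o644, 0o644).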

You prefer 4 to 5? I guess if we're explicitly doing permission rather than doing a umask it fits there a little better.

balarsen commented 3 years ago

> You prefer 4 to 5? I guess if we're explicitly doing permission rather than doing a umask it fits there a little better.

I don't think I care much. I was thinking it's not too much work, and 4 is easier.