rbeyer / kalasiris

A Python library to wrap functions and functionality for the Integrated Software for Imagers and Spectrometers (ISIS).
https://kalasiris.readthedocs.io
BSD 3-Clause "New" or "Revised" License
9 stars 5 forks source link

Make parsing stdout more robust to user IsisPreferences #11

Closed Kelvinrr closed 1 year ago

Kelvinrr commented 1 year ago

Is your feature request related to a problem? Please describe.

When writing scripts it can be pretty annoying when some preferences cause additional stdout like the progress bar. This makes adds complexity to parsing stdout as PVL since you need to filter those lines.

Describe the solution you'd like

IDK what a good solution would be. In my own scripts I add an extra filter step to remove the progress bar from stdout but I feel that isn't very robust.

Another possible solution is using the installed IsisPreferences which are typically left alone. Kalasiris can load those preferences by default, or at least have the option to set the preference file somewhere in the library separate from the users. Isis has the ability to specify the preference file via the -pref flag that iirc is universal for it's apps.

Describe alternatives you've considered

ISIS can have a universal flag that only outputs PVL (or guarantees some kinda of machine parsable format) regardless of user preferences that things like kalasiris and script writers in general can use.

Additional context

rbeyer commented 1 year ago

Can you provide a simple, specific example of ISIS putting this extra info (like progress bar output) into STDOUT? I can imagine what you're talking about, but I've never run into this.

One of the possible solutions you suggest is using the -pref flag to ISIS itself to designate a preferences file that would "correct" the STDOUT output to something that would be better for you. The kalasiris library supports this. If you would normally add -pref path/to/file.txt to your ISIS command, you can add pref__="path/to/file.txt" (that's "pref" with two trailing underbars) to any kalasiris call, and that will be passed along to the ISIS call.

I appreciate that's a per-call solution, and not an easy-for-user "global" option, but in that sense, it is mirroring ISIS itself. If you change something in the global default ISIS prefs file, then sure, when kalasiris calls ISIS, that'll always get picked up via ISIS itself.

IDK what the right solution is, either. In the way that kalasiris is meant to be a thin wrapper for ISIS command-line calls, I think kalasiris is offering comparable functionality (global ISIS preferences are managed by ISIS itself, and per-command -pref is supported).

If there is a straightforward way to include a filter (especially if you've already written it), to post-process the STDOUT to remove the funky character strings, I think that could be a good addition to kalasiris. You also mention changes to ISIS itself to provide cleaner output for programmatic users, which may be the better long-term solution.

Kelvinrr commented 1 year ago

So running:

try:
    ret = kisis.segment(params["match"], nl=30000, overlap=0)
    log.debug(f"returned: {ret}")
except subprocess.CalledProcessError as err:
    log.debug('Had an ISIS error:')
    log.debug(' '.join(err.cmd))
    log.debug(err.stdout)
    log.debug(err.stderr)

print(ret)

when progress=On in the preferences produces

>>> print(ret.stdout)
segment: Working
0% Processed
10% Processed
20% Processed
30% Processed
40% Processed
50% Processed
60% Processed
70% Processed
80% Processed
90% Processed
100% Processed
Group = Results
  InputLines      = 52224
  InputSamples    = 2532
  StartingLine    = 1
  StartingSample  = 1
  EndingLine      = 30000
  EndingSample    = 2532
  LineIncrement   = 1
  SampleIncrement = 1
  OutputLines     = 30000
  OutputSamples   = 2532
End_Group
segment: Working
0% Processed
10% Processed
20% Processed
30% Processed
40% Processed
50% Processed
60% Processed
70% Processed
80% Processed
90% Processed
100% Processed

Group = Results
  InputLines      = 52224
  InputSamples    = 2532
  StartingLine    = 30001
  StartingSample  = 1
  EndingLine      = 52224
  EndingSample    = 2532
  LineIncrement   = 1
  SampleIncrement = 1
  OutputLines     = 22224
  OutputSamples   = 2532
End_Group

Which naturally when ret.stdout gets directly passed into pvl.loads you get parsing errors:

Traceback (most recent call last):
 File "/usgs/cpkgs/anaconda3_linux/envs/isis7.1.0-FF/lib/python3.9/site-packages/pvl/lexer.py", line 429, in lexer
   t = yield t
ValueError: Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "Working"  

I didn't know about pref__=, this actually solves this in the short term. Definitely cleaner than trying to pre-parse string output. I'm not 100% familiar with ISIS preferences, and I'm concerned there would be more things in there that could be an issue when you try to parse the output of from a kalasiris call. For example, it's possible for a user to output logging to the terminal. This causes logging to end up in your stdout rather than in a file which can mess up parsing logic. Because of the variability, I'm worried of it being kinda of a mess to filter in a universal way.

Maybe if there was some syntax sugar to add partials to the ISIS calls? Similar to https://docs.python.org/3/library/functools.html#functools.partial

For example something like:

import kalasiris 

kalasiris.setPreferencePath("some/file") # now all isis calls have this injected through the pref__ flag. 

I don't know how the ISIS calls are generated, so this may not be as trivial as I might think.

rbeyer commented 1 year ago

Thanks, yeah, that does look like a mess to filter, and as you say, there may be other strange things lurking in STDOUT, so the promise of a robust filter might be over-selling.

Yes, so pref__="path/to/file" should work (if it doesn't, let me know).

How important is it to have the global preferences thing you're asking for?

I get what you're saying with functools, but that's not quite how we'd implement it in the library. You could certainly take advantage of functools.partial() in your user code to build a set of functions that have the pref__ parameter hardwired (or frozen as the documentation says), but then you'd have to do all of that work yourself.

In the kalasiris code, when you import kalasiris is when we build the set of functions (dynamically) from whatever ISIS executable files there are. I don't think it would be too hard to build functionality into those "function templates" that looks to some global preference path variable to see if it is set, and if so, automatically provide it to every call (unless the user also explicitly sets pref__ in an individual call, seems like that should over-ride, and maybe set to None turns off the supplied "default" if set).

So the cost is that it will add one more "if-gate" every time a wrapped ISIS call happens, which probably isn't going to significantly impact performance. I don't think it would be too difficult to implement, so how critical or important to your work is this? If this is a show-stopper, I can find some time sooner-rather-than-later to work it, otherwise it goes on the queue.

Kelvinrr commented 1 year ago

It's definitely not a show stopper. Just a nice to have. Doing the work myself in my scripts is a workaround for now. If I get the time I'll see how I could implement this inside kalasiris and make the contribution as I think I would be overwriting the default IsisPreferences every time, personally.

Kelvinrr commented 1 year ago

Alright, unfortunately trying to override the preferences doesn't work. I think ISIS is making the user preferences (~/.Isis/IsisPreferences) override any other preferences including the one you pass in as I imagine the user pref file is loaded last. So for now I am still using my filter code. Basically a brain dead one liner:

filter_progress = lambda x: "\n".join([line for line in x.split("\n") if not "% Processed" in line and not "Working" in line])

Messing with different ways the preferences can mess up output, the stdout user log can completely swallow the expected output (e.g. Segment doesn't output segment metadata if logs to stdout is turned on), so I think this is something that would need to be fixed by Isis. And simply having a filter isn't gonna fix it long term. I'll make an issue to prioritize the -pref flag over everything else.

rbeyer commented 1 year ago

Hunh, so this document: https://isis.astrogeology.usgs.gov/7.0.0/documents/EnvironmentAndPreferencesSetup/EnvironmentAndPreferencesSetup.html

Indicates that the "Project Preferences File" specified by -pref should override all of the other preference files, however, there is a weird "[Future Option]" label in red there. Does that mean it isn't implemented? I'm sure I've used -pref successfully in the past (although I don't usually keep a *$HOME/.Isis$ file, so maybe that's it?).

Kelvinrr commented 1 year ago

the flag seems to be working in the sense that you can pass in a preference file, but it seems the idea that -pref is prioritized is not true. Seems the user's home preference is prioritized.

Below, I pass in the default preference file, but it still outputs progress and the TerminalOutput=On flag is still swallowing the PVL output. Both are only on in my home preference file. So I think this is a bug in ISIS.

(isisdev23) IGSWZAWGLT00017:PILOT krodriguez$ segment from=/Users/krodriguez/lronac_ff2/Lev1/M1316912174RE_cal_echo.cub nl=30000 overlap=0 -preference=/Users/krodriguez/anaconda3/envs/isisdev23/IsisPreferences
segment: Working
100% Processed
segment: Working
100% Processed