ratt-ru / Stimela-classic

Containerized radio interferometry scripting framework -- NB: Classic version is no longer in active development, use stimela 2! See README for details.
GNU General Public License v2.0
28 stars 16 forks source link

problem with empty strings as cab parameters #627

Open o-smirnov opened 4 years ago

o-smirnov commented 4 years ago

Following up on https://github.com/caracal-pipeline/caracal/issues/1137, I realized this may be a more generic problem:

for param in cab['parameters']:
    name = param['name']
    value = param['value']

    if name == 'ms':
        mslist = ' '.join(value)
        continue
    if value in [None, False]:
        continue
    if value is True:
        value = ""
    args += ['{0}{1} {2}'.format(cab['prefix'], name, value)]

_runc = " ".join(["mask_ms.py"] + args + [mslist])

I understand the intent of the code is to skip None/False arguments entirely, turn True arguments into --option switches, and turn everything else into --option value pairs, correct?

The problem is what happens when a value of "" is passed. This becomes --option, which shlex.split() turns into just --option, causing the underlying tool to crash because it expects to find a value for the option, but picks up the next option from the command line instead.

The whole build list/join list/split again thing anyway seems unnecessary roundabout and error-prone (what if you have a value that's two words? Split will cheerfully make two arguments out of that), but I guess this is an artefact of switching from xrun to subprocess_call.

It would be more robust to just accumulate a list of arguments directly and forget join/split -- that way, empty strings and two word strings would get correctly passed to the tool as single separate arguments.

Since I assume this pattern recurs in many cabs, what about moving it into a scabha function? At least all the monkeying around can be in one place then.

Even if this is fixed, we have a Caracal interface problem. Because these damn schemas don't allow None as default, the default for a missing argument is an empty string. See https://github.com/caracal-pipeline/caracal/issues/1137 again. What we'd really like to do is tell Stimela "ignore this argument, it is missing", for which Stimela (and the rest of the Python world) has the wonderful word None. But we can't have None because the schema won't allow a None default to a str argument...

svw26 commented 4 years ago

Instead of None, @paoloserra had previously suggested null for the default of a mosaic parameter. Maybe this could be used widely. x

o-smirnov commented 4 years ago

It's a bit of a toss-up. Whether we take "None" or "null", same question -- is there a legitimate use for this string value elsewhere?

svw26 commented 4 years ago

Oh... I thought that "None" was leading to some specific pipeline behaviour, whilst you could use "null" for slightly-different behaviour, working in parallel...

In my case, I interpreted "example" to actually mean 'example' rather than 'default', which is why the default for passing target images to the mosaic worker is currently "directory/first_image.fits, directory/second_image.fits" . I settled on this after running into problems with having a sequence of empty strings... I could instead get the worker to catch "null, null" if the user doesn't specify anything (or "None, None" if that's the way we'd like to go). x