htcondor / htmap

High-Throughput Computing in Python, powered by HTCondor
https://htmap.readthedocs.io
Apache License 2.0

String parameters being stripped of quotes #206

Closed stsievert closed 4 years ago

stsievert commented 4 years ago

Describe the bug

I am using these options to launch jobs:

from time import time

import htmap

def job(**kwargs):  # synthetic
    return {"time": time(), **kwargs}

def main():
    options = {
        "request_cpus": "1",
        "request_gpus": "1",
        "request_memory": "2G",
        "request_disk": "1GB",
        "Requirements": "Target.CUDADriverVersion>=10.1",
        "custom_options": {
            "+wantFlocking": "true",
            "+WantGPULab": "true",
            "+GPUJobLength": "short",
        },
    }
    kwargs = {"foo": "bar"}
    # job() only accepts keyword arguments, so pass them through starmap
    f = htmap.starmap(job, kwargs=[kwargs], map_options=htmap.MapOptions(**options), tag="adadamp3")

if __name__ == "__main__":
    main()

I am running this on CHTC's pool at UW–Madison. When I run this job, I get this error:

MapComponentHeld: Component 0 of map adadamp3 is held: [21] Error from slot1_1@gpu2000.chtc.wisc.edu: Job failed to complete in 0 hrs

This bug report records an external discussion from office hours and email.

To Reproduce (to be added if desired)

Expected behavior

I do not expect that error.

Software Versions:

(base) [stsievert@submit2 exp-cifar10]$ condor_version
$CondorVersion: 8.9.6 Mar 18 2020 BuildID: 498485 PackageID: 8.9.6-0.498485 PRE-RELEASE-UWCS $
$CondorPlatform: x86_64_RedHat7 $
(base) [stsievert@submit2 exp-cifar10]$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

(base) [stsievert@submit2 exp-cifar10]$ cat /proc/version
Linux version 3.10.0-1062.1.2.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Mon Sep 30 14:19:46 UTC 2019
(base) [stsievert@submit2 exp-cifar10]$ python -c "import htcondor, htmap; print(htcondor.version()); print(htmap.version())"
$CondorVersion: 8.9.5 Dec 30 2019 BuildID: UW_Python_Wheel_Build $
HTMap version 0.5.1
(base) [stsievert@submit2 exp-cifar10]$


stsievert commented 4 years ago

I got an email back from Christina:

the GPUJobLength needs to be a string, and the quotes were being stripped from your job submission. You need to use this instead:

"GPUJobLength" : '"short"'

JoshKarpel commented 4 years ago

Unfortunately, I think this is more of a "gotcha" than a bug. At the end of the day, we can't know whether you intended the value to be a string or not, because two kinds of "stringish" values go into submit: tokens or references to other values, like true, and strings, like "short".

If you would write this in a normal submit file

+WantGPULab = true
+GPUJobLength = "short"

then for HTMap you need

htmap.MapOptions(custom_options = {"WantGPULab": "true", "GPUJobLength": '"short"'})

The general rule is something like

the value will be sent to HTCondor as if the contents of the string were in a submit file

... which is unpleasant, but I haven't thought of anything cleverer to do here.
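A minimal sketch of that rule, assuming (as in the MapOptions example above) that custom_options keys drop the leading +. The helper submit_lines_to_custom_options is hypothetical and not part of HTMap; it copies each value's text verbatim, so the double quotes around "short" survive into the Python string.

```python
def submit_lines_to_custom_options(submit_text: str) -> dict:
    """Hypothetical helper (not HTMap API): turn '+Key = value' submit-file
    lines into a custom_options dict, keeping each value exactly as written."""
    options = {}
    for raw_line in submit_text.splitlines():
        line = raw_line.strip()
        # Only custom attributes (leading '+') are handled in this sketch.
        if not line.startswith("+") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        # Keep the value text verbatim, including any double quotes.
        options[key.strip()] = value.strip()
    return options


submit_text = '''
+WantGPULab = true
+GPUJobLength = "short"
'''
custom_options = submit_lines_to_custom_options(submit_text)
print(custom_options)
# → {'WantGPULab': 'true', 'GPUJobLength': '"short"'}
```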

At a minimum, we need a note in the MapOptions docs that describes the gotcha.

stsievert commented 4 years ago

references to other values like true

Ah. It looks like one can also specify transfer_input_files = foo.dat. Can transfer_input_files = "foo.dat" be used?

Could MapOptions take Python types and then encode them so that condor_submit accepts them? That would still leave the difficulty of telling a string from a token. Is there a clean separation between keys that accept strings and keys that accept tokens?
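One way to picture that idea is a type-based encoder. This is only a sketch: to_submit_value and the Raw marker class are hypothetical names, not HTMap API, and the backslash escaping assumes ClassAd string-literal rules. Python bools become true/false, numbers are written as-is, plain strings are quoted, and anything wrapped in Raw (a token, macro, or expression) passes through untouched.

```python
class Raw(str):
    """Hypothetical marker: text passed through unchanged
    (a token, a submit macro, or a ClassAd expression)."""


def to_submit_value(value) -> str:
    """Hypothetical encoder from Python values to submit-file text."""
    # Raw is a str subclass, so it must be checked before plain str.
    if isinstance(value, Raw):
        return str(value)
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: backslash escaping, as in ClassAd string literals.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"cannot encode {type(value).__name__}")


print(to_submit_value(True))                 # true
print(to_submit_value("short"))              # "short"
print(to_submit_value(Raw("$(ClusterId)")))  # $(ClusterId)
```

The Raw wrapper is exactly where the string-versus-token ambiguity comes back: the encoder cannot guess, so the caller still has to say which one they mean.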

bbockelm commented 4 years ago

@JoshKarpel - in the submit file language, attributes not starting with + are macros to be expanded while those with + obey ClassAd rules.

Why not do the standard type conversion for + attributes and macros otherwise? I'm thinking of something like this:

>>> import classad
>>> ad = classad.ClassAd()
>>> map = {"foo": "1", "+bar": classad.ExprTree('strcat(baz, "123")'), "+baz": "yup"}
>>> 
>>> for key, val in map.items():
...   if key.startswith('+'):
...     ad[key[1:]] = val
...     map[key] = str(ad.lookup(key[1:]))
... 
>>> map
{'foo': '1', '+baz': '"yup"', '+bar': 'strcat(baz,"123")'}

This makes the expected things easy (strings become strings) but keeps hard things hard (arbitrary expressions require you to dig out the underlying library).

JoshKarpel commented 4 years ago

I suspect that the internal implementation of this is largely a historical artifact (I don't recall thinking about it much at the time). There's certainly a place in the code to hang that kind of conversion.

However, I don't think that round-tripping through an ad will work as desired in all cases. The one that comes to mind is when you want to use a submit macro in a custom attribute. Example:

>>> import classad
>>> map = {"baz": "$(ClusterId)"}
>>> ad = classad.ClassAd()
>>> ad["baz"] = map["baz"]
>>> ad
[ baz = "$(ClusterId)" ]
>>> str(ad.lookup("baz"))
'"$(ClusterId)"'

which would cause the resulting value in the job ad to be a string instead of an integer.
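The same pitfall shows up without the classad bindings. Below is a pure-Python sketch of the + prefix rule that quotes every plain string under a + key; quote_plus_attributes is a hypothetical name, and its escaping is a simplification of what the ClassAd round trip does for string values.

```python
def quote_plus_attributes(options: dict) -> dict:
    """Hypothetical sketch of the '+'-prefix rule: quote string values of
    '+' attributes as ClassAd strings; leave macros (other keys) untouched."""
    encoded = {}
    for key, value in options.items():
        if key.startswith("+") and isinstance(value, str):
            # Simplified ClassAd string quoting with backslash escapes.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            encoded[key] = f'"{escaped}"'
        else:
            encoded[key] = value
    return encoded


encoded = quote_plus_attributes(
    {"foo": "1", "+GPUJobLength": "short", "+baz": "$(ClusterId)"}
)
print(encoded["+GPUJobLength"])  # "short"  (the intended string)
print(encoded["+baz"])           # "$(ClusterId)"  (now a string, not an integer)
```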

From a usability perspective, an annoying-but-straightforward rule like the one in my comment above should also make it clear how to translate instructions for what to put in a submit file into what to put in your MapOptions (in this case, because the rule wasn't stated anywhere, @stsievert ended up putting the wrong thing in). It's annoying, but it means that you get what you put in, which is important because the user can't see the submit description itself (... which we should probably expose in the logs for debugging purposes).
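A sketch of what exposing the submit description could look like, assuming the options are available as a flat dict mapping submit keys to submit-file text. render_submit_description and the logger name htmap_sketch are hypothetical, not part of HTMap.

```python
import logging


def render_submit_description(options: dict) -> str:
    """Hypothetical helper: format submit options as submit-file text so
    users can see exactly what was sent to HTCondor."""
    return "\n".join(f"{key} = {value}" for key, value in options.items())


logging.basicConfig(level=logging.DEBUG)
description = render_submit_description({
    "request_gpus": "1",
    "+WantGPULab": "true",
    "+GPUJobLength": '"short"',
})
# Hypothetical logger name; emits the rendered description at DEBUG level.
logging.getLogger("htmap_sketch").debug("submit description:\n%s", description)
```

With the options originally reported in this issue, such a log line would have shown +GPUJobLength = short, making the missing quotes visible.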

stsievert commented 4 years ago

the submit description itself (... which we should probably expose in the logs for debugging purposes).

:+1:

an annoying-but-straightforward rule like the one in my comment above should also make it clear how to translate instructions for what to put in a submit file into what to put in your MapOptions

I also think the solution with clear documentation is best.

JoshKarpel commented 4 years ago

Resolved by #196