riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Unicode error with Condor and @log decorator #65

Closed. ast0815 closed this issue 5 years ago.

ast0815 commented 5 years ago

Hello,

I am again having problems with the @log decorator and unicode, or at least that is what the error message tells me. The weird thing is that this time the error only appears when the task runs as a Condor job. Maybe stdout and stderr are weird in the Condor environment?

This is enough to cause the error:

import law
from law.decorator import log
from six import print_

class Bug(law.Task):
    default_log_file = '/path/to/log/file'
    @log
    def run(self):
        print_("ABC")
        print_("XYZ")
    def output(self):
        return []
    def requires(self):
        return []

When I run this task in a regular shell environment, it works flawlessly. If I run it in a Condor job (i.e. the Condor job runs a shell script that executes law run Bug), I get an exception:

Traceback (most recent call last):
  File "/opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/ENV27/lib/python2.7/site-packages/luigi/worker.py", line 199, in run
    new_deps = self._run_get_new_deps()
  File "/opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/ENV27/lib/python2.7/site-packages/luigi/worker.py", line 139, in _run_get_new_deps
    task_gen = self.task.run()
  File "/opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/ENV27/lib/python2.7/site-packages/law/decorator.py", line 92, in wrapper
    return decorator(fn, _opts, *args, **kwargs)
  File "/opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/ENV27/lib/python2.7/site-packages/law/decorator.py", line 126, in log
    traceback.print_exc(file=tee)
  File "/usr/lib64/python2.7/traceback.py", line 232, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/lib64/python2.7/traceback.py", line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File "/usr/lib64/python2.7/traceback.py", line 13, in _print
    file.write(str+terminator)
  File "/opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/ENV27/lib/python2.7/site-packages/law/util.py", line 836, in write
    self._write(*args, **kwargs)
  File "/opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/ENV27/lib/python2.7/site-packages/law/util.py", line 899, in _write
    consumer.write(*args, **kwargs)
TypeError: must be unicode, not str

riga commented 5 years ago

Thanks for reporting that error!

> Maybe stdout and stderr are weird in the Condor environment?

Sounds like it... I will set up your example later today and run it on two different Condor clusters to reproduce and fix the bug.

ast0815 commented 5 years ago

Thank you! As an additional pointer, commenting out lines 121 and 122 in law/decorator.py seems to fix the exception, but of course there is no logging then:

            #sys.stdout = tee
            #sys.stderr = tee
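For context, the two lines above replace the interpreter's global output streams with a "tee" object that copies everything written to it into several consumers (the original stream and the log file). The following is a minimal Python 3 sketch of that general pattern, not law's actual implementation in law/util.py; the names `Tee` and `redirect_to_tee` are hypothetical. Decoding bytes before forwarding is shown as one way to keep text-mode consumers from rejecting the input, assuming UTF-8.

```python
import contextlib
import io
import sys


class Tee(object):
    """Hypothetical file-like object that forwards every write to several consumers."""

    def __init__(self, *consumers, encoding="utf-8"):
        self.consumers = consumers
        self.encoding = encoding

    def write(self, data):
        # text-mode consumers (io.StringIO, io.TextIOWrapper) reject bytes,
        # so decode them first (assumes UTF-8 encoded input)
        if isinstance(data, bytes):
            data = data.decode(self.encoding, errors="replace")
        for consumer in self.consumers:
            consumer.write(data)

    def flush(self):
        for consumer in self.consumers:
            consumer.flush()


@contextlib.contextmanager
def redirect_to_tee(log_stream):
    """Temporarily route sys.stdout and sys.stderr through Tee objects."""
    old_stdout, old_stderr = sys.stdout, sys.stderr
    sys.stdout = Tee(old_stdout, log_stream)
    sys.stderr = Tee(old_stderr, log_stream)
    try:
        yield
    finally:
        # always restore the original streams, even if the wrapped code raises
        sys.stdout, sys.stderr = old_stdout, old_stderr


if __name__ == "__main__":
    log = io.StringIO()
    with redirect_to_tee(log):
        print("ABC")
        print("XYZ")
    # the in-memory log received a copy of everything printed
    assert log.getvalue() == "ABC\nXYZ\n"
```

Restoring the streams in a `finally` clause matters here: if the wrapped code raises, the interpreter would otherwise keep writing through the tee after the task has finished.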

riga commented 5 years ago

Hi @ast0815,

I am struggling to reproduce the error you observe. Perhaps there is a problem with the configuration of your cluster's Condor setup?

Do you forward all environment variables to your remote jobs, à la https://github.com/riga/law/blob/master/examples/htcondor_at_cern/analysis/framework.py#L72 ?

I created a minimal example and ran it successfully on 3 different clusters (Python 2.7, 3.5, 3.6). You can grab it here and run it via

source setup.sh
law db
law run CondorBug --local-scheduler --poll-interval 0.2 --retries 0 --transfer-logs

Does that work for you?

ast0815 commented 5 years ago

As I said, I am not currently using the CondorWorkflow functionality. I am simply submitting Condor jobs myself (using a condor.job file and condor_submit) that run law run Bug. Could you give that a try? I will also try to implement a proper CondorWorkflow on my end.

riga commented 5 years ago

Works for me as well.

OK, could you post your condor.job file? That would make debugging much easier.

ast0815 commented 5 years ago

Sure. Here is my condor.job:

executable = condor_script.sh
universe = vanilla
requirements = OpSysAndVer == "CentOS7"
error = condor.err
output = condor.out
#error = /dev/null
#output = /dev/null
log = condor.log

transfer_input_files = condor_script.sh,law.cfg
transfer_output_files =

queue 1

And here is my condor_script.sh:

#!/bin/bash
. /opt/ppd/scratch/kdf77245/t2k_software/gasInteractions/gas_interaction_analysis/law/cluster_setup.sh
law db
law run Bug

ast0815 commented 5 years ago

Also, is there maybe some more documentation about the CondorWorkflow somewhere? Aside from just the examples, I mean. I cannot get it to work at all right now, even without any logging.

riga commented 5 years ago

My job file looks identical, except for getenv = true. Is there a different Python 2.7.x version on the worker node?
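One way to check for a second interpreter, or for differences in how the job's output streams are set up, is to run a small script both interactively and from condor_script.sh and compare the two reports. This is a sketch; the selection of environment variables below (locale and Python I/O settings) is an assumption about what is likely relevant, not a confirmed cause of this issue.

```python
import locale
import os
import sys

# Environment variables that commonly influence interpreter selection and
# text encoding in Python (assumed relevant; extend as needed)
ENCODING_VARS = ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING", "PYTHONPATH")


def interpreter_report():
    """Collect interpreter and stream-encoding details as a dict of strings."""
    isatty = getattr(sys.stdout, "isatty", lambda: False)
    report = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "stdout_encoding": str(getattr(sys.stdout, "encoding", None)),
        "stderr_encoding": str(getattr(sys.stderr, "encoding", None)),
        "stdout_isatty": str(isatty()),
        "preferred_encoding": locale.getpreferredencoding(False),
    }
    for name in ENCODING_VARS:
        report["env:" + name] = os.environ.get(name, "<unset>")
    return report


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("%s = %s" % (key, value))
```

Saving this as, say, a hypothetical check_env.py and calling it right before law run Bug in both environments would show whether the worker node picks up a different `executable`, `version`, or stream encoding.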

> Also, is there maybe some more documentation about the CondorWorkflow somewhere?

Right now, only for generic workflows: https://law.readthedocs.io/en/latest/workflows.html

Documentation on the remote workflows will come soon, but the htcondor_at_vispa and htcondor_at_cern examples already explain the most important parts.

ast0815 commented 5 years ago

OK, adding getenv = true to my condor.job seems to fix the problem, but I still do not know what actually caused it. I had looked at the output of env in the Condor environment before, and it seemed fine. I guess I will try to track down the cause a bit more and otherwise just settle for this workaround.
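To narrow down which variable getenv = true actually supplies, one option is to dump env once in the submitting shell and once inside the job (for example by adding a line like env > job_env.txt to condor_script.sh; the file names are hypothetical) and then diff the two dumps. A minimal sketch of such a comparison:

```python
def parse_env_dump(text):
    """Parse the output of `env` (one KEY=VALUE per line) into a dict.

    This simple sketch does not handle values that contain newlines.
    """
    env = {}
    for line in text.splitlines():
        # split on the first "=" only, so values may themselves contain "="
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def diff_envs(local, remote):
    """Return (only_local, only_remote, changed) as sorted lists of variable names."""
    only_local = sorted(set(local) - set(remote))
    only_remote = sorted(set(remote) - set(local))
    changed = sorted(k for k in set(local) & set(remote) if local[k] != remote[k])
    return only_local, only_remote, changed
```

For example, `diff_envs(parse_env_dump(open("local_env.txt").read()), parse_env_dump(open("job_env.txt").read()))` would list the variables that exist only in the interactive shell (the candidates that getenv = true forwards), those that exist only in the job, and those whose values differ.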