radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694

Using int in task arguments leads to error #131

Closed: Weiming-Hu closed this issue 3 years ago.

Weiming-Hu commented 3 years ago

I have encountered the following problem. I'm creating a task like this:

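from radical.entk import Task

# task_name, fcst_file, obs_file and weights are defined earlier in the script (not shown).
t = Task()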
t.name = task_name
t.executable = '/glade/u/home/wuh20/github/AnalogsEnsemble/build/apps/anen_netcdf/anen_netcdf'

t.pre_exec = [
    "module load gnu cmake eccodes git python/3.7.5",
    "export LD_LIBRARY_PATH=/glade/u/apps/ch/os/usr/lib64/:$LD_LIBRARY_PATH",
]

t.cpu_reqs = {
    'processes': 1,
    'process_type': 'MPI',
    'threads_per_process': 36,
    'thread_type': 'OpenMP',
}

t.arguments = [
    '-c', '/glade/u/home/wuh20/github/pv-workflow/02_WeightOptimization/GenerateAnEn.cfg',
    '--out', '{}.nc'.format(task_name), '--analogs', 21,
    '--forecast-file', fcst_file, '--observation-file', obs_file,
    '--weights',
]

t.arguments.extend(weights)

Please note that weights is a list of integers, e.g. [1, 1, 1, 2].

When I try to run my workflow, I get the following error:

(venv_Predictability) wuh20@cheyenne2:~/github/pv-workflow/02_WeightOptimization> python 02_SearchWeights.py 
EnTK session: re.session.cheyenne2.ib0.cheyenne.ucar.edu.wuh20.018662.0009
Creating AppManagerSetting up RabbitMQ system                                 ok
                                                                              ok
Validating and assigning resource manager                                     ok
Creating weight combinations ...
8001 weight combinations have been created.
There are 2 pipelines to simulate.
Setting up RabbitMQ system                                                   n/a
new session: [re.session.cheyenne2.ib0.cheyenne.ucar.edu.wuh20.018662.0009]    \
database   : [mongodb://hpcw-psu:****@129.114.17.185:27017/hpcw-psu]          ok
create pilot manager                                                          ok
submit 1 pilot(s)
        pilot.0000   ncar.cheyenne_mpt       144 cores       0 gpus           ok
All components created
create unit managerUpdate: pipeline.0000 state: SCHEDULING
Update: pipeline.0000.Analogs state: SCHEDULING
Update: pipeline.0000.Analogs.NN_Weights_0_0_0_0_0_1_9 state: SCHEDULING
Update: pipeline.0000.Analogs.RB_Weights_0_0_0_0_0_1_9 state: SCHEDULING
Update: pipeline.0001 state: SCHEDULING
Update: pipeline.0001.Analogs state: SCHEDULING
Update: pipeline.0001.Analogs.RB_Weights_0_0_0_0_0_2_8 state: SCHEDULING
Update: pipeline.0001.Analogs.NN_Weights_0_0_0_0_0_2_8 state: SCHEDULING
Exception in thread enqueue-thread:
Traceback (most recent call last):
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/appman/wfprocessor.py", line 255, in _enqueue
    self._execute_workload(workload, scheduled_stages)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/appman/wfprocessor.py", line 198, in _execute_workload
    wl_json = json.dumps([task.to_dict() for task in workload])
  File "/glade/u/apps/ch/opt/python/3.7.9/gnu/9.1.0/lib/python3.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/glade/u/apps/ch/opt/python/3.7.9/gnu/9.1.0/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/glade/u/apps/ch/opt/python/3.7.9/gnu/9.1.0/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/glade/u/apps/ch/opt/python/3.7.9/gnu/9.1.0/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/glade/u/apps/ch/opt/python/3.7.9/gnu/9.1.0/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/glade/u/apps/ch/opt/python/3.7.9/gnu/9.1.0/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/appman/wfprocessor.py", line 270, in _enqueue
    raise EnTKError(ex) from ex
radical.entk.exceptions.EnTKError: Object of type int64 is not JSON serializable

/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pymongo/topology.py:162: UserWarning: MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: https://pymongo.readthedocs.io/en/stable/faq.html#is-pymongo-fork-safe
  "MongoClient opened before fork. Create MongoClient only "
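The error message names int64 rather than int, and that detail points at the deeper cause. Python's built-in int is JSON-serializable, so the literal 21 passed to --analogs gets through json.dumps; int64 is the class name of NumPy's 64-bit integer scalar, which the json module does not know how to encode. The issue does not show how weights is built, so the sketch below assumes it comes from a NumPy array; it reproduces the failure outside EnTK with the same json.dumps call that appears in the traceback.

```python
import json

import numpy as np

# Assumption: the weights are NumPy scalars, e.g. taken from an array of combinations.
weights = list(np.array([1, 1, 1, 2], dtype=np.int64))

# A built-in Python int is serialized without complaint.
print(json.dumps(["--analogs", 21]))  # ["--analogs", 21]

# A NumPy int64 is not: this is the TypeError from the traceback.
try:
    json.dumps(["--weights"] + weights)
except TypeError as err:
    print(err)  # Object of type int64 is not JSON serializable

# .item() turns each NumPy scalar back into a built-in int, which json accepts.
plain = [w.item() for w in weights]
print(json.dumps(["--weights"] + plain))  # ["--weights", 1, 1, 1, 2]
```

Note that .item() only gets the values past the JSON step. They still end up on a command line, so converting them to strings, as in the workaround, remains the safer choice.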

A workaround is to convert weights to strings, e.g. weights = [str(w) for w in weights].
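The same workaround can be applied to the whole argument list at once, so that literal values such as 21 are covered along with the weights. A minimal sketch; stringify_arguments is a hypothetical helper, not part of EnTK, and the argument values are placeholders shaped like the ones in the report.

```python
def stringify_arguments(arguments):
    """Return a new list in which every task argument is a str.

    str() handles built-in ints and floats as well as NumPy scalars,
    e.g. str(numpy.int64(2)) == "2".
    """
    return [arg if isinstance(arg, str) else str(arg) for arg in arguments]


# Usage with values shaped like the ones in the report (placeholders, not real paths).
arguments = ["--out", "task.nc", "--analogs", 21, "--weights"]
arguments.extend([1, 1, 1, 2])

arguments = stringify_arguments(arguments)
print(arguments)
# ['--out', 'task.nc', '--analogs', '21', '--weights', '1', '1', '1', '2']
```

In the reporter's script this would amount to assigning the converted list to t.arguments after extending it. One caveat of a blanket str(): booleans become 'True' or 'False' and floats use Python's default formatting (e.g. str(1e-7) == '1e-07'), which may not match what the executable's option parser expects, so values that need a specific format are better formatted explicitly.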

Maybe there is a deeper cause for this? Thanks!

Weiming-Hu commented 3 years ago

My stack info:

(venv_Predictability) wuh20@cheyenne2:~/github/pv-workflow/02_WeightOptimization> radical-stack 

  python               : /glade/u/home/wuh20/venv_Predictability/bin/python3
  pythonpath           : 
  version              : 3.7.9
  virtualenv           : /glade/u/home/wuh20/venv_Predictability

  radical.analytics    : 1.5.0
  radical.entk         : 1.5.8
  radical.gtod         : 1.5.0
  radical.pilot        : 1.5.12
  radical.saga         : 1.5.9
  radical.utils        : 1.5.9

andre-merzky commented 3 years ago

Hmm, we should catch this error earlier - but the observation itself is correct: the arguments are passed on a command line, and the only type a command line knows is strings. We might in the future relax this and convert to strings on our own, but at the moment we don't.
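Both points in this reply can be illustrated in plain Python. The standard library's subprocess module, which also builds a command line from a list, rejects a non-string argument outright. A check run before submission would surface the problem with a clearer message than the JSON error; validate_arguments below is a hypothetical helper sketching such an early check, not an existing EnTK function.

```python
import subprocess
import sys


def validate_arguments(arguments):
    """Raise TypeError naming every argument that is not a str (hypothetical check)."""
    bad = [(i, arg) for i, arg in enumerate(arguments) if not isinstance(arg, str)]
    if bad:
        details = ", ".join(f"#{i} {arg!r} ({type(arg).__name__})" for i, arg in bad)
        raise TypeError(f"task arguments must be strings; offending entries: {details}")


# A command line only carries strings, and subprocess enforces that before launching.
try:
    subprocess.run([sys.executable, "-c", "pass", 21], check=True)
except TypeError as err:
    print("subprocess:", err)

# An early check reports the offending entries before anything is serialized or launched.
try:
    validate_arguments(["--analogs", 21, "--weights", "1", 2])
except TypeError as err:
    print(err)
# task arguments must be strings; offending entries: #1 21 (int), #4 2 (int)
```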