radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694

Modifying shared_data leads to unexpected behaviors #119

Closed Weiming-Hu closed 4 years ago

Weiming-Hu commented 4 years ago

Hi team, here is another point of confusion. It may be a simple one, but I couldn't find much information on using the shared_data attribute of an app manager, although I'm aware of a quick tutorial here.

The following code walks through reproducing the issue in an interactive Python session.

(venv) geogadmins-Air:year_3 wuh20$ python
Python 3.7.5 (default, Nov  1 2019, 02:16:32) 
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from radical.entk import AppManager
>>> app = AppManager(hostname = "two.radical-project.org", port = "33239")
EnTK session: re.session.geogadmins-Air.wuh20.018376.0001
Creating AppManagerSetting up RabbitMQ system                                 ok
                                                                              ok
>>> print(app.shared_data) # This is expected because it should start being empty
[]
>>> app.shared_data = ["my_shared_configuration.cfg"] # Include a single file manually
>>> print(app.shared_data) # Strange! Why is it not included? But actually this works when I submit my job as it is.
[]
>>> app.shared_data.extend(["config1.cfg", "config2.cfg"]) # If I want to include more files in this way, these files are not actually included as shared data when I try to submit the job as it is.
>>> print(app.shared_data) # Although now it is printing but the first file is missing.
['config1.cfg', 'config2.cfg']
>>> app.shared_data = ["my_shared_configuration.cfg"] # And it seems I can't change it.
>>> print(app.shared_data)
['config1.cfg', 'config2.cfg']

I have two sets of files to include: a single shared file that is the same for all tasks, and a set of files that are specific to each task. I ended up doing this in my code.

While I don't think there is anything wrong with this, I just found this to be a little bit unexpected. I hope you can help me clear some confusion.
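One way to combine the two sets of files can be sketched as follows. It relies on the observation in the session above that a single assignment to shared_data is honored at submission time, while a later extend() is not. The helper name `build_shared_data` and the variable names are hypothetical and used only for illustration; this is not necessarily the code used in the original post.

```python
def build_shared_data(common_files, per_task_files):
    """Return one flat list: common files first, then per-task files, no duplicates."""
    combined = []
    for path in list(common_files) + list(per_task_files):
        if path not in combined:
            combined.append(path)
    return combined


# File names taken from the session above.
common = ["my_shared_configuration.cfg"]
per_task = ["config1.cfg", "config2.cfg"]

all_shared = build_shared_data(common, per_task)

# Assumption: with a real AppManager instance `app`, the complete list is
# assigned exactly once, instead of assigning and then extending:
# app.shared_data = all_shared
```

Building the full list first avoids relying on what the shared_data getter returns after an assignment.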

Much appreciated. Thank you.

andre-merzky commented 4 years ago

This behavior is an unfortunate side effect of how the RE API is implemented: the attribute setters are hooked into function calls that set internal state, and that internal state does not always (as in this case) represent the actual attribute types being set. We agree that this is confusing and has unexpected side effects.

Changing the API has significant inertia for us, so there is no quick solution forthcoming. We will (a) better document this behavior, and (b) take this into account when redesigning the API for later major release cycles.
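The kind of getter/setter mismatch described here can be reproduced with a small toy class. This is a minimal sketch under stated assumptions, not the actual radical.entk implementation: the class names `AppManagerSketch` and `ResourceManagerSketch` and the attributes `_shared_data` and `_rmgr` are hypothetical. It only assumes that the setter forwards the value to a backend object while the getter returns a separate internal list.

```python
class ResourceManagerSketch:
    """Hypothetical backend object that would be read at submission time."""

    def __init__(self):
        self.shared_data = []


class AppManagerSketch:
    """Hypothetical class for illustration; NOT the real AppManager."""

    def __init__(self):
        self._shared_data = []                # list returned by the getter
        self._rmgr = ResourceManagerSketch()  # object written by the setter

    @property
    def shared_data(self):
        return self._shared_data

    @shared_data.setter
    def shared_data(self, data):
        if not isinstance(data, list):
            data = [data]
        # Assumption: the setter only forwards to the backend and never
        # updates the list that the getter returns.
        self._rmgr.shared_data = data


app = AppManagerSketch()

app.shared_data = ["my_shared_configuration.cfg"]
after_assign = list(app.shared_data)        # getter's list is still empty

app.shared_data.extend(["config1.cfg", "config2.cfg"])
after_extend = list(app.shared_data)        # extend() mutated the getter's list

backend_view = list(app._rmgr.shared_data)  # backend only saw the assignment
```

In this toy model, assignment reaches the backend but is invisible through the getter, while extend() is visible through the getter but never reaches the backend, which matches the pattern reported in the session above.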

Thanks for letting us know about the confusion, much appreciated!

mturilli commented 4 years ago

As discussed with @lee212, we will open two tickets in the EnTK repo: one to improve the data staging documentation and one to start an RFC for an API iteration.

lee212 commented 4 years ago

https://github.com/radical-cybertools/radical.entk/issues/442

Weiming-Hu commented 4 years ago

Thank you very much. This is working in the devel branch of radical.entk.