simphony / simphony-metadata

[LEGACY] This repository contains the metadata definitions used in SimPhoNy project.
BSD 2-Clause "Simplified" License
0 stars 0 forks source link

Add egnines, visualisers, and pre/post processors to the metadata. #14

Closed mehdisadeghi closed 8 years ago

mehdisadeghi commented 8 years ago

This PR adds keywords to represent software tools which are used in various steps of simulation. Please feel free to comment.

kemattil commented 8 years ago

Adding particular engines or visualizers into the SimPhoNy metadata schema was exactly the thing I was hoping we would avoid. I simply think they do not belong there; they pollute the metadata schema ...

What happened to the idea data[CUBA.ENGINE] = "my_engine" or something similar?

mehdisadeghi commented 8 years ago

@kemattil This is exactly why I made this PR, to discuss this and finalize it. To finish some other parts this should be addressed. However, in any case, CUBA.ENGINE, CUBA.VISUALISER and probably CUBA.PRE/POST_POCESSOR are necessary keywords.

kemattil commented 8 years ago

I agree that the four keywords/data types you mentioned are probably necessary.

kitchoi commented 8 years ago

May I ask for an example/use case for how CUBA.ENGINE, CUBA.VISUALISER or CUBA.PRE/POST_PROCESSOR may be used? That would be helpful. Thanks.

mehdisadeghi commented 8 years ago

@kitchoi For example a user creates a CUDS and adds required engine metadata to it, e.g. Lammps with MD as an option. On the one hand, using this information, SimPhoNy can choose the correct wrapper for the simulation. On the other hand, when a user stores simulation results, it is necessary to know which engine version has been used to produce that result. This is of great importance, when there is an error in a paper and we want to go back (probably after a while) and check our results, i.e. which engine version we have used to produce that specific CUDS.

The other keywords are metadata about SimPhoNy wrappers. A wrapper, upon registration or being loaded by SimPhoNy, sends metadata about itself instead of some arbitrary keywords. I can't think of more use cases for them right now.

kitchoi commented 8 years ago

Thanks @mehdisadeghi Which of these metadata merely holds one value (a string for example)? Are they basic enough and should they belong to cuba.yml instead?

mehdisadeghi commented 8 years ago

You're welcome Kit! IMO, all these keys are data structures that would contain more than one single value, i.e. a set of features/requirements about the desired tool. I have only added CUBA.VERSION to cuba.yaml but even that one might end up having more information such as minor version, major version, etc.

kitchoi commented 8 years ago

I see that you want to attach a version to each of the modeling engine, pre-processing, post-processing tools, and I see that the goal is to make a simulation reproducible. Something like this? cuds[CUBA.ENGINE] = ("Lammps-md", "b5d0a2") cuds[CUBA.POST_PROCESSING] = ("mayavi", "d4e0f2") where the second field refers to a specific version, a commit for example.

mehdisadeghi commented 8 years ago

Exactly, we want to be able to reproduce the simulation. The code would be something similar:

c = CUDS(name="mycuds")
engine = Lammps(flavor=CUBA.MD, version="b5d0a2")
#or
engine = Engine(name="lammps", flavor=CUBA.MD, version="b5d0a2")
c.add(engine)

I think, if a user provides engine's version in the beginning, and SimPhoNy does not have that specific version, the simulation will fail. In any case, the version number will be added to the produced CUDS by SimPhoNy.

If we consider SimPhoNy as a platform which supports several different engines, even with different versions, we should turn the version number into a data structure which contains more detailed information.

P.S. I just realized I have to add MD, DEM, etc to the metadata.

kitchoi commented 8 years ago

Is the version referring to the version of the python wrapper (simlammps for example) or the version of the underlying model executable?

During my time as a scientist, what I often see is that scientists would modify the source code (C++/Fortran/anything) of a numerical model, compile it, and archive that executable and source for reproducibility. Then the run script would point to an executable so that there is absolute no way a different code would be used should the run script be run again in the future.

With that in mind, it is hard for me to understand how version should be passed to the python wrapper of Lammps if all the wrapper does is to look for the lammps executable under the user's PATH.

mehdisadeghi commented 8 years ago

scientists would modify the source code

So far, SimPhoNy comes with a pre-installed set of wrappers which support specific executable, thus, it disallows running arbitrary executable. If we decide to support user-modifed engines, we have to change things.

And we would need to store information about the environment that a CUDS has been produced in, including SimPhoNy version, wrapper versions and any executable version, probably in something called CUBA.ENVIRONMENT

mehdisadeghi commented 8 years ago

@kemattil As long as we can find a way to designate different engines explicitly, I am happy with leaving out specific engine CUBA entries in meta-data. But do all engines follow a similar metadata structure? What if some of them have more features which can not be described with one generic engine metadata? Can we group them into some categories? (pre/post processors, visualizer, engine)

More importantly, if we leave engine identification to wrapper developers, how we can make sure that a given CUDS, which has lammps as its engine name, will always be run with the same executable again?

kemattil commented 8 years ago

The SimPhoNy framework has two main kinds of clients: application users and wrapper developers (which of course can be the same persons, but not necessarily).

That is, the currently implemented wrappers/engines are just the first cases/examples: if the framework is successful, there will be many more (think of tens of or even hundreds of wrappers ...).

So far, SimPhoNy comes with a pre-installed set of wrappers which support specific executable, thus, it disallows running arbitrary executable.

No, we are not suppose to disallow running arbitrary executables.

Any volunteer should be able to develop and run his/her own wrapper for an arbitary executable - this is one of fundamental ideas of SimPhoNy. And the developer is responsible for implementing the wrapper so that it complies with the API we have designed for modelling engines and visualizers (and I guess for pre- and post-processing tools at some point).

From a scientific computing point of view, the version number of an executable (or a library for internal wrappers) is not so essential: more important is to carefully specify the numerical method and all related details. Currently this information is available from the wrapper/modelling engine documentation and the numerical method/modelling engine parameters stored in a CUDS object. Then, mathematically speaking, the data can be reproduced with a totally distinct software (independent implementation).

In particular, I don't think we should require an average user to specify a version number for an executable/library/wrapper: it is too much burden for the user.

I prefer that there is only one version of a wrapper available, and that the wrapper documents which version of an executable/library it relies on. This is also the way it has been done so far (the setup process for wrappers already includes such information).

mehdisadeghi commented 8 years ago

we are not suppose to disallow running arbitrary executables

I totally agree that SimPhoNy is supposed to be extended to cover any executable engine, that is the main idea behind it. However, my point was about existing engines. For example, if a user comes and modifies JYULB engine, I expect the user to fork simphony-jyulb repo, and create a new wrapper for that fine-tuned new JYULB engine, with a new name. If one changes lets say simphony-lammps-md wrapper and executable and still calls it lammps it would be confusing. SimPhoNy should be aware that a CUDS is created with a modified engine.

the numerical method and all related details

Could you please suggest more metadata-keys for covering engine details? Then we add them to the metadata model. So far we have only added models.

I don't think we should require an average user to specify a version number for an executable/library/wrapper

We agree on that. However, I think SimPhoNy should store information in CUDS about the wrappers as part of its Environment.

kemattil commented 8 years ago

I think the user should be responsible for instantiating a wrapper: this (and unique names for the wrappers) should be enough for avoiding (or not creating more) confusions.

I also think we should now proceed with minimal complications so that we will have the new design implemented as soon as possible: I would proceed without extra metadata keys for engine details (as I tried to explain, enough details are already implicitly available).

kitchoi commented 8 years ago

I also think that version may not be a useful attribute to be passed to a python wrapper for a simulation tool.

However there is an immediate concern as for what to do with parameters required in the instantiation of a wrapper. The only instance I know is lammps(use_internal_interface=True). Do we allow this flexibility in passing extra parameters to the initialisation of wrappers? Let's say we do, we can consider each lammps(parameter_a=1, parameter_b=2, parameter_c=3) as its own unique kind of wrapper such that interoperability is still maintained (interoperable among these fine-tuned wrappers). For example:

cuds0 = CUDS()
cuds0["ENGINE"] = ("lammps", {"parameter_a": 1, "parameter_b": 2, "parameter_c": 3})

The above should be contrast with

cuds0["ENGINE"] = "lammps"
Simulation(cuds0, parameters_a=1, parameter_b=2, parameter_c=3)

which breaks interoperability (one can't just pass the cuds around without modifying Simulation).

I don't think we should pollute the metadata schema with these "parameter_a", "parameter_b",... they are likely to be model specific and there could be a billions of them when the number of wrappers grow.

kemattil commented 8 years ago

Is the Simulation class something which has already been decided or is it considered as a possible implementation solution (it has not been discussed in SSB meetings)?

In the M24 meeting we discussed a following solution: first

my_wrapper = Wrapper(my_wparam1, my_wparam2, ...)

and

my_cuds = CUDS(cparam1,cparam2, ...)

Then

my_wrapper.add_cuds(my_cuds)

and finally

my_wrapper.run()

At the end of the run, the wrapper itself can fill in the engine related data before returning, e.g.

my_cuds[CUBA.ENGINE] = {"name": "my_engine_name", "version": 123}

Furthermore, this works also with the Simulation class if the user instantiates the wrappers:

my_simulation = Simulation(my_cuds, my_wrapper)

mehdisadeghi commented 8 years ago

@kitchoi

which breaks interoperability (one can't just pass the cuds around without modifying Simulation).

In order to maintain interoperability, any engine option which is necessary should be part of metadata and stored in CUDS. However, there might be some runtime specific flags which might not affect the final results, but only the performance.

@kemattil

Is the Simulation class something which has already been decided or is it considered as a possible implementation solution

Regardless of the name, the Simulation class is the implementation of a multi-CUDS computational model from M24. It is responsible to run the desired wrapper and store many CUDS in memory or h5cuds.

mehdisadeghi commented 8 years ago

@kemattil

I think the user should be responsible for instantiating a wrapper

If the CUDS is supposed to contain all the information (metadata) about a given computational model, it should contain information about the necessary engines as well. Otherwise, it would be always missing some key information to reproduce the simulation.

Moreover, having wrappers created by SimPhoNy, lets us to make a more dynamic framework. Based on the metadata inside CUDS and metadata provided by each wrapper, SimPhoNy can pick the right engine for the simulation. It can be inquired about available engines, and it would be possible to have a multiple machine setup for SimPhoNy client/server..

kemattil commented 8 years ago

The CUDS only needs to know what engine/wrapper was used to produce the data and with what computational parameters (cuba keys and values). The question is only how and when this information is given. For example,

  1. the user will configure the wrapper with appropriate parameters and then the wrapper can automatically fill these to the cuds object after run (and before returning), or
  2. the user can preset the parameters into my_cuds and during my_wrapper.add_cuds(my_cuds) the wrapper will read the necessary parameters from the cuds object.

SimPhoNy can pick the right engine for the simulation

Only the user knows "the right" engine (in my view the user is the expert), SimPhoNy cannot make such a decision.

kitchoi commented 8 years ago

In order to maintain interoperability, any engine option which is necessary should be part of metadata and stored in CUDS. However, there might be some runtime specific flags which might not affect the final results, but only the performance.

I agree with "being stored in CUDS". But I don't see why it must be part of metadata?

I should have made the first example more clear:

cuds0 = CUDS()
cuds0["ENGINE"] = ("lammps", {"parameter_a": 1, "parameter_b": 2, "parameter_c": 3})
Simulation(cuds0)

which does not break interoperability.

kemattil commented 8 years ago

the Simulation class is the implementation of a multi-CUDS computational model from M24. It is responsible to run the desired wrapper and store many CUDS in memory or h5cuds.

I don't recall such a discussion at M24: perhaps I have simply forgotten this, but definitely no discussions/decisions in SSB.

What is multi-CUDS?

If I remember correctly, for now we have decided to have a single-model CUDS (i.e. only one PE+MR per cuds). In other words, in the first stage multi-model simulations (coupling and linking between the models) would be handled using several distinct cuds objects.

mehdisadeghi commented 8 years ago

@kemattil

the user can preset the parameters into my_cuds and during my_wrapper.add_cuds(my_cuds)

This is almost what have been done here. User adds wrapper parameters along with the wrapper name itself to the metadata and passes it to the Simulation object for further wrapper initialization. This way we can hide some internal parts from users and provide a simpler interface to launch any wrapper known to SimPhoNy.

Only the user knows "the right" engine (in my view the user is the expert), SimPhoNy cannot make such a decision.

Of course! However, SimPhoNy can make that decision based on the wrapper metadata that the user adds to the provided CUDS. Hence the subject of this PR.

What is multi-CUDS?

I spontaneously called it multi-CUDS! In M24, page 32 of my copy, the CUDS and Simulation Wrapper entities are described. As shown on that slide, Simulation Wrapper contains multiple CUDS. In this PR, we have put that inside Simulation. It aces like a Generic wrapper, reads the wrapper metadata from CUDS and SimPhoNy, instantiates the defined wrapper, stores a number of results (e.g. last 10 steps or one CUDS out of every 10 stpes - under development), etc.

Having the Simulation class, users are only responsible to fill CUDS according to the metadata, and run the simulation. It also leaves us the freedom to change the internals of SimPhoNy without affecting users workflows.

mehdisadeghi commented 8 years ago

@kitchoi

why it must be part of metadata?

To be interoperable.

In your example, if we provide result-affecting parameters independant of CUDS, our CUDS object will be missing some result-changing information. Thus, if we store this CUDS and restore it later, we would not be able to repeat the same simulation again.

kitchoi commented 8 years ago

In your example, if we provide result-affecting parameters independant of CUDS, our CUDS object will be missing some result-changing information. Thus, if we store this CUDS and restore it later, we would not be able to repeat the same simulation again.

In my example, the parameters are contained in CUDS (cuds0["ENGINE"] = ("lammps", {"parameter_a": 1, "parameter_b": 2, "parameter_c": 3})), but "parameter_a" (etc) are not defined in the metadata; that follows the assumption that the __init__ of an engine wrapper can accept an arbitrary number of parameters.

kitchoi commented 8 years ago

And I did mean that I think the following is wrong when I said "contrast"

# wrong wrong wrong
cuds0["ENGINE"] = "lammps"
Simulation(cuds0, parameters_a=1, parameter_b=2, parameter_c=3)
kemattil commented 8 years ago

This is almost what have been done here.

Almost, but not exactly - hence the discussion.

I have been trying to propose ways to relieve the user from specifying engine/wrapper related metadata like name (unique string) and version. One way is to make users instantiate the wrappers (then user only needs to know the class name of the appropriate wrapper).

However, if the name strings are easily available for the user, using them is also ok. But we should not bother the users with version numbers (the framework can fill them in).

The multi-CUDS topic is still confusing: @mehdisadeghi are you perhaps mixing multiple state data with multiple CUDS?

The examples by @kitchoi seem ok to me, and could be a way forward.

mehdisadeghi commented 8 years ago

@kemattil

One way is to make users instantiate the wrappers

This is the current status and is enough for scripting. However it leaves out some important points:

  1. The CUDS are incomplete. They miss enough information to reconstruct/continue an earlier Simulation.
  2. It is unclear how the information inside these CUDS are obtained. In a single user's case, where user can remember parameters, this sounds OK. It does not scale to larger use-cases, however.
  3. A SimPhoNy server platform must be able to run Simulation without user intervention. The current state prevents this (no way to represent engines).

are you perhaps mixing multiple state data with multiple CUDS

Having multiple CUDS in the Simulation (the generic wrapper), is the implementation of having multiple state data. One CUDS per simulation step. The Simulation object will keep track of these CUDS, based on the wrapper. There is also a related issue here.

@kitchoi Oh, you're right! I had misread your comment. Even though adding parameters to the ENGINE, the way you described, is possible, it needs some clarification. What are those parameters? Having the parameters without metadata information is inadequate. SimPhoNy needs to understand and interpret them. CUDS is all about metadata after all.

kemattil commented 8 years ago
  1. What information is missing? Do not follow ...
  2. Do not understand ...
  3. Server platform is a valid point: therefore the Simulation object (and the instantiation of wrappers based on names) makes sense.

One CUDS per one state data (which can include multiple datasets, e.g. Lattice and Mesh datasets) is ok. And a Simulation object managing them is a good idea.

However, running pre- and post-processing wrappers as well as visualization wrappers via Simulation object sounds somewhat peculiar ...

The remaining issue is the initialization of wrappers: @kitchoi proposes to use dictionaries for passing the init.parameters.

If the keys are, e.g., strings, they already provide a kind of documentation/metadata.

If we do not have proper use cases with relevant init.parameters, I think the dictionary is fine at this point (I consider file-io and internal wrappers for a common engine as distinct wrappers; this choice does not need to be provided at initialization). To me it resembles, e.g., the C++ approach for the main function (argc + argv).

Metadata schema describing general init.parameters seems too much (especially since we do not appear to know what should be described) ...

kitchoi commented 8 years ago

What are those parameters?

These parameters are arguments to the __init__ of an engine wrapper (assuming that we allow this). So the questions are, (1) what should go to the __init__ of an engine wrapper? (2) should these configuration parameters for initialising engine wrappers belong to metadata or not? (I think we have all agreed that the configuration should be in the CUDS regardless of whether the parameter is in the metadata)

IMHO, my answer to the first question: these should be parameters that do not hurt interoperability. For example, one should be able to consider Lammps(parameter_a=1, parameter_b=2) and Lammps(parameter_a=2, parameter_b=10) are simply two different engine wrappers, much like comparing “Kratos_CFD” and “Openfoam”. Examples of such parameters are use_internal_interface and verbose (I just imagined it). These are also parameters that are too trivial or too varied to create a separate class for.

For the second question, my inclination is not to put them in the metadata. Imagine the following implementation:

Say we are to put all engine configuration in the metadata

cuds0[CUBA.ENGINE] = "lammps"   # or "kratos"
cuds0[CUBA.USE_INTERNAL_INTERFACE] = True   # say only Lammps uses it
cuds0[CUBA.VERBOSE] = True           # say only Kratos uses it
Simulation(cuds0)

And this is the context as far as I understand (I can't think without code 😄 )

class Lammps(ABCModelingEngine):
      def __init__(self, use_internal_interface):

class Kratos(ABCModelingEngine):
      def __init__(self, verbose):

class Simulation(object):
     def __init__(self, cuds):
         # you have to query the metadata to know what keys are required
         parameters = Lammps.supported_parameters()   # (CUBA.USE_INTERNAL_INTERFACE,)

        # probably have to convert all CUBA key name to lower case in order
        # to construct the keyword arguments
        # for lammps, it is going to be {"use_internal_interface": cuds[CUBA.USE_INTERNAL_INTERFACE]}
         kwargs = ...  
         self.wrapper = Lammps(**kwargs)

A user, a beginner, being given a sample run script would need to look up what CUBA.USER_INTERNAL_INTERFACE and CUBA.VERBOSE are for. And probably have to look up the metadata to realise changing CUBA.VERBOSE would have no effect on a Lammps engine wrapper. But that’s okay. My main concern is about polluting the metadata and cuba. Not only that one has to define the parameters in the metadata, the parameter needs to be defined in the cuba.yml as well (e.g. VERSION in this PR). You end up with a lot of parameters only used by one or two tools. Also, these parameters maybe generic, versatile and could be used in multiple places. Say a user wants to switch verbose=True for Kratos (engine) but verbose=False for aviz (post-processing), but there is only one CUBA.VERBOSE key?

I tried to understand and imagine the usage here. Please correct me if I have misunderstood anything.

ahashibon commented 8 years ago

However, if the name strings are easily available for the user, using them is also ok. But we should not bother the users with version numbers (the framework can fill them in).

@kemattil I Agree. it should be enough to specify the wrapper, whether it is using File-IO or Internal, and SimPhoNy should take care of storing the version string to the CUDS. In this way, we both get traceability and user friendliness.

Whether the wrapper is initialised as a class that takes a CUDS or we have a generic wrapper/simulation that takes a CUDS that has a wrapper keyword inside seems, largely, a matter of taste and style?

myCUDS.add (CUBA.WRAPPER) = CUBA.LAMMPS_MD mysim = simulation(cuds=myCUDS)

vs.

mysim = LAMMPS_MD_WRAPPER (myCUDS)

what are the pros and cons of each?

ahashibon commented 8 years ago

These parameters are arguments to the init of an engine wrapper (assuming that we allow this). So the questions are, ..

@kitchoi lets keep the design simple, predictable, and the interface universal, which will lead to more intop.

There should be the same number of parameters that could be passed to the wrapper by the user, and these should be restricted to name of the wrapper (either by a class call or a CUDS keyword), and whether its File-IO or internal (or whether we use a network mode, etc). I do not see an advantage for having specific parameters passed, these should be part of the CUDS (e.g., as solver parameter, or physics equations, etc)

mehdisadeghi commented 8 years ago

@kemattil

One CUDS per one state data (which can include multiple datasets, e.g. Lattice and Mesh datasets)

We agree on that and this is exactly the plan.

running pre- and post-processing wrappers as well as visualization wrappers via Simulation object sounds somewhat peculiar

We also agree on this! The chosen visualizer is not part of CUDS data. I'm not sure about other pre/post processing tools, however. Comments are welcome. (I'll probably remove other keys from metadata for now).

I consider file-io and internal wrappers for a common engine as distinct wrappers

If they are two different wrappers, then they should have two different names. If we store this name inside CUDS, we restrict that CUDS to fileio or internal. However, for a CUDS obtained using fileio, it should be possible to run it using internal and vice versa. Therefore, I see fileio/internal as an option to the wrapper. I think that it is not necessary to store this option inside CUDS.

Metadata schema describing general init.parameters seems too much

I agree. I think any option which changes the output should be part of metadata, otherwise not.

mehdisadeghi commented 8 years ago

@kitchoi

I think we have all agreed that the configuration should be in the CUDS regardless of whether the parameter is in the metadata

See you my previous reply. IMO, configuration keys that have no effect on the result of a Simulation are not part of CUDS.

one should be able to consider Lammps(parameter_a=1, parameter_b=2) and Lammps(parameter_a=2, parameter_b=10) are simply two different engine wrappers

Then they should have two different names. However, if both are lammps wrappers, and use lammps engine, chances are that they should be merged into one.

cuds0[CUBA.ENGINE] = "lammps" # or "kratos" cuds0[CUBA.USE_INTERNAL_INTERFACE] = True # say only Lammps uses it cuds0[CUBA.VERBOSE] = True # say only Kratos uses it Simulation(cuds0)

Engine is a metadata structure, thus:

engine = Engine(name="lammps", interface=CUBA.INTERNAL, CUBA.VERBOSE)
c = CUDS()
c.add(engine)
sim = Simulation(c)
sim.run()

or

engine = Engine(name="lammps")
c = CUDS()
c.add(engine)
sim = Simulation(c, interface=CUBA.INTERNAL, verbose=True)
sim.run()    

or

engine = Engine(name="lammps")
c = CUDS()
c.add(engine)
sim = Simulation(c, interface=CUBA.INTERNAL)
sim.run(verbose=True)

Engine as a CUDS component has a data property which can be used to put extra CUBA values inside. It will not accept non-CUBA keys, however.

Since the Simulation class will behave like a generic wrapper, therefore it is necessary to use keyword arguments in init methods. This way, all the wrapper will play nicely with arguments passed by the Simulation class. Relying on the order is fragile in this case. Thus:

class Lammps(ABCModelingEngine):
       def __init__(self, *args, **kwargs):
           if kwargs.get('use_internal_interface):
               # use it

class Kratos(ABCModelingEngine):
       def __init__(self, *args, **kwargs):
           if kwargs.get('verbose'):
               # be verbose!

This is a very common practice for writing wrappers in python.

You end up with a lot of parameters only used by one or two tools

I agree with that. I am looking forward to finding a way to improve this.

kitchoi commented 8 years ago

See you my previous reply. IMO, configuration keys that have no effect on the result of a Simulation are not part of CUDS.

Yep, that makes sense.

Then they should have two different names. However, if both are lammps wrappers, and use lammps engine, chances are that they should be merged into one.

That's fine too.

Engine as a CUDS component has a data property which can be used to put extra CUBA values inside.

Thanks for the sample codes. I did not think of the Engine as a class that does anything than being subclassed. Having the Engine as meta wrapper looks good to me.

It will not accept non-CUBA keys, however.

I don't see why not, but I also don't have a strong opinion for the opposite either.

Minor details: (1) Why should the engine be added to CUDS? I thought CUDS and engine are separate and are handled by Simulation. i.e. I am more thinking of

engine = Engine(name="lammps", interface=...)
c = CUDS()
sim = Simulation(c, engine)  # <--- here
sim.run(verbose=True)   # if verbose is accepted by the engine.run

(2) Looks like Engine should not be generated (since it is in simphony_metadata.yml) but should be written separately. Does this make sense?

kemattil commented 8 years ago

@ahashibon

Whether the wrapper is initialised as a class that takes a CUDS or we have a generic wrapper/simulation that takes a CUDS that has a wrapper keyword inside seems, largely, a matter of taste and style?

I agree. But as I already expressed, I do not like introducing keywords like CUBA.LAMMPS_MD. I prefer, e.g., myCUDS.data[CUBA.ENGINE_NAME] = "lammps_md".

@mehdisadeghi

I agree. I think any option which changes the output should be part of metadata, otherwise not.

This seems like a good policy. And I can live with the file-io/internal as an argument/option.

@kitchoi

(1) Why should the engine be added to CUDS? I thought CUDS and engine are separate and are handled by Simulation. i.e. I am more thinking of

I'm thinking in the same way. It seems to imply that CUDS is storing engine(s) too (not only the simulation data and parameters) ...

On the other hand, at some point somebody must supply CUDS with the information about the wrapper/engine used to produce the data stored in the CUDS (the reproducible data issue/problem). Perhaps the Simulation object can do it?

Or maybe something only slightly different (just a cosmetic change):

my_cuds.data[CUBA.ENGINE] = my_engine

mehdisadeghi commented 8 years ago

I think we have reached to a consensus on the discussed issues and we are good to go. Of course this is not perfect and could be improved. As a quick recap:

User provides the engine name to the Simulation, SimPhoNy will add further information to the CUDS.

c = CUDS()
sim = Simulation(c, engine_name="LAMMPS", interface="internal/external")  # <--- here
sim.run()

The only unclear point at the moment is the interface keywarod arguement. However I think we all agree that we can improve this at a later time, by adding it to the CUBA/CUDS or simply leaving it as is.

If no further comments appear, I will merge this PR soon.

ahashibon commented 8 years ago

you have a green light from my end!

kitchoi commented 8 years ago

Thanks @mehdisadeghi for your patience!

mehdisadeghi commented 8 years ago

@kitchoi You're welcome!

mehdisadeghi commented 8 years ago

Since no other comments show up I merge this PR.

Please feel free to open a new issue to address any improvements.