simphony / simphony-common

The native implementation of the Simphony cuds objects
BSD 2-Clause "Simplified" License

Suggestion: Make hdf5 package under io #91

Closed mehdisadeghi closed 8 years ago

mehdisadeghi commented 9 years ago

I suggest creating a new package called h5 or pytables under simphony.io and then moving all pytables-related files from io into the new package. I also suggest renaming the modules inside that folder as follows:

data_container_description
data_container_table
indexed_data_container_table

All to be moved to a containers or similarly named module (one module) or a package.

file_mesh -> h5_mesh

Everything could be h5 prefixed.

Moreover, we could adopt a convention in favor of shorter names and fewer modules. And last but not least, I can imagine an io layer that gives us the flexibility to change the persistence implementation with no code changes.

Please let me know your ideas.

itziakos commented 9 years ago

Everything could be h5 prefixed.

Yes, we are currently moving towards this naming.

itziakos commented 9 years ago

And last but not least, I can imagine an io layer that gives us the flexibility to change the persistence implementation with no code changes.

I am not against a common layer, but I would propose not spending time on such a task until we have more than one example of a serialization backend.

However, there is already some code duplication in the io subpackage that could be unified.

To summarize, I suggest that we wait a little to allow the API to settle down. There are a number of changes in the pipeline that will affect the API in ways that might render an aggressive clean-up unnecessary.

itziakos commented 9 years ago

I suggest creating a new package called h5 or pytables under simphony.io and then moving all pytables-related files from io into the new package.

I am not sure it is necessary at this moment, but I can see the need for it once alternative serialization backends have been implemented.

I would suggest adding an api.py file under the io subpackage where the publicly available classes and functions are collected and made available. This way, importing the main classes becomes independent of the sub-package structure.

from simphony.io.api import H5Particles 
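As a sketch, such an api.py could simply re-export the public classes; the submodule names here follow the renames proposed in this thread and are assumptions, not the actual layout:

```python
# simphony/io/api.py -- hypothetical contents
# Callers import from simphony.io.api and never depend on the
# internal sub-package structure, so modules can be moved freely.
from .h5_particles import H5Particles
from .h5_mesh import H5Mesh

__all__ = ["H5Particles", "H5Mesh"]
```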
itziakos commented 9 years ago

I also suggest renaming the modules inside that folder as follows:

data_container_description
data_container_table
indexed_data_container_table

I personally would like to revisit the names of these classes, but I do not think there is a need to move them into a containers module or subpackage.

e.g:

   data_container_description -> h5_records
   indexed_data_container_table -> h5_uid_data
   data_container_table -> h5_data

mehdisadeghi commented 9 years ago

from simphony.io.api import H5Particles

I prefer to hide the underlying persistence technology from the other packages altogether. Using the term api here could be confusing as well, because it is not supposed to be exposed as the application's interface.

Instead, I can imagine an abstract store, such as an ABCStore, with a technology-agnostic API:

from abc import ABCMeta, abstractmethod

class ABCStore(metaclass=ABCMeta):
    @abstractmethod
    def add_particle_container(self, container): pass
    @abstractmethod
    def add_mesh(self, mesh): pass
    @abstractmethod
    def add_lattice(self, lattice): pass
    # ... and so on
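To make the idea concrete, here is a small runnable sketch of the pattern; the class and method names follow the comment above, and the in-memory backend is purely hypothetical:

```python
from abc import ABCMeta, abstractmethod

class ABCStore(metaclass=ABCMeta):
    """Technology-agnostic persistence interface (hypothetical sketch)."""

    @abstractmethod
    def add_mesh(self, name, mesh):
        """Persist a mesh under the given name."""

    @abstractmethod
    def get_mesh(self, name):
        """Return a previously stored mesh."""

class InMemoryStore(ABCStore):
    """Trivial backend used only to demonstrate the interface;
    a real H5Store would write to an HDF5 file instead."""

    def __init__(self):
        self._meshes = {}

    def add_mesh(self, name, mesh):
        self._meshes[name] = mesh

    def get_mesh(self, name):
        return self._meshes[name]

# Calling code only sees the ABCStore interface, so the backend
# can be swapped without touching it.
def point_count(store, name):
    return len(store.get_mesh(name)["points"])

store = InMemoryStore()
store.add_mesh("demo", {"points": [(0, 0), (1, 0), (0, 1)]})
print(point_count(store, "demo"))  # -> 3
```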

The package structure could then look like this:

simphony/
    io/
        __init__.py
        abc_store.py  (defines ABCStore)
        h5/
            __init__.py
            h5_store.py
            pytables_engine.py
            h5py_engine.py

I can imagine a dynamic initialization on application startup, such as:

# simphony/io/__init__.py
from .h5.h5_store import H5Store
store = H5Store(engine='pytables') # we could apply configuration here

Then one would import the store and use it to work with particle containers.
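The engine='pytables' idea could be backed by a small registry/factory; everything in this sketch (the registry, the function names) is an assumption for illustration, not existing simphony code:

```python
# Hypothetical engine registry behind a store factory.
_ENGINES = {}

def register_engine(name, factory):
    """Register a storage backend under a configuration name."""
    _ENGINES[name] = factory

def make_store(engine="pytables", **kwargs):
    """Instantiate the store for the configured engine."""
    if engine not in _ENGINES:
        raise ValueError("unknown engine: %r" % (engine,))
    return _ENGINES[engine](**kwargs)

# Demonstration with a trivial dict-backed engine; a real setup would
# register e.g. a PyTables-backed H5Store here instead.
register_engine("memory", dict)
store = make_store(engine="memory")
store["particles"] = [1, 2, 3]
print(store["particles"])  # -> [1, 2, 3]
```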

These are some rough ideas which may not fit the current state of the application, but later (when I finish my thesis ;-) ) we could discuss them and see which are worth pursuing :)

tuopuu commented 9 years ago

I think the proposed change is a good idea and should be implemented in the future, but it is not relevant at this point. A hardcoded backend based on PyTables/HDF5 is a good compromise right now, as we need to focus on the other tasks stated in the DOW.

Personally, I could see this proposal becoming topical again when we start benchmarking with real simulations (e.g. HPC lattice Boltzmann simulations) and can demonstrate that PyTables is not fast/efficient enough for very large files.

itziakos commented 9 years ago

I do not think that having dynamically selected back-ends is useful, since it does not provide any real advantage. One still has to go into the code and change the back-end used, and the code needs to be aware of the available back-ends. It does not really work when the back-ends require special initialization (e.g. a web-service store), and it mainly makes the implementation more complicated to extend.

However, in Python it is common to use a module-based setup, where one can, with equal ease, change back-ends and adopt new implementations that are added later.

Example:

from simphony.io import H5Store, SQLStore
from my_library import MyStore

storage = H5Store(<filename>)

storage = SQLStore()

storage = MyStore(<url>)
nathanfranklin commented 9 years ago

I think we should close this issue for the time being, as we are only using one backend (HDF5) at the moment. Once we want to add another serialization backend, we can reopen this issue.