shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License
3.03k stars 1.04k forks source link

Shogun base classes #2880

Open karlnapf opened 9 years ago

karlnapf commented 9 years ago

Shall we add another one in the layer

  1. A minimal base class CSGObject that just allows for reference counting and all the basic Shogun stuff.
  2. A second one CSerializableObject that adds all the parameter stuff and inherits from the first?

I guess this could minimize Shoguns overhead quite a bit (many classes dont use parameters)

@yorkerlin @lisitsyn @sonney2k @vigsterkr @lambday @iglesias @tklein23 @besser82 what are your thoughts?

karlnapf commented 9 years ago

Could also add another layer that provides all the sg_* fields

lisitsyn commented 9 years ago

We would have to use multiple inheritance, it could be PITA you know :)

karlnapf commented 9 years ago

Why multiple inheritance? That's not my idea.

If the base classes are a hierarchy, one could just inherit from a certain stage in there?

sonney2k commented 9 years ago

I did add SGReferencedData (or so) at some point that did exactly that. I guess it did get removed for some reason by someone who knows why that was maybe not a good idea?

On Wed, 2015-08-05 at 01:43 -0700, Heiko Strathmann wrote:

Why multiple inheritance? That's not my idea.

If the base classes are a hierarchy, one could just inherit from a certain stage in there?

— Reply to this email directly or view it on GitHub.

karlnapf commented 9 years ago

Thats very similar, but for the SGMatrix, SGVector classes. Exactly that type of thing is what I want to do for the base class CSGObject.

karlnapf commented 9 years ago

What about this:

CReferencedObject -> CSerialisableObject -> CSGObject

Where the first one contains only the reference counting, the second adds the parameters/serialisation, and the third adds all the Shogun stuff

sonney2k commented 9 years ago

checkout

9305aa9d1bab28d20979033b0fd7c8e2d545b85e

and look at src/shogun/base/SGRefObject.h

On Wed, 2015-08-05 at 04:12 -0700, Heiko Strathmann wrote:

What about this:

CReferencedObject -> CSerialisableObject -> CSGObject

Where the first one contains only the reference counting, the second adds the parameters/serialisation, and the third adds all the Shogun stuff


Reply to this email directly or view it on GitHub: https://github.com/shogun-toolbox/shogun/issues/2880#issuecomment-127957021

lisitsyn commented 9 years ago

@karlnapf okay this works. Though we don't really know whether referenced is always serializable or serializable is always referenced :)

karlnapf commented 9 years ago

I see now. This is good enough for @yorkerlin s patch. Thanks I did not know

iglesias commented 9 years ago

The idea makes sense to me, and I like it because it would separate in each of the classes Heiko has mentioned above reference counting, serialisation, and rest of Shogun stuff (although I am not sure what this rest would be :-)

I am not entirely sure however how much overhead it would reduce. What overhead are we talking about? SWIG or memory footprint per SGObject?

Another (very) wild idea somewhat related would be to use smart pointers as they are now part of the default C++ compiler. Then we could get rid of our own reference counting.

karlnapf commented 9 years ago

yeah smart pointers would be best.

@lisitsyn what s your take on all this?

karlnapf commented 9 years ago

@sonney2k why was the SGRefObject removed?

sonney2k commented 9 years ago

Do not remember. Git logs say that Thorsten removed it. "Obsolete ".

On August 10, 2015 11:30:21 PM GMT+02:00, Heiko Strathmann notifications@github.com wrote:

@sonney2k why was the SGRefObject removed?


Reply to this email directly or view it on GitHub: https://github.com/shogun-toolbox/shogun/issues/2880#issuecomment-129613832

Sent from Kaiten Mail. Please excuse my brevity.

iglesias commented 9 years ago

The class was removed during the last summit when we were reducing SWIG overhead, in terms of memory used to compile or, equivalently, the size of the (huge) file SWIG creates. Before it was removed, it looked like this SGRefObject -> CSGObject, and all the classes were inheriting from CSGObject, not from SGRefObject.

Although it should be possible to use SGRefOject for what you have mentioned above, it was not used like that. It was just pulling out the reference counting logic from CSGObject. Thoralf removed it to reduce the number of classes (by one, in this case) that are exposed via SWIG.

iglesias commented 9 years ago

2581

lisitsyn commented 9 years ago

@karlnapf

making the base class overhead much smaller

I like the principle, although it is not really clear for me what to remove.

serialisation

I think it should stay as long as we are rare ML library with this feature.

reference counting?

I believe it should be done via shared pointers.

karlnapf commented 9 years ago

I dont get the point of removing the class. We could just hide it from SWIG? If one class contains what two classes used to contain, nothing is really gained. We could revert that?

@lisitsyn

iglesias commented 9 years ago

There is more dicussion about it here #1853 #1764.

Reverting is easy peasy with git revert #2883. But I suggest not to merge this until we have checked the above pull requests.

iglesias commented 9 years ago

Hmm, it seems that SGRefObject was not introduced due to what @sonney2k pointed out above: #1770. Still, I believe it should be possible to use it for what we were talking about.

iglesias commented 9 years ago

This is the pr where it was merged #1771.

yorkerlin commented 9 years ago

@lisitsyn @lambday Do you guys plan to rewrite SGObject? Do we use the existing parameter selection (model selection) framework? The existing model selection framework is not flexible. Currently, the existing framework searches for all model parameters since class Parameter only supports add but does not support remove

The existing framework only supports:

I want the framework supports variational parameters (need to be optimized but are not model parameters)

In future, I may have to extend the framework in the following ways:

lambday commented 9 years ago

Hi @yorkerlin . If we in fact happen to rewrite SGObject, probably quite a few things would change. @lisitsyn proposed having a class Property for each parameter which fits nicely and intuitively for model selection. In this scenario, I think we'd be able to implement the enable/disable feature you're suggesting quite easily.

I am unaware of variational parameters ATM. If they are expected to be optimized differently, maybe we can have a separate class for them and provide its functionality accordingly.

yorkerlin commented 9 years ago

@lambday Cool! variational parameters are auxiliary variables.

For batch inference,

For stochastic inference,We can update model parameters and variational parameters at the same time.

For stochastic inference, variaitonal parameters can be viewed as model parameters. Enable/Disable some parameters will be the key!

yorkerlin commented 9 years ago

CMap does not support serialization. However, CMap is a subclass of SGObject. At least I want CStringMap (that is CMap<std::string, SGVector<float64_t> >) is serialable.

ref https://github.com/shogun-toolbox/shogun/issues/2903

lambday commented 9 years ago

@yorkerlin from the requirements, going by policy based design for the parameters sound reasonable to me, where the policies are different optimization policies. So laying down the requirements

[1] http://www.gotw.ca/publications/mill04.htm [2] http://www.gotw.ca/publications/mill05.htm [3] http://stackoverflow.com/questions/8447972/how-to-combine-auto-gettersetter-with-pimpl-design-pattern-in-a-public-api-inte# [4] http://www.idinews.com/quasiClass.pdf [5] http://www.javaworld.com/article/2073723/core-java/why-getter-and-setter-methods-are-evil.html

yorkerlin commented 9 years ago

@lambday @lisitsyn some old issues related to this https://github.com/shogun-toolbox/shogun/issues/1251 https://github.com/shogun-toolbox/shogun/issues/779

lambday commented 9 years ago

Thanks @yorkerlin for tagging the related issues.