sofa-framework / sofa

Real-time multi-physics simulation with an emphasis on medical simulation.
https://www.sofa-framework.org
GNU Lesser General Public License v2.1
871 stars 297 forks source link

[SofaPython] ADD Improve data field in SofaPython #286

Closed VannesteFelix closed 6 years ago

VannesteFelix commented 6 years ago

In createObject...

When you given an object as argument instead of a string the binding convert it (with str). This is problematic in many cases eg. with a list it is converted with the bracket (ie: '[' ) and comma as separator.

To resolve that I add a method that, if a list is passed as an argument, convert it as a string following the sofa style (without comma and bracket).

CHANGELOG for @hugtalbot & @guparan : [SofaPython]: change the way createObject() handle its arguments to simplify scene writing + batch of tests.


This PR:

Reviewers will merge only if all these checks are true.

sofabot commented 6 years ago

Thank you for your pull request! Someone will soon check it and start the builds.

damienmarchal commented 6 years ago

Hi Felix,

Thank you for your PR. Let's see if it builds and if people have feedback to give you.

[ci-build]

maxime-tournier commented 6 years ago

The PR looks good, though I wonder whether we could use the abstract sequence protocol instead of the concrete list type. This way we could use numpy vectors directly for instance.

The warning message will quickly become annoying, so maybe put it behind proper #ifdefs.

@matthieu-nesme any thought?

damienmarchal commented 6 years ago

The warning will be annoying but changing the behavior of a core function without warning users is also very problematic.

I see several option to control this message:

Other approaches ?

matthieu-nesme commented 6 years ago

I already worked on binding Data vectors on this branch: sofapython_containers. It is not finished yet, but any help is welcome.

Mainly making proper binding for vector of vectors (of vectors...) with different sizes (and SVector). I had to enhance DataTypeInfo for that, it implies quite deep modifications.

The SofaPython/examples/SofaNumpy.py example shows what is doable (even non-numpy related stuff).

But it is a pain in the ass to fix codes that were beneficing from those undefined/incorrect behaviors. It is why it is not yet PR, but maybe it could become a wip PR?

maxime-tournier commented 6 years ago

@damienmarchal Agreed, except it seems the PR does not affect any working behavior, does it?

matthieu-nesme commented 6 years ago

Sorry, I checked the PR too quickly.

You are right, the factory is not made to give anything else than strings as arguments.

A workaround, when possible, is to pass these Data after the component is created.

mycomponent = node.createObject('Component',data1="foo") mycomponent.vector = [1,2,3]

But vector binding have to be cleaned too.

damienmarchal commented 6 years ago

@maxime-tournier I think that any code that pass a sequence to createObject will work differently. Eg: 1) passing a python sequence to a Data was resulting in [1,2,3], now it will be 1 2 3. If your component count on the [1,2,3] structure to do something it will not work anymore.
2) passing a python sequence to a Data was resulting in [1, 2 3] but because the parser was not reporting problems this was resulting in something different (0 0 0 ?) ... if the scene was, 'by surprise', working...it will not work anymore.

The case (1) is probably rare but not fundamentally bad. The case (2) is clearly bad but exists.

1 & 2 are rare cases and I'm convinced we shouldn't annoy anyone using createObject with warning messages this is why I personally think that having a global switch (not an #ifdef) to print these kind of messages would be great to tools to help users to debug their scenes through the API changes. (maybe it is overkill :))

About having the warning in the 'else' case. It may be an option but we still want to support simple conversion from int/string/float values without a warning right ?

matthieu-nesme commented 6 years ago

I vote for no warning message, and no global switch! Keep it light and simple. The only case where you really expect to have the string "[1,2,3]" is for Data that would report a problem for "1,2,3", and the fix is simple (simply add some ""). But frankly who was benefiting from that?!

Indeed we do not want warning messages when converting scalars, but we can easily add a test for specific types (float, bool, int...). At least everything would be clearly exposed in the code.

Also, I propose that if you pass a Node, a BaseObject or a Data, it uses their getLinkPath (defined in the bindings, i.e. "@/path/from/root/node", "@/path/from/root/node/myobject", "@/path/from/root/node/myobject.mydata" ). It would be so useful when creating Links that are required during creation (like the mapping's inputs).

maxime-tournier commented 6 years ago

@damienmarchal alright you convinced me :-)

1 [...] If your component count on the [1,2,3] structure to do something it will not work anymore.

This is what I was questioning: are there any such component?

2 [...] if the scene was, 'by surprise', working...it will not work anymore.

This looks like a feature to me!

@matthieu-nesme I would love to have BaseObject instances converted to strings automatically through getLinkPath(). Combined with sequences, we'd have input = [dofs1, dofs2] to declare multi-mappings, which is great.

damienmarchal commented 6 years ago

I'm not sure we have an easy way to detect that a PyObject is really a BaseObject, the getLinkPath trick should go for a future PR.

I'm making the requested change to felix PR.

matthieu-nesme commented 6 years ago

Must be doable with PyObject_IsInstance. In fact getLinkPath should be defined in Base as it also makes sense for Node.

damienmarchal commented 6 years ago

I think that PyObject_IsInstance will work if there is a class or type associated with BaseObject...I'm not sure this is the case but I will have a look at that.

maxime-tournier commented 6 years ago

@damienmarchal There is one, see Binding_BaseObject.cpp

maxime-tournier commented 6 years ago

In fact, we have to go down to the iterator protocol as some strings are initialized with proxies (e.g. ' '.join(...)) and the Sequence protocol fails on getting the length/size.

The following conversion code passes all python tests in Compliant:

static std::string convert_string(PyObject* object) {
    if( PyString_Check(object) ) {
        // strings will be strings
        return PyString_AsString(object);
    }

    if( PyIter_Check(object) ) {
        // TODO we should throw and abort createObject in case of error
        std::stringstream ss;
        bool first = true;

        PyObject *iterator = PyObject_GetIter(object);
        if(!iterator) {
            msg_error("createObject") << "error while creating iterator";
        }

        while(PyObject* next = PyIter_Next(iterator)) {
            if(first) first = false;
            else ss << ' ';

            ss << convert_string(next);
            Py_DECREF(next);
        }

        Py_DECREF(iterator);

        if (PyErr_Occurred()) {
            msg_error("createObject") << "error while iterating";
        }

        return ss.str();
    }

    // link path conversion for baseobjects
    if( PyObject_IsInstance(object, (PyObject*) &SP_SOFAPYTYPEOBJECT(BaseObject)) ) {
        const std::string path_name =  (((PySPtr<Base>*) object)->object->toBaseObject()->getPathName());
        return '@' + path_name;
    }

    // fallback to python string conversion
    return PyString_AsString(PyObject_Str(object));
}
maxime-tournier commented 6 years ago

Actually I am a bit concerned about the following case:

node.createObject('SomeMultiMapping', 
                  input = ' '.join([dofs1.getLinkPath(), dofs2.getLinkPath()]))

The value for input is not a string, it's a sequence that evaluates to a string and we'll probably end up iterating over each character individually with my code snippet.

There are probably other cases like this, so we'd better stick to checking for actual lists.

damienmarchal commented 6 years ago

@maxime-tournier
I currently have tests with this pattern: {"'aString'.join('[ ]')", "[aString aString]"}
which succeed.

I also have added this one: {"' '.join(['AA', 'BB', 'CC'])", "AA BB CC"}, Which also succeed.

I didn't managed to build something that fails...if you find some..please report so we can improve & update the tests.

EDIT: I'm more concerned about the automatic conversion to getPath/getLinkPath for which I'm not sure we can find a clean semantic. I suggest we let that for a different PR...because this out of the scope of the PR initially submitted by Felix.

maxime-tournier commented 6 years ago

After some tries, I could not come up with something that fails either. This is good news, right? :-)

As to the semantics, maybe we should simply add an __str__ method to BaseObject that returns getLinkPath().

This way we don't have extra special cases in createObject and the semantics become clear enough to me: arguments are converted to strings unless they are sequences, in which case they are converted to the expected format.

damienmarchal commented 6 years ago

@maxime-tournier implementing that in the BaseObject sounds much cleaner to me either I like it.

matthieu-nesme commented 6 years ago

Please add getLinkPath to Base and then implement __str__ in Base's binding (rather than BaseObject). So it will work with Nodes too.

matthieu-nesme commented 6 years ago

BTW great job guys ;)

maxime-tournier commented 6 years ago

Ok the following works:

static std::string convert_string(PyObject* object) {
    if( PyString_Check(object) ) {
        // strings will be strings
        return PyString_AsString(object);
    }

    if( PySequence_Check(object) ) {
        std::stringstream ss;

        for(Py_ssize_t i = 0, n = PySequence_Length(object); i < n; ++i) {
            if(i > 0) ss << ' ';
            ss << convert_string(PySequence_GetItem(object, i));
        }

        return ss.str();
    }

    // fallback to python string conversion
    PyObject* str = PyObject_Str(object);
    std::string res = PyString_AsString(str);
    Py_DECREF(str);

    return res;
}

Binding_BaseObject.cpp:

SP_CLASS_METHODS_END;

static struct patch {
    patch() {
        // set __str__ slot for BaseObject
        SP_SOFAPYTYPEOBJECT(BaseObject).tp_str = [](PyObject* self) {
            return BaseObject_getLinkPath(self, nullptr);
        };
    }
} patcher;

SP_CLASS_TYPE_SPTR(BaseObject, BaseObject, Base)

It does breaks some python tests in Compliant, but this is due to errors there. I can push code to another PR that does the whole thing if needed (I don't think I have push access to this one).

maxime-tournier commented 6 years ago

@matthieu-nesme are there examples of links to nodes ? I was going to put it in Base, but I thought maybe keeping errors for Node might be desirable.

matthieu-nesme commented 6 years ago

@maxime-tournier you can give node path to mapping's input (rather than the mstate path).

Warning with the fallback string conversion with a Data. Data's str already gives the stringvalue, so everything will work with a copy, maybe, we'd like to rather create a Link? (and so use Data.getLinkPath)

I would prefer to explicit cases in this function.

if string: elif sequence: elif scalar or bool: elif Base: elif Data: else: fallback string conversion + WARNING

damienmarchal commented 6 years ago

Thanks for your feedback,

We need examples of use of Data & Base conversion to add in the tests.

sofabot commented 6 years ago

Thank you for your pull request! Someone will soon check it and start the builds.

maxime-tournier commented 6 years ago

@matthieu-nesme

I moved everything to Base in my branch, works fine so far. You get the following semantics with Data:

    data = d2.findData('position')

    # copy
    d3 = n3.createObject('MechanicalObject', size = 1, position = data)

    # alias
    d4 = n4.createObject('MechanicalObject', size = 1, position = data.getLinkPath())

@damienmarchal should I open another PR ?

damienmarchal commented 6 years ago

@maxime-tournier Not to sofa-framework. I think the right process is to make a pull request from your branch to SofaDefrost:fix_pylistarg

maxime-tournier commented 6 years ago

Ok thanks. I have no time to do this now unfortunately.

damienmarchal commented 6 years ago

The current implementation is now converting things like: Sequence (non recursively), string, scalar, and boolean without a warning.

If the object has a method named "getSofaPath()". This one is used in the createObject to do the conversion. My target was to avoid collision with the existing str() function that may have a different semantic and I wanted to preserve it. This function could be implemented in Base and Data but in virtually any object we want to be used as a parameter in createObject.

It sounds nice to me because it make things "explicit" and convenient as it simplify user's scripts. I'm not a big fan of 'getAsSofaPath()' and though about getAsSofaValue or getAsACreateObjectParameter(). What do you think ? Do you have alternative names ?

PS: @matthieu-nesme you said: ... "Data's str already gives the stringvalue",

    first = rootNode.createObject( 'ExternalComponent', name='theFirst')
    print('VERSION 1:'+ str( first.findData('name') + " type: " + str(type(first.name)) )
    print('VERSION 2:'+ str( first.name ) + " type: " + str(type(first.name))  )

Actually prints:

   VERSION 1: <Sofa.Data object at 0x7ff43b77f6f0> -> <type 'Sofa.Data'>
   VERSION 2: theFirst -> <type 'str'>

So str() function is not implemented as you said.

matthieu-nesme commented 6 years ago

@damienmarchal My bad, Data str binding is only implemented in our fork... Let's consider it returns the stringvalue, would you like?

Is what you call "getSofaPath()" correspond to the already existing "getLinkPath()"?

myobject->getName() => "myobjectname" myobject->getPathName() => "/path/to/my/object/myobjectname" myobject->getLinkPath() => "@/path/to/my/object/myobjectname"

maxime-tournier commented 6 years ago

@matthieu-nesme what's wrong with using str ? The semantics are clear and concise and you don't have to mentally infer the various checks that will take place when converting. The explicit case checking looks much worse to me. (sorry for bikeshedding)

@damienmarchal why not recursive? This prevents setting MechanicalObject positions when size > 1

maxime-tournier commented 6 years ago

My target was to avoid collision with the existing str() function that may have a different semantic and I wanted to preserve it

There is no existing str function for python component wrappers (grep tp_str), the default python implementation is used.

In python the semantic of str is to provide a concise representation of an object as a string, which is exactly what BaseObject.getLinkPath does. So really, using str seems the way to go.

matthieu-nesme commented 6 years ago

@maxime-tournier ok I am convinced, let's use str for everyone (except list/iterable) to remain on a simple implementation. Please could you propose us an implementation?

maxime-tournier commented 6 years ago

I have a branch in 9e33983a895950e46b8df54996f6a4d67c0e4c2b ready to rebase on this one, but I need #283 merged first as they touch the same files.

damienmarchal commented 6 years ago

@matthieu-nesme I think we should implement str() as you have in your fork ;)

First of all thanks Matthieu and Maxime for the interest and time you put into this PR because it is getting very long.

Let's me summary the different approach that have been tried so far.

1) The first version of the code was doing conversion by hard-coding the object type detection and calling getLinkPath(). The problem with this version is that it cannot be extend without adding more hard-coded test and recompiling the SofaPython plugin. Everyone agree this version is bad.

2) The second version that was suggested by maxime is much more extensible as it was using the str() operator. The problem I see with this approach is that the str function have been implemented and is used in a lot of place (including Anatoscope fork) and so it may not be a god idea to now force the str() function to return the path just to make the createObject function happy. In this version prevent us to warn user there is a conversion done and that if they change the str() because, well its python they can do that, it may break something. This version is imposing that people writing a 'str' function have to keep in mind that (even if they don't know nothing about that) it must work if their object is called in createObject. (see later I try to explain better the problem). So to me this version will generate a lot of subtile problems and backward compatibility issues.

3) The third version is close to use the str() in term of design but it makes it explicit that if an object want conversion to be used in the parameters of the createObject function it have to follow a specific protocol...this is done by implementing a dedicated function (eg: getAsACreateObjectParameter()). The implementation of this function for BaseObject and Sofa.Data could simply link to getLinkPath(). But if you have your own 'complex' object and want it to be used as a parameter in createObject you can just implement your own conversion schema. So fundamentally it is like 'str' but with a specific semantic saying that the function returns a string "that can be parsed as a Sofa parameter".

This approach has several advantages:

As you said Maxime, "In python semantic of str is to provide a concise representation of an object as a string, which is exactly what BaseObject.getLinkPath does. "

A "concise representation" is a very flexible/weak semantic with a lot of different meaning that will change depending on the context. For matthieu (and the Anatoscope fork), a concise representation could be "5.0" For you it seems to be "@/myObject.param"
For me it should be "@/myObject.param=5.0"
For others it could be "Sofa.Data: @/myObject.param=5.0"

These options are more or less adequate depending on the context...but they are all fully valid description because they match the clearly defined and accepted semantic. This is why I think solution 2 is clearly not the way to go because it change this well defined semantic for something narrower in which the str() function should return something that can be parsed as an argument in the createObject function. Doing so is like forcing anyone passing an object to createObject to modify their str() function to make it work with createObject.

I personally think that as the python specification for str() is very weak it is only a way to generate string to be read by human (that can cope with the inherent flexibility of its specification) but I prefer avoiding it to generate string that have to be processed by a program. When I have to generate string that are "parsable" I tend to use specific function with much tighter specification.

@maxime-tournier You are totally right about the recursive aspect...I didn't though about that. My initial worrying was it was transformating a complex structure into a flat one and I wasn't sure it was something that should be done in every cases for any types.

damienmarchal commented 6 years ago

@matthieu-nesme and @maxime-tournier please read my long answer first because I'm really convinced str is not the way to go and that it can generate a lot of trouble and side effect on other person code base.
(Sorry it is hard to to explain this kind of things shortly)

EDIT: Pff I'm exhausted :)

bcarrez commented 6 years ago

Just my 2 cents (after the war) about this __str__ stuff...

I don't know how to explain it clearly, but using the built-in str function to serialize objects in a createObject-specific format sounds weird to me, even if it works in our context. This is the first time we have to convert Sofa objects to string, it does not mean that converting sofa objects to string will only be needed for this specific purpose in the future. I am confident that we can find examples of bad variables usages in the code of Sofa : "this var exists and seems not used by anyone, so I will use it to store my context-specific information without recompiling everything". The wrong tool used for the good result, I don't know if you get me.

In a naive approach I would expect as a end-user, that issuing a print(mySofaStuff) from the sofa GUI built-in python console would return a human-readable string, a summary with the object class name, perhaps the object name, plus some useful infos. Getting either a path or a suite of numbers would be kinda disapointing in this context...

damienmarchal commented 6 years ago

I have restored the recursive version because I see the advantage of handling vector of vector and I have no example of problematic case in mind. I added two tests to validate this behavior.

damienmarchal commented 6 years ago

[ci-build]

damienmarchal commented 6 years ago

Its was tagged "ready" during the meeting & build ok against the current master so I merge It.

maxime-tournier commented 6 years ago

Just so you know, I noticed some code breaking due to people exploiting the fact that python lists (used to) convert to what SVector expect. Now of course this does not work anymore.

The fix is easy (foo = str([...]) instead of foo = [ ... ]).

Is there any technical reason (as opposed to historical) why we're not using SVector-like serialization for everything vector-like?

damienmarchal commented 6 years ago

Thanks for pointing the issue...we should fix it rapidely...

Now your question... The short answer is no.

The long answer, I have never looked at this part of the code but my assumption is that, when SVector was introduced, it would have cost time to fix the whole code base & scenes & offer backward compatibility so SVector was implemented and used in some component without unifying the whole code base. This decision breaks the interface consistency which have very concrete drawback for users (some arrays use one syntax, others use a different syntax and there is no way to know which syntax need to be used) and for developpers as we now have to cope with two interfaces in our re-factoring.

A quick fix, should be to check that the underlying object on which we are passing the array can understand the old array syntax or the python-like syntax.

Fixing all that will take time...this is the problem with technical debt.

maxime-tournier commented 6 years ago

Also, since we use string conversion for just about anything between python and c++, we should really use repr and not str when converting floating point numbers, as shown in the following example:

>>> x = 1.22341234234012483
>>> x
1.2234123423401249
>>> str(x)
'1.22341234234'
>>> repr(x)
'1.2234123423401249'
damienmarchal commented 6 years ago

I'm very puzzled by this SVector vs vector issue. I'm implementing a quick fix in createObject to de-ambiguate the two to preserve at most script compatiblity...but the inconsistancy hurt my eyes and I wonder if it is not time to clean that mess and have an unified syntax for passing parameter arrays.

I think that with PR #322 we could help to make that not too painful. Eg: a scene with

     <APIVersion level='17.12'/>

Would not support the old (sofa style) syntax.

It is yet unclear to me so any help is welcome.

damienmarchal commented 6 years ago

Wouldn't it be possible to add the SVector parsing into the helper::vector by detecting if the first non empty char is a '[' ? This would allow to get rid of svector while preserving it overall functionnality

maxime-tournier commented 6 years ago

@damienmarchal yes, that would be the best.

That and having only a single repr string conversion in createObject should do the trick, and looks like the cleanest way out to me.