populse / capsul

Collaborative Analysis Platform : Simple, Unifying, Lean

Attributes of fields defined at the class level should not be shared between instances #325

Closed ylep closed 7 months ago

ylep commented 8 months ago

I am having trouble with the Capsul tests for highres-cortex:

The problem is that when I call highres_cortex.capsul.processes.Laplacian after it has been run as part of a pipeline, its fields retain the generate_temporary attribute!

I am using class-level attributes and not instance attributes, because I thought that it was more in line with the philosophy behind traits / pydantic typing.
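The mechanism behind this is the classic Python pitfall of mutable objects declared at class level. Here is a minimal sketch with a stand-in `Field` class (not Capsul's actual field implementation) showing how a flag set through one instance leaks to all others:

```python
class Field:
    """Minimal stand-in for a field descriptor; not Capsul's actual class."""
    def __init__(self, type_):
        self.type = type_
        self.generate_temporary = False

class Laplacian:
    # Declared once at class level: every instance sees this same object.
    laplace_field = Field(str)

p1 = Laplacian()
p2 = Laplacian()

# A pipeline run marks the field as temporary through one instance...
p1.laplace_field.generate_temporary = True

# ...and the flag leaks to every other instance of the class.
print(p2.laplace_field.generate_temporary)  # True
```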

denisri commented 8 months ago

I'm not sure whether it is normal or not (I'd say not...); however, in tests we generally try to use temporary/local databases. For instance, in capsul.test.test_fake_morphologist, we use:

    def setUp(self):
        self.tmp = tmp = Path(tempfile.mkdtemp(prefix="capsul_test_fakem"))
        # [...]
        self.capsul = Capsul(
            "test_fake_morphologist",
            site_file=self.config_file,
            user_file=None,
            database_path=osp.join(self.tmp, "capsul_engine_database.rdb"),
        )

The reason for this was primarily not to foul the persistent user database, but it also avoids concurrent accesses. Would it improve anything for you? Anyway, your problem seems to point out that concurrent accesses are not safe...

ylep commented 8 months ago

Thanks @denisri for the tip! I have made the change to use a temporary local database as you suggested. Unfortunately, this did not fix the original issue.

I noticed that the same two tests are consistently failing (test_laplacian and test_upwind_euclidean). I will triple-check that the issue is not on my side...

By the way, since switching to a temporary database I am getting messages printed to my terminal asynchronously (server has probably shutdown. Exiting.). This is probably harmless, but it is a bit annoying.

ylep commented 8 months ago

I made some progress and characterized the bug a bit more precisely: it happens when certain processes are run in a certain order. For instance, running highres_cortex.capsul.processes.Laplacian after highres_cortex.capsul.isovolume fails every time:

KEEP_TEMPORARY=1 python -m highres_cortex.test.test_capsul SphereTestCase.test_equivolumetric_pipeline SphereTestCase.test_laplacian --verbose

It behaves as if the highres_cortex.capsul.processes.Laplacian process worked correctly the first time, but somehow got broken after being used in the pipeline (which makes use of it internally). Still investigating...

ylep commented 8 months ago

Okay, here is the issue: even though I set the output filename (laplace_field) in the code that calls Laplacian:

        p = capsul.api.executable(
            "highres_cortex.capsul.processes.Laplacian")
        p.classif = os.path.join(self.test_dir, "classif.nii.gz")
        p.precision = 0.001
        p.typical_cortical_thickness = 3
        p.laplace_field = os.path.join(self.test_dir, "laplacian.nii.gz")   # <-- HERE
        print(p.laplace_field)
        with self.capsul_engine as ce:
            ce.run(p, print_report=True, debug=True)

it is replaced by Capsul with an auto-generated temporary filename:

job uuid: 5afd0e31-673d-49bb-ba10-35cf13303096
process: highres_cortex.capsul.processes.Laplacian
pipeline node: 
status: done
return code: 0
start time: 2023-11-14 11:54:22.979192
end time: 2023-11-14 11:54:23.938594
disabled: False
wait for: ['ab0aa528-d0a1-4e47-a02c-8dc3102de597']
waited_by: []
input parameters:
{'classif': '/tmp/highres-cortex-capsul-teststcf7p58_/classif.nii.gz',
 'laplace_field': '!{dataset.tmp.path}/Laplacian.laplace_field_2b39153a-4cec-44af-a58e-c39c952670d9.nii.gz',   <-- HERE
 'precision': 0.001,
 'typical_cortical_thickness': 3,
 'verbosity': 1}
output parameters: none

This only happens if the pipeline (highres_cortex.capsul.isovolume) has been run first; it does not happen in a fresh client.

I will now inspect the Process class to see if it differs between the working and the non-working case.

denisri commented 8 months ago

I did not take time to look at your pipelines yet, but this definitely looks like a bug in Capsul. Is the problematic parameter linked to a sub-process that is instantiated in both pipelines? (I wonder if there is a kind of class-level field that erroneously shares a value between instances, or something similar.)

ylep commented 8 months ago

Found it! And yes @denisri you guessed it right: the problem is that when I call highres_cortex.capsul.processes.Laplacian after it has been used in the pipeline, its fields retain the generate_temporary attribute!

I am using class-level attributes and not instance attributes, because I thought that it was more in line with the philosophy behind traits / pydantic typing.

https://github.com/neurospin/highres-cortex/blob/b9b76f008c4b98a828a4267861498c790b13492b/python/highres_cortex/capsul/processes.py#L51-L71

Isn't there a way that we could store instance-level attributes? Maybe the typing fields are not the right place to store such attributes...
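One conceivable shape for instance-level storage (a sketch, not actual Capsul code) would be to keep the class-level declaration for typing, but hold per-run metadata in a plain dict created on each instance:

```python
class Field:
    """Stand-in for a typing field; illustration only."""
    def __init__(self, type_):
        self.type = type_

class Laplacian:
    laplace_field = Field(str)  # class-level declaration kept for typing

    def __init__(self):
        # Per-instance metadata keyed by field name: nothing is shared
        # between instances, so a flag set during one run cannot leak.
        self.field_metadata = {}

p1 = Laplacian()
p2 = Laplacian()
p1.field_metadata["laplace_field"] = {"generate_temporary": True}
print("laplace_field" in p2.field_metadata)  # False
```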

denisri commented 8 months ago

OK, then at least we understand. Actually we use class-level fields as if they were instance fields, just as a more convenient way to declare them, but currently it's wrong. In Capsul v2, class traits were ultimately duplicated as instance traits, so we did not run into this problem (true class-level traits were thus not possible). In Capsul v3 we have probably not taken care of this, and fields are not duplicated on instances, so they do not behave the same way. So we have to decide which behaviour we want to implement (assuming it's possible to automatically duplicate class fields on instances): do we want actual class-level fields (more "pythonic"), or always instance fields (as in Capsul v2)?

> Isn't there a way that we could store instance-level attributes? Maybe the typing fields are not the right place to store such attributes...

Maybe not; it's just convenient and easy to do it this way. Another way would probably require some additional development.

sapetnioc commented 8 months ago

In v3, metadata are obtained from the Field instances; for class fields these are stored on the class and therefore shared by all instances. It would not be easy to change the API to get metadata from the controller instance, since metadata are used in many places and can be read/modified by direct attribute access, so there is no single method in the code where this could be intercepted.

So, one option would be to copy all class fields into Process instances during construction. I have the feeling that it may use too many resources for big pipelines, but I am not absolutely sure.

Another option could be to make class fields read-only (to prevent users from running into the same kind of problem one day), to allow explicitly copying a class field into an instance field to make it read-write, and to store generate_temporary values in a dict in the executable instead of in each field. I do not know whether other internal metadata are used that way, but with read-only class fields any such write would raise an exception and reveal it.
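The first option (copy at construction) could look roughly like this; a sketch with a stand-in `Field` class, not the actual Capsul implementation:

```python
import copy

class Field:
    """Stand-in for a field descriptor; illustration only."""
    def __init__(self, type_):
        self.type = type_
        self.generate_temporary = False

class Process:
    def __init__(self):
        # Walk the MRO and shallow-copy every class-level Field onto the
        # instance, so per-run metadata stays instance-local.
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, Field):
                    setattr(self, name, copy.copy(value))

class Laplacian(Process):
    laplace_field = Field(str)

p1 = Laplacian()
p2 = Laplacian()
p1.laplace_field.generate_temporary = True
print(p2.laplace_field.generate_temporary)  # False
```

The resource concern mentioned above comes from the fact that every instance then carries its own copy of every field, even the many that are never modified.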

denisri commented 8 months ago

The copy-to-instance mechanism can be a solution, and could be done on demand, only for fields whose metadata we want to change. So yes, why not. We would need an easy copy function, like Controller.make_instance_field(field_name) or something.
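An on-demand copy along those lines might look like this (hypothetical sketch: Controller.make_instance_field is only the name proposed above, not an existing API, and Field is a stand-in class):

```python
import copy

class Field:
    """Stand-in for a field descriptor; illustration only."""
    def __init__(self, type_):
        self.type = type_
        self.generate_temporary = False

class Controller:
    def make_instance_field(self, field_name):
        # Shallow-copy the class-level field onto this instance the first
        # time its metadata needs to change; later writes stay local.
        class_field = getattr(type(self), field_name)
        instance_field = copy.copy(class_field)
        setattr(self, field_name, instance_field)
        return instance_field

class Laplacian(Controller):
    laplace_field = Field(str)

p = Laplacian()
f = p.make_instance_field("laplace_field")
f.generate_temporary = True
print(Laplacian.laplace_field.generate_temporary)  # False
```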

sapetnioc commented 7 months ago

This issue should be fixed by two PRs, https://github.com/populse/soma-base/pull/53 and #326. Controller class fields are now read-only, and process instantiation copies them to instance fields. It was too difficult to do something smarter by copying fields only when they need to be modified.

ylep commented 7 months ago

I confirm that the original issue is fixed for me, thanks!