FrancescAlted opened this issue 2 years ago
Besides not being able to use views in expressions, it can be seen that activating multithreading (e.g. by commenting out this line: https://github.com/inaos/iron-array/blob/develop/src/iarray_views.c#L573) can lead to race conditions in other situations, such as simple slicing, as the helgrind tool shows:
$ valgrind --tool=helgrind ./tests slice_type:3_f_ll_v
<skip>
==1261230== ----------------------------------------------------------------
==1261230==
==1261230== Possible data race during read of size 8 at 0x8745C08 by thread #53
==1261230== Locks held: none
==1261230== at 0x805834: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1035)
==1261230== by 0x807FCF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==1261230== by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230== by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230== by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230== by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7157A5: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3107)
==1261230== by 0x712DF8: t_blosc (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3192)
==1261230== by 0x5E2DB1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==1261230==
==1261230== This conflicts with a previous write of size 8 by thread #54
==1261230== Locks held: none
==1261230== at 0x805AD1: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1123)
==1261230== by 0x807FCF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==1261230== by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230== by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230== by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230== by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230== Address 0x8745c08 is 24 bytes inside a block of size 64 alloc'd
==1261230== at 0x5E29E39: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==1261230== by 0x803656: frame_new (contribs/caterva/contribs/c-blosc2/blosc/frame.c:44)
==1261230== by 0x74781E: blosc2_schunk_new (contribs/caterva/contribs/c-blosc2/blosc/schunk.c:175)
==1261230== by 0x7D474B: caterva_blosc_array_new (contribs/caterva/caterva/caterva.c:195)
==1261230== by 0x7D4C90: caterva_empty (contribs/caterva/caterva/caterva.c:267)
==1261230== by 0x7D5D94: caterva_from_buffer (contribs/caterva/caterva/caterva.c:432)
==1261230== by 0x6D6894: iarray_from_buffer (src/iarray_constructor.c:255)
==1261230== by 0x6C17C3: execute_iarray_slice_type (tests/test_slice_type.c:62)
==1261230== by 0x6BFFE7: __ina_test_slice_type_3_f_ll_v_run (tests/test_slice_type.c:225)
==1261230== by 0x866537: ina_test_run (test.c:689)
==1261230== by 0x84B30B2: (below main) (libc-start.c:308)
==1261230== Block was alloc'd by thread #1
<skip>
==1261230== Possible data race during read of size 8 at 0x876E380 by thread #53
==1261230== Locks held: none
==1261230== at 0x70CA26: read_chunk_header (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:705)
==1261230== by 0x712924: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2957)
==1261230== by 0x712897: blosc2_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2936)
==1261230== by 0x807FF5: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1880)
==1261230== by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230== by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230== by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230== by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7157A5: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3107)
==1261230== Address 0x876e380 is 16 bytes inside a block of size 128 alloc'd
==1261230== at 0x5E27893: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==1261230== by 0x805966: get_coffsets (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1106)
==1261230== by 0x807FCF: get_coffset (contribs/caterva/contribs/c-blosc2/blosc/frame.c:1873)
==1261230== by 0x8075DC: frame_get_lazychunk (contribs/caterva/contribs/c-blosc2/blosc/frame.c:2094)
==1261230== by 0x704466: slice_view_postfilter (src/iarray_views.c:237)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7109DB: _blosc_getitem (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2899)
==1261230== by 0x712978: blosc2_getitem_ctx (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:2977)
==1261230== by 0x7041DE: type_view_postfilter (src/iarray_views.c:211)
==1261230== by 0x711E43: blosc_d (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:1610)
==1261230== by 0x7157A5: t_blosc_do_job (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3107)
==1261230== by 0x712DF8: t_blosc (contribs/caterva/contribs/c-blosc2/blosc/blosc2.c:3192)
==1261230== Block was alloc'd by thread #54
<skip>
Ideally, we should provide a way to call postfilters in parallel without these issues. This can be a major task, but fixing it would be of great benefit to us.
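To make the failure mode concrete, here is a self-contained toy sketch of the pattern the trace above points at: the per-frame chunk-offsets cache is built lazily inside get_coffsets(), and once postfilters run from several decompression workers, readers and the builder race on it. Everything in the sketch (frame_t, its fields, the fake offsets math, the worker loop) is invented for illustration; it is not the c-blosc2 implementation. Guarding the lazy initialization with a per-frame mutex, or building the cache eagerly before the worker threads start, are two possible directions for a fix. Running valgrind --tool=helgrind on this toy program should report the same kind of read/write conflict for the racy variant, and stay quiet for the locked one.

```c
/* Toy reproduction (NOT the actual c-blosc2 code) of the pattern helgrind is
 * flagging above: a per-frame chunk-offsets cache built lazily the first time
 * any thread asks for it.  frame_t, its fields and the offsets math are made
 * up; only the shape of the race matches the trace (unsynchronized read vs.
 * write of the same lazily initialized cache). */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
  int64_t *coffsets;      /* lazily built chunk-offsets cache */
  int64_t nchunks;
  pthread_mutex_t lock;   /* used only by the fixed variant below */
} frame_t;

/* Racy variant: two postfilter threads can both see coffsets == NULL, both
 * build the cache, and one can read entries the other is still writing. */
int64_t *get_coffsets_racy(frame_t *f) {
  if (f->coffsets == NULL) {
    int64_t *off = malloc(f->nchunks * sizeof(int64_t));
    for (int64_t i = 0; i < f->nchunks; i++)
      off[i] = i * 4096;  /* stand-in for decompressing the offsets chunk */
    f->coffsets = off;    /* unsynchronized publish */
  }
  return f->coffsets;
}

/* One possible fix: serialize the lazy initialization with a per-frame lock
 * (building the cache eagerly, before worker threads start, would also do). */
int64_t *get_coffsets_locked(frame_t *f) {
  pthread_mutex_lock(&f->lock);
  if (f->coffsets == NULL) {
    int64_t *off = malloc(f->nchunks * sizeof(int64_t));
    for (int64_t i = 0; i < f->nchunks; i++)
      off[i] = i * 4096;
    f->coffsets = off;
  }
  int64_t *result = f->coffsets;
  pthread_mutex_unlock(&f->lock);
  return result;
}

/* Each "decompression worker" asks for the offsets, much like a view
 * postfilter calling frame_get_lazychunk() would. */
void *worker(void *arg) {
  frame_t *f = (frame_t *)arg;
  return (void *)get_coffsets_racy(f);  /* swap in _locked to silence helgrind */
}

int main(void) {
  frame_t f = {.coffsets = NULL, .nchunks = 1024,
               .lock = PTHREAD_MUTEX_INITIALIZER};
  pthread_t t[4];
  for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, &f);
  for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
  printf("first offset: %lld\n", (long long)f.coffsets[0]);
  free(f.coffsets);  /* note: the racy variant can also leak a duplicate cache */
  return 0;
}
```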
Even with PR #590, I can still reproduce the freeze on my M1 MacBook Air (but only on that box!):
$ python -m pytest -v
<snip>
iarray/tests/test_reduce.py::test_red_type_view[test_reduce.iarr-False-sum-shape0-chunks0-blocks0-0-float64-uint64] PASSED [ 73%]
iarray/tests/test_reduce.py::test_red_type_view[test_reduce.iarr-False-sum-shape1-chunks1-blocks1-axis1-int64-float64] ^C⏎
/Users/faltet/miniconda3/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Although it takes a while to freeze (about 5 min), this is reproducible and it always freezes in the same place.
Since commit f55390e35bc977a59bdd4e48c81aa54763fc2ab0, helgrind does not complain about the main view tests anymore.
A recent optimization that lets type views run in parallel (https://github.com/inaos/iron-array/commit/6d2964a0ed9c690428718367b2590e7abeaadf9c) had to be disabled (81a8400) because, even though the tests pass, helgrind reports pretty scary race conditions like:
(and tons of others)
These should be addressed before we can finally unleash all the performance of views. For now, we will only use them in pure single-thread environments.
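For reference, here is a minimal sketch of what "pure single-thread" can mean at the c-blosc2 level: decompression (and therefore any postfilter attached to it) is pinned to the calling thread by using a context whose dparams.nthreads is 1. The example only uses public c-blosc2 context APIs; how iron-array maps its own configuration onto these parameters is not shown here, and the buffer sizes are arbitrary.

```c
/* Hedged sketch: force single-threaded decompression in c-blosc2 so that a
 * postfilter attached to the dparams would always run serially. */
#include <blosc2.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NELEMS 1000

int main(void) {
  blosc2_init();

  /* Some data to round-trip. */
  int64_t src[NELEMS], dst[NELEMS];
  for (int i = 0; i < NELEMS; i++) src[i] = i;
  int32_t nbytes = (int32_t)sizeof(src);
  uint8_t cdata[sizeof(src) + BLOSC2_MAX_OVERHEAD];

  /* The compression side is not the issue here; defaults are fine. */
  blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
  cparams.typesize = sizeof(int64_t);
  blosc2_context *cctx = blosc2_create_cctx(cparams);
  int csize = blosc2_compress_ctx(cctx, src, nbytes, cdata, (int32_t)sizeof(cdata));

  /* Decompress with an explicitly single-threaded context: with
   * nthreads == 1 there are no worker threads, so postfilter code never
   * executes concurrently with itself. */
  blosc2_dparams dparams = BLOSC2_DPARAMS_DEFAULTS;
  dparams.nthreads = 1;
  blosc2_context *dctx = blosc2_create_dctx(dparams);
  int dsize = blosc2_decompress_ctx(dctx, cdata, csize, dst, nbytes);

  printf("compressed %d -> %d bytes; roundtrip ok: %d\n",
         nbytes, csize, dsize == nbytes && memcmp(src, dst, nbytes) == 0);

  blosc2_free_ctx(cctx);
  blosc2_free_ctx(dctx);
  blosc2_destroy();
  return 0;
}
```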