szaghi / HASTY

HASh Table fortran container exploiting coarraY

Coarray exploitation #3

Open szaghi opened 8 years ago

szaghi commented 8 years ago

Exploiting coarrays to make the hash table a massively parallel container is a challenging aim, and I will probably not be up to the task. To start, I have to study the work of @MichaelSiehl in depth.

MichaelSiehl commented 8 years ago

Hi Stefano, I will update these GitHub repositories within the next few days by adding new code versions using F2008 SYNC MEMORY and atomic subroutines. The current code versions do work with the compilers but are not conforming to the (F2008/15) standard. I can already tell that the Load Balancing Example does work using SYNC MEMORY and atomic subroutines: since the synchronizations are still coded 'manually', it is possible to synchronize between objects on the same coarray image. Nevertheless, I was unable to achieve something similar using F2015 Events: from my current practical experience, you can use Events only for synchronizations between distinct coarray images. It is important to note that the use of SYNC MEMORY forms execution segments which are unordered within the example program.

Best regards Michael
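
The synchronization style mentioned above, SYNC MEMORY combined with atomic subroutines, can be sketched as follows. This is a minimal illustration modelled on the spin-wait pattern described in the Fortran 2008 standard, not code taken from the repositories under discussion; the program name, the `locked` flag and the choice of images 1 and 2 are assumptions made for the example.

```fortran
program sync_memory_sketch
  use, intrinsic :: iso_fortran_env, only: atomic_logical_kind
  implicit none
  ! Hypothetical flag: image q spins on its own copy until image p clears it.
  logical(atomic_logical_kind) :: locked[*]
  logical(atomic_logical_kind) :: val
  integer, parameter :: p = 1, q = 2   ! assumed roles, chosen for illustration

  locked = .true.
  sync all                             ! every image has initialised its flag

  if (num_images() >= 2) then
    if (this_image() == p) then
      ! ... work that must complete before image q continues ...
      sync memory                      ! end the preceding segment
      call atomic_define(locked[q], .false.)
      sync memory
    else if (this_image() == q) then
      val = .true.
      do while (val)                   ! spin until the flag is cleared
        call atomic_ref(val, locked)
      end do
      sync memory                      ! start the subsequent segment
      print '(a,i0,a,i0)', 'image ', q, ' released by image ', p
    end if
  end if
end program sync_memory_sketch
```

The sketch needs a coarray-enabled build and at least two images; with a single image it does nothing. Only a logical flag is exchanged here, which is the kind of steering data the comment above describes.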

szaghi commented 8 years ago

Dear @MichaelSiehl ,

You are very kind. I was not expecting to bother you so early, since I still have to study your great work with much more care. Anyhow, since you mentioned the topic, I would like to start our correspondence :smile:

Your approach for obtaining MPMD is fundamental for me to exploit coarrays for this generic container. The events of F2015 are not necessary; your sync method should be perfect for my aims.

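My main concerns are twofold:

  • my very basic knowledge of coarray (the big issue);
  • the possible immaturity of coarray implementations of compilers (weak issue).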

In particular, for this project (that seems a stupid toy, but it is a very fundamental block for others...), I need to obtain a generic container that entails MPMD features. Half a year has passed since I last looked, but I remember that there were issues in encapsulating your MPMD technique into an object, namely building an OOP class entailing your MPMD approach. Is this still an issue, or have you built derived types encapsulating MPMD features?

At some point I would like to talk with all other members of the group (e.g. @rouson and @zbeekman) but for now I have to study. Nevertheless, if you are so kind, I would like to bother you even during my coarray-training period; this could be of great help for me.

Cheers.

zbeekman commented 8 years ago

I know of one potential limitation, which is almost fixed: allocatable components of derived type coarrays are not supported by GCC/OpenCoarrays until some of the latest patches to GCC are merged and the support for this is finalized in OpenCoarrays. In the near future, code such as the following will be possible using GCC-7 (AKA GCC trunk, until the 7.1 release in the spring) in combination with a future OpenCoarrays release:

type foo
   real, allocatable :: bar(:)
end type
type(foo) :: foobar[*]

It is possible that there are unexercised bugs in OpenCoarrays, but the more people use it and report back to us, the faster we can resolve them. Anyway, @rouson is the real expert here, so my apologies if I've misinterpreted anything.
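
To make the fragment above concrete, here is a minimal sketch of how an allocatable component of a derived type coarray might be used. The type and variable names follow the fragment; the program name, the per-image sizes and the local copy `tmp` are assumptions introduced for illustration, and whether it compiles depends on the compiler support discussed in this thread.

```fortran
program alloc_component_sketch
  implicit none
  type foo
    real, allocatable :: bar(:)
  end type
  type(foo) :: foobar[*]
  real, allocatable :: tmp(:)          ! hypothetical local copy of remote data
  integer :: img

  ! Allocating a component is a purely local operation, so every image
  ! may choose a different size (here: its own image index).
  allocate(foobar%bar(this_image()))
  foobar%bar = real(this_image())

  sync all                             ! make all components visible to other images

  if (this_image() == 1) then
    do img = 1, num_images()
      tmp = foobar[img]%bar            ! remote read, tmp is (re)allocated on assignment
      print '(a,i0,a,i0,a)', 'image ', img, ' holds ', size(tmp), ' elements'
    end do
  end if

  sync all                             ! keep remote data alive until image 1 is done
end program alloc_component_sketch
```

Because the component is allocated independently on each image, the coarray is no longer symmetric in size across images, which is exactly the case that needs runtime support.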

szaghi commented 8 years ago

@zbeekman I was thinking of exactly this issue, the one I tagged as weak in the previous post. As I said, I need a lot of time to study the MPMD technique of @MichaelSiehl, so by the time I am ready I am sure that the implementations (OpenCoarrays/GNU gfortran) supporting CAF will be bullet-proof :smile:

MichaelSiehl commented 8 years ago

Hi Izaak, that is great news. I do not use these in my current coarray programming, but did some testing with ifort some time ago. I did post two small test programs utilizing allocatable components of derived type coarrays here: https://groups.google.com/forum/#!topic/opencoarrays/E-hbLfOpD98

Best Regards

zbeekman commented 8 years ago

@rouson: We may want to incorporate tests like @MichaelSiehl's into OpenCoarrays. @szaghi sorry for hijacking your thread 😄 🔫

szaghi commented 8 years ago

@zbeekman you can do whatever you want, you are one of my heroes :smile:

I hope that in the near future HASTY will give the OpenCoarrays team a challenging test, but I am not sure I'll be up to the task (at least without your kind help :pray: ).

Cheers.

P.S. @rouson FLAP install script is coming soon.

MichaelSiehl commented 8 years ago

Regarding allocatable components of derived type coarrays, I would also like to point to the following code snippet from my second test program (using ifort):

IF (Coarray_Object[intImageNumber] % logAllocationStatus) THEN
  intLowerBound = LBOUND (Coarray_Object[intImageNumber] % reaDataArray, 1)
  intUpperBound = UBOUND (Coarray_Object[intImageNumber] % reaDataArray, 1)
ELSE
  intLowerBound = 1
  intUpperBound = 0
END IF

As far as I remember, ifort allowed the use of the LBOUND and UBOUND intrinsics to request the array bounds (of such non-symmetric coarrays) from a remote image.

MichaelSiehl commented 8 years ago

|My main concerns are twofold:

  • |my very basic knowledge of coarray (the big issue);
  • |the possible immaturity of coarray implementations of compilers (weak issue).

|In particular, for this project (that seems a stupid toy, but it is a very fundamental block for others...), I need to obtain a generic container that entails MPMD features. Half a year has passed since I last looked, but I remember that there were issues in encapsulating your MPMD technique into an object, namely building an OOP class entailing your MPMD approach. Is this still an issue, or have you built derived types encapsulating MPMD features?

|At some point I would like to talk with all other members of the group (e.g. @rouson and @zbeekman) but for now I have to study. Nevertheless, if you are so kind, I would like to bother you even during my coarray-training period; this could be of great help for me.

Hi Stefano,

it is important to note that the F2008 MPMD-style is still an experiment. My current main focus is on atomic subroutines and SYNC MEMORY (and its limitations and possible pitfalls). Upcoming F2015 Teams may be a more convenient way to deploy MPMD-style programming techniques. The F2008 techniques I show require much more framework development for safe use in real-world parallel programming.

With the use of SYNC MEMORY I enter the world of unordered execution segments: I may use Fortran 2008 means to steer the parallel program execution in a conforming way (using logical and integer scalars), but, as far as my current understanding goes, all further data transfer (array data, character, real, etc.) can't be done in a standard-conforming way with Fortran 2008 means alone. Thus, it is important to note that we are just entering the world of MPMD-style parallel programming using coarrays with the current F2008 standard. The main advantage of my current approach is the ability to maintain a sequential-like syntax within my parallel programming.

Regarding OOP: I believe the real power of coarrays lies in the ability to use them in conjunction with objects (derived type coarrays). But at the same time, coarrays are very limited for use with polymorphism and inheritance (the main means for code reuse with OOP). Another means for code reuse is delegation, which I also do not use with my F95-style coarray object wrappers. Finally, the only code reuse technique I use with my coarray wrappers is a (primitive) code generation technique we developed in the 1990s for sequential programming (it does not work perfectly with the coarray wrappers yet).

I hope this helps a little bit. cheers ;-)

MichaelSiehl commented 7 years ago

I just created a new GitHub repository containing the above example program (to illustrate the use of a derived type coarray with an allocatable component). Feel free to use it or change it for your needs. While my current version of OpenCoarrays/gfortran does not support the syntax of the example code, it does perfectly point to those language elements it does not support yet. https://github.com/MichaelSiehl/Coarray-with-Allocatable-Component-Example

zbeekman commented 7 years ago

@MichaelSiehl would you be willing to submit this as a test case to OpenCoarrays, to test our implementation once it goes live?

CC: @rouson

MichaelSiehl commented 7 years ago

@zbeekman: Yes, of course. The code compiles and runs well with ifort. Does submitting this as a test case to OpenCoarrays require any further steps by myself or can you do it yourself? cheers

zbeekman commented 7 years ago

I think the one thing that should probably happen is that you open a pull request to trigger the CLA (Contributor License Agreement) assistant, just so that you acknowledge that you're contributing the code, rather than us stealing it. Let's hold off for now, I have a trip coming up this week, but I can help walk you through it when I return next week.

szaghi commented 7 years ago

@MichaelSiehl Great! Thank you very much!

@zbeekman Enjoy your trip :smile:

szaghi commented 7 years ago

@MichaelSiehl @zbeekman @rouson (and everyone else having coarray experience).

Dear all,

I have started my experiment with CAF on this project. There is something that I do not understand, please find in the following some questions for you:

type :: baz
  integer, pointer :: mic(:)=>null()
  contains
    procedure :: add
end type baz

type :: foo
  type(baz), allocatable :: bar(:)[:]
end type foo
...
! somewhere in the rainbow
allocate(foo%bar(99)[*])
call foo%bar(32)%add(923)

My guess about the last point is that I am missing some synchronization steps. Currently, I synchronize images only before destroying the tables; maybe I need synchronization also before using them. However, I do not see why. If you consider the pseudo code above,

do I need (conceptually) a further synchronization between the allocate(foo...) and call foo%bar(32)...?

My implied idea was that an allocation of a CAF variable implies a synchronization, thus I did not add further ones. Consider that add is a simple TBP of type baz, but baz contains pointer members.
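
As a point of reference for the question above: in Fortran 2008 an ALLOCATE statement that allocates a coarray is an image control statement and does synchronize all images, so no extra synchronization is needed merely to start using the local part. A further synchronization is needed only once one image accesses data that another image has defined after the allocation. The sketch below illustrates this with a plain integer coarray; it is a simplified stand-in, not the HASTY types, and it says nothing about the cause of the crash reported in this comment.

```fortran
program alloc_sync_sketch
  implicit none
  integer, allocatable :: bar(:)[:]
  integer :: neighbour, remote_value

  allocate(bar(99)[*])       ! image control statement: all images synchronize here

  bar = this_image()         ! local definition, no synchronization required

  sync all                   ! required before any image reads another image's data

  ! Hypothetical access pattern: read element 32 from the next image (wrapping around).
  neighbour = merge(1, this_image() + 1, this_image() == num_images())
  remote_value = bar(32)[neighbour]
  if (remote_value /= neighbour) error stop 'unexpected remote value'

  deallocate(bar)            ! also an image control statement with synchronization
  if (this_image() == 1) print '(a)', 'remote reads completed'
end program alloc_sync_sketch
```

In other words, a call such as the `add` above that touches only the local image's data should not, conceptually, require a synchronization between the allocation and the call.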

Thank you in advance for any help!

Cheers.

HASTY failing test report

Compilation
stefano@zaghi(04:58 PM Mon Nov 14) on feature/add-coarray-buckets [!]
~/fortran/HASTY 13 files, 320Kb
→ FoBiS.py build -mode tests-intel-caf-debug
Builder options
  Directories
    Building directory: "exe"
    Compiled-objects .o   directory: "exe/obj"
    Compiled-objects .mod directory: "exe/mod"
  Compiler options
    Vendor: "intel"
    Compiler command: "ifort"
    Module directory switch: "-module"
    Compiling flags: "-cpp -c -assume realloc_lhs -O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray -DCAF"
    Linking flags: "-O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray"
    Preprocessing flags: "-DCAF"
    Coverage: False
    Profile: False
  PreForM.py used: False
  PreForM.py output directory: None
  PreForM.py extensions processed: []

Building src/tests/hasty_test_dictionary.f90
Compiling src/third_party/PENF/src/lib/penf_global_parameters_variables.F90 serially
Compiling src/third_party/PENF/src/lib/penf_b_size.F90 serially
Compiling src/third_party/PENF/src/lib/penf_stringify.F90 serially
Compiling src/third_party/PENF/src/lib/penf.F90 serially
Compiling src/lib/hasty_key_base.f90 serially
Compiling src/lib/hasty_content_adt.f90 serially
Compiling src/lib/hasty_dictionary_node.f90 serially
Compiling src/lib/hasty_dictionary.f90 serially
Compiling src/lib/hasty_key_morton.f90 src/lib/hasty_hash_table.f90 using 2 concurrent processes
Compiling src/lib/hasty.f90 src/third_party/fortran_tester/src/tester.f90 using 2 concurrent processes
Compiling src/tests/hasty_test_dictionary.f90 serially
src/tests/hasty_test_dictionary.f90(93): remark #7712: This variable has not been used.   [KEY]
  subroutine iterator_max(key, content, done)
--------------------------^

Linking exe/hasty_test_dictionary
Target src/tests/hasty_test_dictionary.f90 has been successfully built
Builder options
  Directories
    Building directory: "exe"
    Compiled-objects .o   directory: "exe/obj"
    Compiled-objects .mod directory: "exe/mod"
  Compiler options
    Vendor: "intel"
    Compiler command: "ifort"
    Module directory switch: "-module"
    Compiling flags: "-cpp -c -assume realloc_lhs -O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray -DCAF"
    Linking flags: "-O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray"
    Preprocessing flags: "-DCAF"
    Coverage: False
    Profile: False
  PreForM.py used: False
  PreForM.py output directory: None
  PreForM.py extensions processed: []

Building src/tests/hasty_test_hash_table.f90
Compiling src/tests/hasty_test_hash_table.f90 serially
src/tests/hasty_test_hash_table.f90(72): remark #7712: This variable has not been used.   [KEY]
  subroutine iterator_max(key, content, done)
--------------------------^

Linking exe/hasty_test_hash_table
Target src/tests/hasty_test_hash_table.f90 has been successfully built
Builder options
  Directories
    Building directory: "exe"
    Compiled-objects .o   directory: "exe/obj"
    Compiled-objects .mod directory: "exe/mod"
  Compiler options
    Vendor: "intel"
    Compiler command: "ifort"
    Module directory switch: "-module"
    Compiling flags: "-cpp -c -assume realloc_lhs -O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray -DCAF"
    Linking flags: "-O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray"
    Preprocessing flags: "-DCAF"
    Coverage: False
    Profile: False
  PreForM.py used: False
  PreForM.py output directory: None
  PreForM.py extensions processed: []

Building src/tests/hasty_test_hash_table_homo.f90
Compiling src/tests/hasty_test_hash_table_homo.f90 serially
Linking exe/hasty_test_hash_table_homo
Target src/tests/hasty_test_hash_table_homo.f90 has been successfully built
Builder options
  Directories
    Building directory: "exe"
    Compiled-objects .o   directory: "exe/obj"
    Compiled-objects .mod directory: "exe/mod"
  Compiler options
    Vendor: "intel"
    Compiler command: "ifort"
    Module directory switch: "-module"
    Compiling flags: "-cpp -c -assume realloc_lhs -O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray -DCAF"
    Linking flags: "-O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray"
    Preprocessing flags: "-DCAF"
    Coverage: False
    Profile: False
  PreForM.py used: False
  PreForM.py output directory: None
  PreForM.py extensions processed: []

Building src/tests/hasty_test_hash_table_homokey_failure.f90
Compiling src/tests/hasty_test_hash_table_homokey_failure.f90 serially
Linking exe/hasty_test_hash_table_homokey_failure
Target src/tests/hasty_test_hash_table_homokey_failure.f90 has been successfully built
Builder options
  Directories
    Building directory: "exe"
    Compiled-objects .o   directory: "exe/obj"
    Compiled-objects .mod directory: "exe/mod"
  Compiler options
    Vendor: "intel"
    Compiler command: "ifort"
    Module directory switch: "-module"
    Compiling flags: "-cpp -c -assume realloc_lhs -O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray -DCAF"
    Linking flags: "-O0 -debug all -check all -warn all -extend-source 132 -traceback -gen-interfaces#-fpe-all=0 -fp-stack-check -fstack-protector-all -ftrapuv -no-ftz -std08 -coarray"
    Preprocessing flags: "-DCAF"
    Coverage: False
    Profile: False
  PreForM.py used: False
  PreForM.py output directory: None
  PreForM.py extensions processed: []

Building src/tests/hasty_test_hash_table_homocontent_failure.f90
Compiling src/tests/hasty_test_hash_table_homocontent_failure.f90 serially
Linking exe/hasty_test_hash_table_homocontent_failure
Target src/tests/hasty_test_hash_table_homocontent_failure.f90 has been successfully built

Execution error
stefano@zaghi(05:17 PM Mon Nov 14) on feature/add-coarray-buckets
~/fortran/HASTY 14 files, 324Kb
→ export FOR_COARRAY_NUM_IMAGES=2

stefano@zaghi(05:30 PM Mon Nov 14) on feature/add-coarray-buckets
~/fortran/HASTY 14 files, 324Kb
→ ./exe/hasty_test_hash_table
forrtl: severe (174): SIGSEGV, segmentation fault occurred
In coarray image 1
Image              PC                Routine            Line        Source
hasty_test_hash_t  00000000004F7021  Unknown               Unknown  Unknown
hasty_test_hash_t  00000000004F515B  Unknown               Unknown  Unknown
hasty_test_hash_t  00000000004A5B84  Unknown               Unknown  Unknown
hasty_test_hash_t  00000000004A5996  Unknown               Unknown  Unknown
hasty_test_hash_t  0000000000467EB9  Unknown               Unknown  Unknown
hasty_test_hash_t  000000000046C8A6  Unknown               Unknown  Unknown
libpthread-2.24.s  00007FF2913A2080  Unknown               Unknown  Unknown
hasty_test_hash_t  0000000000443A4D  hasty_hash_table_         112  hasty_hash_table.f90
hasty_test_hash_t  0000000000460BA2  hasty_test_hash_t          67  hasty_test_hash_table.f90
hasty_test_hash_t  000000000045DD6B  MAIN__                     23  hasty_test_hash_table.f90
hasty_test_hash_t  0000000000403DAE  Unknown               Unknown  Unknown
libc-2.24.so       00007FF290E0F291  __libc_start_main     Unknown  Unknown
hasty_test_hash_t  0000000000403CAA  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000004, 3) - process 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred
In coarray image 2
Image              PC                Routine            Line        Source
hasty_test_hash_t  00000000004F7021  Unknown               Unknown  Unknown
hasty_test_hash_t  00000000004F515B  Unknown               Unknown  Unknown
hasty_test_hash_t  00000000004A5B84  Unknown               Unknown  Unknown
hasty_test_hash_t  00000000004A5996  Unknown               Unknown  Unknown
hasty_test_hash_t  0000000000467EB9  Unknown               Unknown  Unknown
hasty_test_hash_t  000000000046C8A6  Unknown               Unknown  Unknown
libpthread-2.24.s  00007F672A75E080  Unknown               Unknown  Unknown
hasty_test_hash_t  0000000000443A4D  hasty_hash_table_         112  hasty_hash_table.f90
hasty_test_hash_t  0000000000460BA2  hasty_test_hash_t          67  hasty_test_hash_table.f90
hasty_test_hash_t  000000000045DD6B  MAIN__                     23  hasty_test_hash_table.f90
hasty_test_hash_t  0000000000403DAE  Unknown               Unknown  Unknown
libc-2.24.so       00007F672A1CB291  __libc_start_main     Unknown  Unknown
hasty_test_hash_t  0000000000403CAA  Unknown               Unknown  Unknown

application called MPI_Abort(comm=0x84000002, 3) - process 1

szaghi commented 7 years ago

Dear all,

( @rouson @zbeekman @MichaelSiehl )

I am out of office, but I have done a test on my tablet: with OpenCoarrays 1.7.5 (manually compiled), GNU gfortran 6.2.0 (from dev ubuntu repo), MPICH 3.2 (manually compiled) the above test works well :tada:

The problem is with the Intel compiler :cry: the code seems to be standard conforming. For those who are interested, there is an official bug report on Intel support here

As a consequence, for HASTY I have to stick to OpenCoarrays/GNU gfortran only until this Intel bug is fixed. This is a problem for me, because using as many compilers as possible concurrently is my preferred workflow for catching my errors... If some of you have access to other compilers, e.g. IBM, Cray, NAG..., and will be so kind as to occasionally do a compilation-on-the-fly, I will be very happy :smile:

rouson commented 7 years ago

@szaghi, I have access to a Cray and would be glad to test for you. As I'm sure you know, I work best interactively so I'd suggest occasionally setting up a time for us to talk so that you can walk me through running your tests. Alternatively, it would be great if you would contribute your bug reports to the AdHoc repository that I set up to track bug reports related to modern Fortran. It's nice to have everything in one place and run all the tests interactively.

I will also make sure an Intel compiler support engineer with whom I've interacted quite often is aware of this bug. I'm certain Steve will do a great job tracking it, but he has announced his retirement so it might help to have another engineer watching it also.

http://rouson.youcanbook.me

szaghi commented 7 years ago

@rouson

Dear Damian, you are very kind! When I have meaningful tests I'll bother you. I always assumed that Cray is a really great compiler (as are IBM/INTEL/GNU). One "obscure object" for me is NAG: have you ever used it?

For the Intel bug, I am very happy to contribute to your (great) AdHoc. Later today I'll try to create a PR.

Cheers.

szaghi commented 7 years ago

@rouson

Dear Damian, I just created a PR for AdHoc with the Intel issue. Steve has raised it as an official issue, but not yet as an official bug.

I hope my PR conforms to your conventions; if it does not, feel free to reject it and I'll amend the PR with your corrections.

Cheers.

rouson commented 7 years ago

@szaghi

Thanks for contributing the PR to AdHoc.

Regarding compilers, I view GNU as in the lead right now for Fortran 2015 support. It's only missing two major 2015 features: parameterized derived types (which happens to be a 2003 feature) and teams.

NAG is generally considered the best compiler for checking standards-conformance. For that reason, it would be great if NAG were the most common second compiler -- sort of like how English is the most common second language. As of the last time I checked, NAG was missing only one feature to reach 2003 compliance (user-defined derived type input/output) and hadn't yet started on any of the bigger features required for 2008 or 2015 compliance.

Cray is a great compiler: it is fully 2008-compliant and supports a sizable amount of 2015, but not quite as much as GNU.

For my purposes, there are only three useful compilers: Cray, Intel, and GNU. Almost everything I do these days involves CAF and those are the only three compilers that support CAF. However, even in what I just wrote, I'm being overly generous because Fortran 2015 events are very important to parallel performance with CAF and only GNU supports events. Once GNU became the first compiler to support Fortran 2015 events, I decided to fully embrace Fortran 2015 and stop waiting for other compilers to catch up. In the multi-core/many-core era, it doesn't make much sense to talk about serial performance and GNU is generally in the lead for parallel performance -- especially if one fully embraces all available Fortran 2015 features.
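
For readers unfamiliar with the events mentioned above, the following is a minimal sketch of the post/wait pattern as specified in the Fortran 2015 (later published as Fortran 2018) feature set. The program name, the `ready` event and the `payload` variable are assumptions made for the example, and compiler support should be checked before relying on it.

```fortran
program event_sketch
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: ready[*]   ! hypothetical event variable, one per image
  integer :: payload[*]          ! hypothetical data handed from image 1 to image 2

  payload = 0
  sync all                       ! all images have initialised their payload

  if (num_images() >= 2) then
    if (this_image() == 1) then
      payload[2] = 42            ! write the data on image 2 ...
      event post(ready[2])       ! ... then signal that it is available
    else if (this_image() == 2) then
      event wait(ready)          ! blocks only image 2, not the other images
      print '(a,i0)', 'image 2 received ', payload
    end if
  end if
end program event_sketch
```

Unlike SYNC ALL, the wait involves only the image that consumes the data, which is why events matter for parallel performance: the remaining images are free to proceed.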

szaghi commented 7 years ago

@rouson

Thanks for contributing the PR to AdHoc.

You are much more than welcome.

NAG is generally considered the best compiler for checking standards-conformance. For that reason, it would be great if NAG were the most common second compiler -- sort of like how English is the most common second language. As of the last time I checked, NAG was missing only one feature to reach 2003 compliance (user-defined derived type input/output) and hadn't yet started on any of the bigger features required for 2008 or 2015 compliance.

This is what I have read elsewhere: NAG is one of the best "debugging compilers".

For my purposes, there are only three useful compilers: Cray, Intel, and GNU. Almost everything I do these days involves CAF and those are the only three compilers that support CAF. However, even in what I just wrote, I'm being overly generous because Fortran 2015 events are very important to parallel performance with CAF and only GNU supports events. Once GNU became the first compiler to support Fortran 2015 events, I decided to fully embrace Fortran 2015 and stop waiting for other compilers to catch up.

As I said, I am trying to move my bosses from MPI to CAF, which is a complex goal... When I joined CNR-INSEAN my first goal was to move them from static allocation to dynamic allocation. Then I showed that OOP could help us and pushed to adopt it: I started in 2010... and I have won only now :cry: I hope the CAF mission will take less than 6 years... (OT: do you think that Fanfarillo or Filippone could accept an invitation to lecture at INSEAN about CAF?)

I rely on GNU for almost all of my work, and I also use it for research simulations, but for the few commercial jobs that we do, we prefer Intel: the code we use for commercial work was initially developed with PGI; then, when I moved my bosses to dynamic allocation, I convinced them to buy Intel, so they are now more confident with Intel than with GNU (my bad :cry: ).

In the multi-core/many-core era, it doesn't make much sense to talk about serial performance and GNU is generally in the lead for parallel performance -- especially if one fully embraces all available Fortran 2015 features.

Sante parole (English: holy words): I am still discussing on Google how much faster/clearer goto is than select case...

My best regards.

szaghi commented 7 years ago

@rouson @zbeekman @MichaelSiehl @afanfa @LadaF @jeffhammond @certik @victorsndvg @milancurcic @muellermichel @jacobwilliams @cmacmackin

Dear all,

I am sorry to ping/bother you directly, feel free to ignore this help request without further comments.

The baseline serial hash-table structure is now ready and my preliminary tests with CAF are really interesting (although I had to switch off Intel in favor of GNU only). I am now at the point where I need the help of more expert parallel programmers like you. A very brief description:

My experience with parallel hash tables is nearly zero... I studied many resources, but most of them refer to distributed or lock-free hash tables designed for the cloud (mostly peer-to-peer, torrent-like aims), where the main need is to avoid re-hashing/re-distributing all nodes when a bucket is added/removed (mainly related to consistent hashing) by means of circular mappings. My needs are different:

  1. I need to distribute the load (possibly preserving locality of nodes/blocks) over all CAF images;
  2. the occurrence of the re-hashing of the keys due to buckets number changes (in the eventuality that the hash-table saturation overcomes 80%) is very low, thus I do not need consistent hashing.

In general, I know a priori the maximum number of grid-refinement levels allowed, thus I know a priori the maximum dimension of the hash-table, so I can select the buckets number consistently, thus re-hashing is a minor necessity. My main concern is

how to distribute the nodes over the CAF images

In my current test all images have the same number of buckets, thus the hashed keys map onto the same bucket on all images, meaning that the same identical hash table is copied over all images.

Probably I need some sort of offset to (uniquely) distribute the nodes over the CAF images, but I have no idea of which kind of offset and the resulting hash function arising from this offset. For example I can imagine something like:

Are some of you aware of something similar or can you give me a better idea?
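
One possible shape for such an offset, offered only as a sketch and not as the scheme the author had in mind, is to hash into a single global bucket range and then split the global bucket index into an owning image and a local bucket. A block split keeps neighbouring global buckets on the same image, which may help preserve locality. The program name, the `locate` routine and `buckets_per_image` are hypothetical names introduced for the example.

```fortran
program bucket_owner_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: buckets_per_image = 4   ! hypothetical bucket count per image
  integer(int64) :: key
  integer :: owner, local_bucket

  if (this_image() == 1) then
    do key = 0_int64, 11_int64                  ! stand-ins for already hashed keys
      call locate(key, owner, local_bucket)
      print '(a,i0,a,i0,a,i0)', 'key ', key, ' -> image ', owner, ', bucket ', local_bucket
    end do
  end if

contains

  ! Map a hashed key to (image, bucket): the global bucket range is cut into
  ! num_images() contiguous blocks of buckets_per_image buckets each.
  subroutine locate(hashed_key, image, bucket)
    integer(int64), intent(in)  :: hashed_key
    integer,        intent(out) :: image, bucket
    integer(int64) :: total_buckets, global_bucket

    total_buckets = int(num_images(), int64) * buckets_per_image
    global_bucket = modulo(hashed_key, total_buckets)                  ! 0 .. total-1
    image  = int(global_bucket / buckets_per_image) + 1                ! 1 .. num_images()
    bucket = int(modulo(global_bucket, int(buckets_per_image, int64))) + 1
  end subroutine locate
end program bucket_owner_sketch
```

With this mapping every key has exactly one owning image, so the tables on different images hold disjoint sets of nodes instead of identical copies. The trade-off is that changing the number of images or buckets remaps the keys, which should be acceptable if re-hashing is as rare as described above.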

Thank you in advance for any hints!

Cheers.

jeffhammond commented 7 years ago

although I had to switch off Intel in favor of GNU only

Why?

szaghi commented 7 years ago

@jeffhammond

Hi Jeff, the reason is a small Intel compiler issue, see this. It is still not an official bug, but Steve Lionel raised it as an official issue of the current Intel version. Anyhow, this issue prevents me from calling TBPs of CAF members, so the whole HASTY structure is not viable. Indeed, relying on GNU only is a concern for me: my "bosses" want Intel for production, but, more importantly, I am somewhat OCD and testing my bad codes with at least 2 different compilers alleviates my irritability... with only one, my chances of catching my errors are reduced by 50%...

jeffhammond commented 7 years ago

Hi Jeff, the reason is a small Intel compiler issue, see this. It is still not an official bug, but Steve Lionel raised it as an official issue of the current Intel version. Anyhow, this issue prevents me from calling TBPs of CAF members, so the whole HASTY structure is not viable. Indeed, relying on GNU only is a concern for me: my "bosses" want Intel for production, but, more importantly, I am somewhat OCD and testing my bad codes with at least 2 different compilers alleviates my irritability... with only one, my chances of catching my errors are reduced by 50%...

You want to try Cray compiler? I can help you with NERSC access.

szaghi commented 7 years ago

@jeffhammond

Jeff, you are too kind, thank you very much! In the future, when I have something meaningful, I would like to prepare some tests for you and Damian, who have access to Cray, but now it is premature and I do not want to waste your time. Your help in the form of comments/ideas/criticism here is already a godsend for me!

Cheers.

zbeekman commented 7 years ago

@szaghi Have you looked into SAMRAI or something else out there already? (I can't remember what OVERFLOW uses, I can look in my notes later this week... but you might consider using something out there already before reinventing the wheel....)

szaghi commented 7 years ago

@zbeekman

I looked at SAMRAI, BoxLib, Paramesh and many others. You are right, it is not good to reinvent the wheel, but in this case I prefer to... The AMR data structure is a crucial key for my work: I want it all in my hands and, probably a more important point, I want to learn and not only to use an AMR library. Finally, to my knowledge, none of them is CAF based :smile:

HASTY is a small piece of my tool for my personal research.

Concerning the parallel hashing, I have made some small but important steps today: hopefully on Monday I'll ask for your comments.

Cheers.