jchelly / gadgetviewer

Simple tool for interactive visualisation of Gadget and SWIFT N-body simulations
GNU General Public License v3.0

using gadgetviewer for a very large simulation snapshot #5

Closed weiguangcui closed 3 years ago

weiguangcui commented 5 years ago

Dear,

I am trying to use Gadgetviewer with a very large simulation snapshot (longIDs, HDF5 format, single file). I compiled gadgetviewer with --enable-big-snapshots, but it still cannot load the snapshot. I think this is because the snapshot is written as a single file. I also tried setting #define SIZEOF_INT 8 in config.h, but that does not help. I got stuck on transferring the long long dataset in hdf5/src/read_hdf5.f90, where it calls the functions in read_hdf5_c.c. The code does not compile because of this error:

read_hdf5.f90(621): error #6683: A kind type parameter must be a compile-time constant.   [C_LONG_LONG]
    integer(kind=C_LONG_LONG), dimension(7)        :: c_count

Any suggestions will be welcome.

Thanks a lot!

jchelly commented 5 years ago

Thanks for letting me know about this. I think you have the right idea for fixing it. The problem must be the types used for dataset sizes and particle indexes. How many particles are in your snapshot file?

NumPart_ThisFile in a Gadget HDF5 snapshot is normally a 4 byte integer so I never considered the possibility of larger snapshots when I wrote the HDF5 read routine. But since SWIFT always writes single file snapshots I think the code will need to be able to deal with larger numbers of particles per file now. I'll take a look at it.

The read_hdf5 routines will need to be modified to use 8 byte integers for dataset sizes and there will need to be similar changes in gadget_hdf5_reader.F90.
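To illustrate why 4 byte counters are not enough here, the sketch below compares a per-type particle count of the size reported later in this thread with the largest value a 4 byte integer can hold. It is a standalone illustration, not code from gadgetviewer; the program and variable names are made up for the example.

```fortran
program count_range_demo
  use iso_fortran_env, only : int32, int64
  implicit none

  ! Particle count of the order reported later in this thread.
  integer(kind=int64), parameter :: np = 2927861006_int64
  ! Largest value representable in a 4 byte signed integer.
  integer(kind=int64), parameter :: max32 = int(huge(1_int32), kind=int64)

  write(*,*) "largest 4 byte integer: ", max32
  write(*,*) "particles of one type : ", np

  if (np > max32) then
     write(*,*) "count does not fit in 4 bytes; an 8 byte kind is needed"
  end if
end program count_range_demo
```

Any dataset size, offset, or loop index that can reach a per-file particle count would need the wider kind, not just the count itself.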

jchelly commented 5 years ago

You might be able to work around the undefined C_LONG_LONG error by adding something like this at the start of the function or module:

use iso_c_binding, only : C_LONG_LONG

I would not modify SIZEOF_INT because it doesn't actually change the size of ints - it just tells the code how big they are, and if it's incorrect various things will break.
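As a minimal sketch of that workaround, assuming the declaration sits in a module procedure: importing C_LONG_LONG in the enclosing module makes it a named constant wherever the kind is used. The module and subroutine names here are hypothetical and are not taken from read_hdf5.f90.

```fortran
module read_sketch
  ! Without this import, C_LONG_LONG is not a known constant in this
  ! scope and cannot be used as a kind parameter.
  use iso_c_binding, only : C_LONG_LONG
  implicit none
contains
  subroutine show_count()
    integer(kind=C_LONG_LONG), dimension(7) :: c_count

    c_count = 0_C_LONG_LONG
    ! A value larger than a 4 byte integer can hold.
    c_count(1) = 2927861006_C_LONG_LONG
    write(*,*) "c_count(1) = ", c_count(1)
  end subroutine show_count
end module read_sketch

program read_sketch_demo
  use read_sketch, only : show_count
  implicit none
  call show_count()
end program read_sketch_demo
```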

weiguangcui commented 5 years ago

Thank you for the quick reply. Adding use iso_c_binding, only : C_LONG_LONG fixes that issue. But when I try to have both C_INT and C_LONG_LONG declared at the top of gadget_hdf5_reader.F90, I get another error:

read_hdf5.f90(3): error #6405: The same named entity from different modules and/or program units cannot be referenced.   [C_INT]
  USE iso_c_binding, ONLY: C_INT, C_LONG_LONG

I think this is related to the use f90_util statement. Any idea how to get both C types included? It would be a waste of memory to change everything to C_LONG_LONG. Thanks again.

jchelly commented 5 years ago

When I wrote gadgetviewer the iso_c_binding module wasn't available so the f90_util module defines its own C_INT. You could use this instead of importing it from iso_c_binding.

Another solution is to rename the constants from iso_c_binding so they don't conflict with f90_util. E.g.:

USE iso_c_binding, ONLY: C_INT_ISO => C_INT, C_LONG_LONG_ISO => C_LONG_LONG

Then you would use C_INT_ISO and C_LONG_LONG_ISO to declare your variables.
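A minimal sketch of the renaming approach follows. The module f90_util_stub is a hypothetical stand-in for f90_util, whose actual definition of C_INT may differ; the other module and subroutine names are also made up for the example.

```fortran
module f90_util_stub
  ! Hypothetical stand-in for f90_util: defines its own C_INT.
  implicit none
  integer, parameter :: C_INT = kind(1)
end module f90_util_stub

module rename_sketch
  use f90_util_stub
  ! Rename on import so the names do not clash with f90_util_stub.
  use iso_c_binding, only : C_INT_ISO => C_INT, C_LONG_LONG_ISO => C_LONG_LONG
  implicit none
contains
  subroutine show_kinds()
    integer(kind=C_INT)           :: ret      ! kind from the stub module
    integer(kind=C_INT_ISO)       :: ret_iso  ! kind from iso_c_binding
    integer(kind=C_LONG_LONG_ISO) :: nmax

    ret = 0
    ret_iso = 0
    nmax = huge(nmax)
    write(*,*) "bytes in C_INT, C_INT_ISO, C_LONG_LONG_ISO: ", &
         bit_size(ret)/8, bit_size(ret_iso)/8, bit_size(nmax)/8
    write(*,*) "largest C_LONG_LONG_ISO value: ", nmax
  end subroutine show_kinds
end module rename_sketch

program rename_demo
  use rename_sketch, only : show_kinds
  implicit none
  call show_kinds()
end program rename_demo
```

The renamed constants are only visible in the scoping unit that contains the use statement and in the procedures it hosts, so a routine in a different module or an external procedure would need its own use statement.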

weiguangcui commented 5 years ago

I prefer the latter solution, but it does not seem to work:

read_hdf5.f90(66): error #6683: A kind type parameter must be a compile-time constant.   [C_INT_ISO]
    integer(kind=C_INT_ISO) :: ret

weiguangcui commented 5 years ago

Hi,

I managed to get gadgetviewer to read all the particle data, but it then fails with a really weird seg fault:

 npos =             2927861006
 np   =             2927861006
 npos =             2986707709
 np   =             2986707709
 npos =              876811293
 np   =              876811293
 npos =                5935839
 np   =                5935839
 npos =               58024712
 np   =               58024712
 npos =                 134486
 np   =                 134486

Program received signal SIGFPE, Arithmetic exception.
Segmentation fault
[dc-cui3@cosma-m [cosma7] src]$ forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source             
gadgetviewer       000000000066F0EE  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AAAAE99A5D0  Unknown               Unknown  Unknown
gadgetviewer       000000000046ABD2  octreemod_mp_buil         272  octree.F90
gadgetviewer       00000000004D8BBE  particle_store_mp        1148  particle_store.F90
gadgetviewer       0000000000633A46  snapshot_reader_m         171  snapshot_reader.f90
gadgetviewer       000000000065AFF5  MAIN__                     59  gadgetviewer.F90
gadgetviewer       000000000040E4CE  Unknown               Unknown  Unknown
libc-2.17.so       00002AAAAEBC93D5  __libc_start_main     Unknown  Unknown
gadgetviewer       000000000040E3E9  Unknown               Unknown  Unknown

You can see that np and npos match in the particle_store_verify() function, but the code exits when it builds the octree. Line 272 in octree.F90 looks all right to me: if(tree%posmin(j).gt.pos(j,i))tree%posmin(j)=pos(j,i). I cannot identify or trace the top two functions (Unknown). I compiled with debugging flags and ran the code under gdb, but it still crashes. Please let me know if you have any suggestions. Thank you so much.

PS: please let me know if you want me to push the modifications and have a look at them.

jchelly commented 5 years ago

My guess would be that some of the positions have not been set so you get a floating point exception when it tries to build the octree. Maybe there are still some 4 byte integers in gadget_hdf5_reader.F90 that need to be changed to 8 bytes. I'd be happy to take a look at your changes - if we can get this to work it will be a nice enhancement.
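One way to test this guess is to count non-finite coordinates before the tree is built. The sketch below uses a small made-up pos array with one NaN standing in for an entry that was never filled in; it is not gadgetviewer code, and real uninitialised memory may hold arbitrary values rather than NaN.

```fortran
program check_positions
  use iso_fortran_env, only : int64
  use ieee_arithmetic, only : ieee_is_finite, ieee_value, ieee_quiet_nan
  implicit none

  real, dimension(3,4) :: pos      ! hypothetical 3 x N coordinate array
  integer(kind=int64)  :: i, nbad  ! 8 byte index and counter

  pos = 1.0
  ! Simulate one coordinate that was never written by the reader.
  pos(2,3) = ieee_value(1.0, ieee_quiet_nan)

  ! Count particles with at least one NaN or infinite coordinate.
  nbad = 0
  do i = 1, size(pos, 2, kind=int64)
     if (.not. all(ieee_is_finite(pos(:,i)))) nbad = nbad + 1
  end do

  write(*,*) "particles with non-finite positions: ", nbad
end program check_positions
```

An ordered comparison such as .gt. on a NaN raises the IEEE invalid flag, which would be consistent with the "floating invalid" message if exception trapping is enabled, though that is only one possible explanation.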

jchelly commented 3 years ago

I think this was fixed by #6.