ornladios / ADIOS

The old ADIOS 1.x code repository. Look for ADIOS2 for new repo
https://csmd.ornl.gov/adios

bpdump crash on attribute file written by machine of different en #195

Closed khou2020 closed 5 years ago

khou2020 commented 5 years ago

Hi:

I ran the example program examples/C/attributes/attributes_write on cetus@ALCF (a big-endian machine), copied the generated file (attributes.bp) to my PC (x86, little-endian), and ran bpdump on it there. bpdump crashes with a floating point exception after printing most of the information.

$ bpdump attributes_big.bp
BP format version: 3
========================================================
Process Groups Index:
Group: temperature
        Process ID: 0
        Time Name:
        Time: 1
        Offset in File: 0
========================================================
Vars Index:
Var (Group) [ID]: /NX (temperature) [1]
        Datatype: integer
        Vars Characteristics: 1
        Offset(50)      Payload Offset(78)      File Index(-1)  Time Index(1)   Value(10)
Var (Group) [ID]: /size (temperature) [2]
        Datatype: integer
        Vars Characteristics: 1
        Offset(82)      Payload Offset(112)     File Index(-1)  Time Index(1)   Value(1)
Var (Group) [ID]: /rank (temperature) [3]
        Datatype: integer
        Vars Characteristics: 1
        Offset(116)     Payload Offset(146)     File Index(-1)  Time Index(1)   Value(0)
Var (Group) [ID]: /mean (temperature) [4]
        Datatype: double
        Vars Characteristics: 1
        Offset(150)     Payload Offset(180)     File Index(-1)  Time Index(1)   Value(4.500000e+00)
Var (Group) [ID]: /date (temperature) [5]
        Datatype: string
        Vars Characteristics: 1
        Offset(188)     Payload Offset(218)     File Index(-1)  Time Index(1)   Value("Nov, 2009")
Var (Group) [ID]: /temperature (temperature) [6]
        Datatype: double
        Vars Characteristics: 1
        Offset(227)     Payload Offset(373)     File Index(-1)  Time Index(1)   Min(0.000000e+00)       Max(9.000000e+00)       Dims (l:g:o): (1:1:0,10:10:0)
========================================================
Attributes Index:
Attribute (Group) [ID]: /temperature/number of levels (temperature) [7]
        Datatype: integer
        Attribute Characteristics: 1
                Offset(465)             Payload Offset(515)             File Index(-1)          Time Index(0)           Value(1)
Attribute (Group) [ID]: /temperature/description (temperature) [8]
        Datatype: string
        Attribute Characteristics: 1
                Offset(515)             Payload Offset(598)             File Index(-1)          Time Index(0)           Value("Global array written from 'size' processes")
Attribute (Group) [ID]: /temperature/mean value (temperature) [9]
        Datatype: (unknown: 255)
        Attribute Characteristics: 1
                Offset(598)             Payload Offset(637)             File Index(-1)          Time Index(0)           Var(4)
Attribute (Group) [ID]: /temperature/date of coding (temperature) [10]
        Datatype: (unknown: 255)
        Attribute Characteristics: 1
                Offset(637)             Payload Offset(680)             File Index(-1)          Time Index(0)           Var(5)
Attribute (Group) [ID]: /__adios__/version (temperature) [11]
        Datatype: string
        Attribute Characteristics: 1
                Offset(680)             Payload Offset(721)             File Index(-1)          Time Index(0)           Value("1.13.1")
Attribute (Group) [ID]: /__adios__/create_time_epoch (temperature) [12]
        Datatype: integer
        Attribute Characteristics: 1
                Offset(721)             Payload Offset(770)             File Index(-1)          Time Index(0)           Value(1553498729)
Attribute (Group) [ID]: /__adios__/update_time_epoch (temperature) [13]
        Datatype: integer
        Attribute Characteristics: 1
                Offset(770)             Payload Offset(819)             File Index(-1)          Time Index(0)           Value(1553498729)
========================================================
Process Group: 1
        Group Name: temperature
        Host Language Fortran?: N
        Coordination Var Member ID: 0
        Time Name:
        Time: 1
        Methods used in output: 1
                Method ID: 0
                Method Parameters:
        Vars Count: 6
                Var Name (ID): NX (1)
                Var Path:
                Datatype: integer
                Is Dimension: Y
                Characteristics:
                        Offset(50)                      Transform-type(0 = none)

                Var Name (ID): size (2)
                Var Path:
                Datatype: integer
                Is Dimension: Y
                Characteristics:
                        Offset(82)                      Transform-type(0 = none)

                Var Name (ID): rank (3)
                Var Path:
                Datatype: integer
                Is Dimension: Y
                Characteristics:
                        Offset(116)                     Transform-type(0 = none)

                Var Name (ID): mean (4)
                Var Path:
                Datatype: double
                Is Dimension: N
                Characteristics:
                        Offset(150)                     Transform-type(0 = none)

                Var Name (ID): date (5)
                Var Path:
                Datatype: string
                Is Dimension: N
                Characteristics:
                        Offset(188)                     Transform-type(0 = none)
Floating point exception (core dumped)

I debugged the code and found that the crash is a division by zero at adios_endianness.c:116, in the statement uint64_t num_elements = payload_size / size;
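To illustrate the failure mode: on x86, an integer division by zero raises SIGFPE, which the shell reports as "Floating point exception". The sketch below is not the ADIOS code and not the eventual fix; swap_payload and swap_bytes are hypothetical helpers. It assumes that the element size comes out as 0 when the datatype is not recognized (for example the "unknown: 255" attributes in the dump above), and it shows a guard that skips the division in that case.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes of one element in place. */
static void swap_bytes(unsigned char *p, uint64_t size)
{
    for (uint64_t i = 0; i < size / 2; i++) {
        unsigned char tmp = p[i];
        p[i] = p[size - 1 - i];
        p[size - 1 - i] = tmp;
    }
}

/*
 * Hypothetical helper (not an ADIOS function): byte-swap every element of
 * a payload. Assumption: elem_size is 0 when the datatype is unknown.
 * Returns the number of elements swapped, or -1 if nothing was done.
 */
static int64_t swap_payload(void *payload, uint64_t payload_size,
                            uint64_t elem_size)
{
    if (payload == NULL || elem_size == 0)
        return -1; /* guard: never evaluate payload_size / 0 */

    uint64_t num_elements = payload_size / elem_size;
    unsigned char *p = payload;
    for (uint64_t i = 0; i < num_elements; i++)
        swap_bytes(p + i * elem_size, elem_size);

    return (int64_t)num_elements;
}
```

With this guard, a caller can treat a return value of -1 as "skip this payload" instead of crashing; whether skipping is the right behavior for bpdump is a separate question.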

The issue does not occur with the file generated by examples/C/arrays/arrays_write. It also does not happen when I dump a little-endian file (generated on my PC) on the big-endian machine (cetus).

Is there any fix?

pnorbert commented 5 years ago

That's a bug. Can you send the attribute.bp to us? pnorbert at ornl dot gov.


pnorbert commented 5 years ago

Thank you for the bug report. It is now fixed in commit 1177319.

khou2020 commented 5 years ago

Will this be included in the next official release? Thanks.

pnorbert commented 5 years ago

It would be, but we are not planning a release any time soon. Do you need a release for your users?
