linux-rdma / opensm

OpenSM 3.3.21 crash on Debian systems #30

Open rboinipelli opened 2 years ago

rboinipelli commented 2 years ago

Dear Developers

We are experiencing intermittent opensm (v3.3.21) crashes on Debian hosts. Our developers suspect the problem lies in the LASH routing algorithm. Could you please advise on a fix?

From the crash report:

Signal: 11
SourcePackage: opensm
Stacktrace:

0 0x00005636f04835a9 in get_next_switch (p_lash=0x1, link=, sw=0) at osm_ucast_lash.c:337

No locals.

1 generate_cdg_for_sp (p_lash=p_lash@entry=0x5636f0eef0b0, sw=sw@entry=0, dest_switch=dest_switch@entry=1, lane=lane@entry=0) at osm_ucast_lash.c:337

    num_switches = 13
    switches = 0x7f501c2a7d20
    cdg_vertex_matrix = 0x7f501c2d0e00
    next_switch =
    output_link =
    j =
    exists =
    v =
    prev = 0x0

2 0x00005636f0484aa8 in lash_core (p_lash=) at osm_ucast_lash.c:842

    lanes_needed = 1
    k = <optimized out>
    dest_switch = 1
    output_link = <optimized out>
    cycle_found2 = <optimized out>
    num_switches = <optimized out>
    switches = <optimized out>
    output_link2 = <optimized out>

Please advise on a possible solution; any help is much appreciated.

Thank you in advance

Best Regards