lava-nc / lava

A Software Framework for Neuromorphic Computing
https://lava-nc.org

utils/slurm.py splits zero-length string when specifying partition or board in use_slurm_host #753

Open furlong-cmu opened 11 months ago

furlong-cmu commented 11 months ago

Describe the bug: try_run_command(['sinfo','-N']) returns a final zero-length line when a partition or board is specified in use_slurm_host. This seems to cause the error in the title: line.split() (line 122 for partitions, line 185 for boards) yields an empty list for that line, so indexing fields[0] raises an IndexError. Observed on the ncl-edu vLab instance.

To reproduce current behavior: when executing the code

from lava.utils import loihi
loihi.use_slurm_host(loihi_gen=loihi.ChipGeneration.N3B3, partition='[partition-name]')

I get the error

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[3], line 3
      1 from lava.utils import loihi
----> 3 loihi.use_slurm_host(partition='oheogluch', loihi_gen=loihi.ChipGeneration.N3B3)
      4 use_loihi2 = loihi.is_installed()
      6 # if use_loihi2:

File ~/lava_env/lib/python3.8/site-packages/lava/utils/loihi.py:57, in use_slurm_host(partition, board, loihi_gen)
     54 os.environ["LOIHI_GEN"] = loihi_gen.value
     56 slurm.set_board(board, partition)
---> 57 slurm.set_partition(partition)
     59 global host
     60 host = "SLURM"

File ~/lava_env/lib/python3.8/site-packages/lava/utils/slurm.py:85, in set_partition(partition)
     82     os.environ.pop("PARTITION", None)
     83     return
---> 85 partition_info = get_partition_info(partition)
     87 print(partition_info)
     88 if partition_info is None:# or "down" in partition_info.state:

File ~/lava_env/lib/python3.8/site-packages/lava/utils/slurm.py:152, in get_partition_info(partition_name)
    138 def get_partition_info(partition_name: str) -> ty.Optional[PartitionInfo]:
    139     """Get the SLURM info for the specified partition, if available.
    140 
    141     Parameters
   (...)
    150         controller does not have the specified partition.
    151     """
--> 152     matching_partitions = [p for p in get_partitions()
    153                            if p.name == partition_name]
    155     return next(iter(matching_partitions), None)

File ~/lava_env/lib/python3.8/site-packages/lava/utils/slurm.py:135, in get_partitions()
    128     nodel = fields[5] if len(fields) > 5 else ''
    129     return PartitionInfo(name=name,
    130                          available=avail,
    131                          timelimit=limit,
    132                          nodes=nodes,
    133                          state=state,
    134                          nodelist=nodel)
--> 135 return [parse_partition(line) for line in lines]

File ~/lava_env/lib/python3.8/site-packages/lava/utils/slurm.py:135, in <listcomp>(.0)
    128     nodel = fields[5] if len(fields) > 5 else ''
    129     return PartitionInfo(name=name,
    130                          available=avail,
    131                          timelimit=limit,
    132                          nodes=nodes,
    133                          state=state,
    134                          nodelist=nodel)
--> 135 return [parse_partition(line) for line in lines]

File ~/lava_env/lib/python3.8/site-packages/lava/utils/slurm.py:123, in get_partitions.<locals>.parse_partition(line)
    121 def parse_partition(line: str) -> PartitionInfo:
    122     fields = line.split()
--> 123     name = fields[0]
    124     avail = fields[1] if len(fields) > 1 else ''
    125     limit = fields[2] if len(fields) > 2 else ''

IndexError: list index out of range
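
The failure mode in the traceback can be reproduced without a SLURM cluster. The snippet below is a minimal sketch, assuming the command output is split on newlines so that a trailing newline leaves a zero-length final entry. The sample text and the helper name first_field are hypothetical and are not taken from lava.

```python
# Hypothetical sinfo-style text ending in a newline (not real cluster output).
raw_output = "example_partition up infinite 1 idle node01\n"

# Splitting on "\n" leaves a trailing zero-length string in the list.
lines = raw_output.split("\n")


def first_field(line: str) -> str:
    """Mimic the first two statements of parse_partition in the traceback."""
    fields = line.split()  # "".split() returns an empty list
    return fields[0]       # indexing an empty list raises IndexError


error_message = None
try:
    names = [first_field(line) for line in lines]
except IndexError as err:
    error_message = str(err)

print(repr(lines[-1]), error_message)
```

The first line parses without trouble. Only the zero-length final entry triggers the exception.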

There is a corresponding error in parse_board (line 184) when I run the code:

from lava.utils import loihi
loihi.use_slurm_host(loihi_gen=loihi.ChipGeneration.N3B3, board='[board-name]')

Expected behavior: the partition or board should be selected, and the corresponding os.environ variable should reflect the choice.
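
One possible guard, sketched under assumptions rather than taken from the lava source, is to skip blank lines before parsing. PartitionInfo here is a simplified stand-in that mirrors the field names visible in the traceback, and parse_partitions is a hypothetical helper name.

```python
import typing as ty


class PartitionInfo(ty.NamedTuple):
    """Simplified stand-in with the field names shown in the traceback."""
    name: str
    available: str
    timelimit: str
    nodes: str
    state: str
    nodelist: str


def parse_partition(line: str) -> PartitionInfo:
    fields = line.split()
    # Pad so that missing trailing columns become empty strings.
    fields += [""] * (6 - len(fields))
    return PartitionInfo(*fields[:6])


def parse_partitions(lines: ty.Iterable[str]) -> ty.List[PartitionInfo]:
    # Skip zero-length and whitespace-only lines so fields[0] always exists.
    return [parse_partition(line) for line in lines if line.strip()]


partitions = parse_partitions(
    ["example_partition up infinite 1 idle node01", ""]
)
print(partitions)
```

The same filter would apply to the board-parsing path, since the report describes the identical trailing empty line there.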


PhilippPlank commented 8 months ago

@tim-shea could you take a look?