jonescompneurolab / hnn-core

Simulation and optimization of neural circuits for MEG/EEG source estimates
https://jonescompneurolab.github.io/hnn-core/
BSD 3-Clause "New" or "Revised" License

MPIBackend error on Windows 10: "Unknown option: --use-hwthread-cpus" #589

Open rythorpe opened 1 year ago

rythorpe commented 1 year ago
init network
drive type is Rhythmic, location=proximal
drive type is Rhythmic, location=distal
drive type is Evoked, location=distal
drive type is Evoked, location=proximal
drive type is Evoked, location=proximal
drive type is Evoked, location=proximal
drive type is Poisson, location=proximal
start simulation
MPI will run 2 trial(s) sequentially by distributing network neurons over 11 processes.
Unknown option: --use-hwthread-cpus

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File ~\Documents\GitHub\hnn-core\hnn_core\gui\gui.py:1307, in run_button_clicked(widget_simulation_name, log_out, drive_widgets, all_data, dt, tstop, ntrials, backend_selection, mpi_cmd, n_jobs, params, simulation_status_bar, simulation_status_contents, connectivity_sliders, viz_manager)
   1305     with backend:
   1306         simulation_status_bar.value = simulation_status_contents['running']
-> 1307         simulation_data[_sim_name]['dpls'] = simulate_dipole(
   1308             simulation_data[_sim_name]['net'],
   1309             tstop=tstop.value,
   1310             dt=dt.value,
   1311             n_trials=ntrials.value)
   1313         simulation_status_bar.value = simulation_status_contents[
   1314             'finished']
   1316 viz_manager.reset_fig_config_tabs()

File ~\Documents\GitHub\hnn-core\hnn_core\dipole.py:100, in simulate_dipole(net, tstop, dt, n_trials, record_vsec, record_isec, postproc)
     95 if postproc:
     96     warnings.warn('The postproc-argument is deprecated and will be removed'
     97                   ' in a future release of hnn-core. Please define '
     98                   'smoothing and scaling explicitly using Dipole methods.',
     99                   DeprecationWarning)
--> 100 dpls = _BACKEND.simulate(net, tstop, dt, n_trials, postproc)
    102 return dpls

File ~\Documents\GitHub\hnn-core\hnn_core\parallel_backends.py:717, in MPIBackend.simulate(self, net, tstop, dt, n_trials, postproc)
    712 print(f"MPI will run {n_trials} trial(s) sequentially by "
    713       f"distributing network neurons over {self.n_procs} processes.")
    715 env = _get_mpi_env()
--> 717 self.proc, sim_data = run_subprocess(
    718     command=self.mpi_cmd, obj=[net, tstop, dt, n_trials], timeout=30,
    719     proc_queue=self.proc_queue, env=env, cwd=os.getcwd(),
    720     universal_newlines=True)
    722 dpls = _gather_trial_data(sim_data, net, n_trials, postproc)
    723 return dpls

File ~\Documents\GitHub\hnn-core\hnn_core\parallel_backends.py:174, in run_subprocess(command, obj, timeout, proc_queue, *args, **kwargs)
    171 if not sent_network:
    172     # Send network object to child so it can start
    173     try:
--> 174         _write_net(proc.stdin, pickled_obj)
    175     except BrokenPipeError:
    176         # child failed during _write_net(). get the
    177         # output and break out of loop on the next
    178         # iteration
    179         warn("Received BrokenPipeError exception. "
    180              "Child process failed unexpectedly")

File ~\Documents\GitHub\hnn-core\hnn_core\parallel_backends.py:475, in _write_net(stream, pickled_net)
    473 stream.flush()
    474 stream.write('@start_of_net@')
--> 475 stream.write(pickled_net.decode())
    476 stream.write('@end_of_net:%d@\n' % len(pickled_net))
    477 stream.flush()

OSError: [Errno 22] Invalid argument

rythorpe commented 1 year ago

@dylansdaniels can you check to see if you get the same thing with a development installation off of the master branch on Windows?

jasmainak commented 1 year ago

@rythorpe what distribution of MPI did you install?

jasmainak commented 1 year ago

I think this logic has to be OS-dependent or MPI-dependent, since the --use-hwthread-cpus option is only a feature of OpenMPI, not MSMPI.
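A minimal sketch of what MPI-dependent logic could look like, assuming the launcher is identified from the text printed by `mpiexec --version`. The helper names (`is_open_mpi`, `build_mpi_command`, `query_mpiexec_version`) are hypothetical and not part of hnn-core, and the banner substrings checked here are an assumption about what Open MPI prints.

```python
import shutil
import subprocess


def is_open_mpi(version_text):
    """Return True if the launcher's version banner looks like Open MPI.

    Assumption: Open MPI banners contain "Open MPI" or "OpenRTE".
    """
    text = version_text.lower()
    return "open mpi" in text or "openrte" in text


def build_mpi_command(n_procs, version_text, use_hwthreads=True):
    """Assemble an mpiexec argument list (hypothetical helper)."""
    cmd = ["mpiexec", "-np", str(n_procs)]
    # --use-hwthread-cpus is an Open MPI option; other launchers such as
    # MS-MPI reject it, so add it only when Open MPI is detected.
    if use_hwthreads and is_open_mpi(version_text):
        cmd.append("--use-hwthread-cpus")
    return cmd


def query_mpiexec_version():
    """Return the launcher's version banner, or '' if it cannot be run."""
    exe = shutil.which("mpiexec")
    if exe is None:
        return ""
    try:
        result = subprocess.run([exe, "--version"], capture_output=True,
                                text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout + result.stderr
```

Called as `build_mpi_command(11, query_mpiexec_version())`, this would leave the flag off whenever the launcher is not recognized as Open MPI, which is the safer default on Windows.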

rythorpe commented 1 year ago

Oh shoot, I think you might be right :man_facepalming: There are probably many aspects of our MPIBackend that are currently incompatible with Windows. For instance, the feature I was trying to implement in #506 will probably need a bit of help before working on a Windows platform.

dylansdaniels commented 1 year ago

@rythorpe just getting to this. Do you still want me to test, or is it a moot point since Windows doesn't support OpenMPI? I'll go ahead and do a fresh fork and get it set up on my Windows computer in any case.

rythorpe commented 1 year ago

It's always nice to have more eyes on it, but I wouldn't worry about testing it for now. Maybe once we get the present issue resolved we can both do separate installations and try to break it :)

dylansdaniels commented 1 year ago

Sounds good, I'll be ready for it :)

jasmainak commented 1 year ago

@rythorpe I would suggest that we copy these lines from the NEURON CIs so we avoid regressions in the future. The CIs will initially fail, but you can work backwards to make the CI pass, as in TDD.

jasmainak commented 1 year ago

It would also be nice to update this document once we figure out how to make it work: https://jonescompneurolab.github.io/hnn-core/stable/parallel.html#mpi

rythorpe commented 1 year ago

> @rythorpe I would suggest that we copy these lines from the NEURON CIs so we avoid regressions in the future. The CIs will initially fail, but you can work backwards to make the CI pass, as in TDD.

I'm pretty sure msmpi is distributed with NEURON and is thus automatically installed during our unit test CIs. That's how it ended up on my Windows installation at least.

I'm guessing the error doesn't show up there because the --use-hwthread-cpus option somehow isn't being passed in our tests.

jasmainak commented 1 year ago

Umm ... I don't think so. See here for the last Windows CI run on master:

[Screenshot of the Windows CI run]

See the skipped test on parallel backends.

rythorpe commented 1 year ago

Ugh. Remind me again why we set up the MPIBackend tests to fail silently? I think we should consider reverting that since the MPIBackend will most likely be default for new users in workshops, etc.

jasmainak commented 1 year ago

No, they don't fail. They just get skipped if MPI is not installed, which keeps the barrier low for new developers.
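For readers unfamiliar with the pattern, here is a rough sketch of a test that is skipped, rather than failed, when MPI is missing. It uses the standard-library `unittest` module instead of the project's actual test setup, and the availability check (an `mpiexec` launcher on the PATH plus an importable `mpi4py`) is an assumption, as are the names `mpi_available`, `TestMPIBackend`, and `run_suite`.

```python
import importlib.util
import shutil
import unittest


def mpi_available():
    """True only if both an mpiexec launcher and mpi4py can be found."""
    has_launcher = shutil.which("mpiexec") is not None
    has_mpi4py = importlib.util.find_spec("mpi4py") is not None
    return has_launcher and has_mpi4py


class TestMPIBackend(unittest.TestCase):
    @unittest.skipUnless(mpi_available(), "MPI is not installed")
    def test_mpi_simulation(self):
        # Placeholder body; a real test would run a simulation through MPI.
        self.assertTrue(mpi_available())


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMPIBackend)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

On a machine without MPI, `run_suite().skipped` holds one entry and `wasSuccessful()` is still true, which is why a skipped test can hide a platform-specific failure like the one reported here.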