lorenzo-rovigatti / oxDNA

A new version of the code to simulate the oxDNA/oxRNA models, now equipped with Python bindings
https://dna.physics.ox.ac.uk/
GNU General Public License v3.0

"Segmentation fault" when running oxDNA with an external force file on MPI. #102

Open FujiwaraRobert opened 2 months ago

FujiwaraRobert commented 2 months ago

Input file:

conf_file = output.dat
topology = output.top
mismatch_repulsion = 0
use_average_seq = 1
T = 20C
job_title = self_lock_7_5
steps = 1000000000
salt_concentration = 1
backend = CUDA
interaction_type = DNA2
print_conf_interval = 1000000
print_energy_every = 1000000
dt = 0.001
external_forces = 0
sim_type = MD
max_density_multiplier = 10
verlet_skin = 0.5
time_scale = linear
ensemble = NVT
thermostat = john
diff_coeff = 2.5
backend_precision = mixed
external_forces = 1
external_forces_file = force.txt
lastconf_file = last_conf.dat
trajectory_file = trajectory.dat
energy_file = energy.dat
refresh_vel = 1
restart_step_counter = 1
newtonian_steps = 103
CUDA_list = verlet
CUDA_sort_every = 0
use_edge = 1
edge_n_forces = 1
max_density_multiplier=20000
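
force.txt itself is not included in the report. For reference, oxDNA external-force files use the documented block syntax; a purely hypothetical example, with a single trap acting on particle 0 and placeholder values, would look like:

{
type = trap
particle = 0
pos0 = 0., 0., 0.
stiff = 1.0
rate = 0.
dir = 0., 0., 1.
}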

Error message:

Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 0 with PID 1815700 on node dcc-plusds-gpu-07 exited on signal 11 (Segmentation fault).

This is the error message I got while running oxDNA with the input file above. The simulation runs for several steps and then hits this problem. If I remove the external force from the input file, it works fine. I don't know whether I need to add any extra settings to the input file to deal with this problem. Thanks so much.

lorenzo-rovigatti commented 2 months ago

Well, you shouldn't be using MPI since oxDNA does not support it (except for some rather obscure combination of input options). Can you try running it without MPI? I guess it will segfault regardless (and that may be due to the type & strength of external forces you are using), but it is worth checking.
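
For reference, a non-MPI run simply launches the binary on the input file directly, without going through mpirun (the path to the executable is an assumption here):

./oxDNA input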

FujiwaraRobert commented 2 months ago

Thanks a lot. I tried running without MPI, but I encountered another error: "terminate called after throwing an instance of 'thrust::system::system_error' what(): device free failed: cudaErrorIllegalAddress: an illegal memory access was encountered INFO: # Caught SIGNAL 6; setting stop = 1"

lorenzo-rovigatti commented 2 months ago

Thanks for trying. It looks like some kind of memory error. I guess it depends on the specific system and force you are using.

FujiwaraRobert commented 2 months ago

When I ran without MPI, some simulations worked well and some of them couldn't start at all.

FujiwaraRobert commented 2 months ago

Also, running on the CPU works well.

lorenzo-rovigatti commented 2 months ago

OK! It definitely looks like there is some stability issue and sometimes the simulation explodes. Unfortunately I can't help you much without having access to the simulation data.