m3g / packmol

Packmol - Initial configurations for molecular dynamics simulations
http://m3g.github.io/packmol
MIT License
222 stars 51 forks source link

Non reproducible system initialization for PACKMOL compiled for different operating systems #30

Closed jennyfothergill closed 2 years ago

jennyfothergill commented 2 years ago

I am using packmol following the example at https://sites.ualberta.ca/~chipuije/initial.html and additionally including a random seed. If I run the same input pdb and configuration file on OSX and linux, I get different outputs. My input files and output files can be found here.

OSX system: OSX Mojave 10.14.6 Linux system: CentOS Linux 7 Packmol was installed via conda in both instances

lmiq commented 2 years ago

I think that is expected and dependent on implementation details of the random number generators on each system. I do not think I can do anything about it.

chrisjonesBSU commented 2 years ago

It looks like there are some recent contributions and discussions in the Fortran standard library repo about pseudo random number generators. I'm not familiar enough with Fortran to know if this is a solution, but maybe it would be worth looking into?

https://github.com/fortran-lang/stdlib/pull/271 https://github.com/fortran-lang/stdlib/issues/135

Thank you!

lmiq commented 2 years ago

I will check the possibilities, but anyway, why is that important? No simulation should be dependent on the exact positions of the particles, and any subsequent simulation, notably those running in parallel, will most likely not be reproducible because of floating point operations not being associative. Is there any compelling reason to force the random sequence to be identical?

Or, more specifically: did anyone find a situation in which the result in one platform is acceptable and in other platform is not? If that is the case, probably that is a sign of another type of bug.

jennyfothergill commented 2 years ago

I understand your point, and it is unlikely that this is a blocking operation for most simulators. We are interested in comparing the single point energies between engines for a reproducibility study and this issue came to light then.

lmiq commented 2 years ago

If anyone is interested in making a small, test, use the code below.

Compile it with:

gfortran test.f90 -o test

and run it with:

./test

It will write two sequence of 10 numbers, that may or not be identical, depending on the system and compiler. If the first sequence is different in the two architectures and the second is identical everywhere, there is an easy way to make the results identical at least in this case (it will never be a generic solution, as there are not guarantees from any standard that two random number sequences will be identical across compilers, versions, or operating systems).

!
! Function that returns a real random number between 0. and 1.
! 

double precision function rnd() 
  call random_number(rnd)
  return 
end function rnd

!
! Subroutine that initializes the random number generator given a seed
!

subroutine init_random_number1(iseed)
  integer :: size
  integer :: i, iseed
  integer, allocatable :: seed(:)
  call random_seed(size=size)
  allocate(seed(size))
  do i = 1, size
    seed(i) = i*iseed
  end do
  call random_seed(put=seed)
  deallocate(seed)
  return
end subroutine init_random_number1

subroutine init_random_number2(iseed)
  integer, parameter :: size = 50
  integer :: i, iseed
  integer :: seed(size)
  do i = 1, size
    seed(i) = i*iseed
  end do
  call random_seed(put=seed)
  return
end subroutine init_random_number2

program main
    double precision :: rnd
    write(*,*) " First sequence: "
    call init_random_number1(123)
    do i = 1, 10
        write(*,*) rnd()
    end do

    write(*,*) " Second sequence: "
    call init_random_number2(123)
    do i = 1, 10
        write(*,*) rnd()
    end do
end program main
jennyfothergill commented 2 years ago

Thank you for looking into this. I've run your test. On OSX using GNU Fortran (Homebrew GCC 10.2.0_4) 10.2.0:

  First sequence: 
  0.47171731173329923     
  0.11725091293941303     
  0.13068507631575255     
  0.75979587031017681     
  0.35372369307705276     
  0.15413373070717729     
  0.17561267720475637     
  0.52519052355081008     
   1.2420903891045998E-002
   1.2340487605003836E-002
  Second sequence: 
  0.47171731173329923     
  0.11725091293941303     
  0.13068507631575255     
  0.75979587031017681     
  0.35372369307705276     
  0.15413373070717729     
  0.17561267720475637     
  0.52519052355081008     
   1.2420903891045998E-002
   1.2340487605003836E-002

And on Linux GNU Fortran (GCC) 8.3.0

  First sequence: 
  0.56981017567981018     
  0.49918477233869818     
  0.37431540548738862     
  0.75113158742830632     
  0.48423542175680601     
  0.11365891787988347     
  0.98121193804920970     
  0.40826890280898931     
  0.46079925822036805     
  0.50476830343336976     
  Second sequence: 
  0.56981017567981018     
  0.49918477233869818     
  0.37431540548738862     
  0.75113158742830632     
  0.48423542175680601     
  0.11365891787988347     
  0.98121193804920970     
  0.40826890280898931     
  0.46079925822036805     
  0.50476830343336976     

It appears that sequence 1 and 2 are identical on both systems but not across systems. Again, thank you for your response. This issue does not preclude system initialization with PACKMOL. My intent with opening this issue is to document that identical initial configurations can't be guaranteed even with the same random seed.

lmiq commented 2 years ago

Ok, so then this is OS and compiler dependent.

If for some reason that is very important, what is needed is to put a custom random number generator function in the package, and not use the one implemented in the compiler library.

That is not too hard, let me know if that makes a difference to you. I won't release a version of Packmol with that, but we can have a custom build.