safegaurd Box-Muller normal random number generation against u=0.0

dacarnazzola commented 1 year ago

This fix also allows a single randn_vec subroutine to be used for any array rank, as well as improves performance ~3-10x. See example program below for that:

module nf_random
implicit none
private

    public :: randn, randn_vec

    real, parameter :: pi = 4 * atan(1.d0)

    interface randn
        module procedure randn_1d
        module procedure randn_2d
        module procedure randn_4d
    end interface randn

    contains

    function randn_1d(i) result(x)
        integer, intent(in) :: i
        real :: x(i)
        real :: u(i), v(i)
        call random_number(u)
        call random_number(v)
        x = sqrt(-2 * log(u)) * cos(2 * pi * v)
    end function randn_1d

    function randn_2d(i, j) result(x)
        integer, intent(in) :: i, j
        real :: x(i,j)
        real :: u(i,j), v(i,j)
        call random_number(u)
        call random_number(v)
        x = sqrt(-2 * log(u)) * cos(2 * pi * v)
    end function randn_2d

    function randn_4d(i, j, k, l) result(x)
        integer, intent(in) :: i, j, k, l
        real :: x(i,j,k,l)
        real :: u(i,j,k,l), v(i,j,k,l)
        call random_number(u)
        call random_number(v)
        x = sqrt(-2 * log(u)) * cos(2 * pi * v)
    end function randn_4d 

    impure subroutine randn_vec(x, n)
        real, intent(out) :: x(*)
        integer, intent(in), value :: n
        real :: u((n+1)/2), v((n+1)/2), r((n+1)/2)
        integer :: i, nu
        call random_number(u)
        nu = size(u)
        do i=1,nu
            do while (u(i) == 0.0)
                call random_number(u(i))
            end do
        end do
        call random_number(v)
        r = sqrt(-2.0*log(u))
        x(1:nu) = r*sin(2.0*pi*v)
        if (n > (nu+1)) x(nu+1:n) = r(1:(n-nu))*cos(2.0*pi*v(1:(n-nu)))
    end subroutine randn_vec

end module nf_random

program main
use, intrinsic :: iso_fortran_env, only: i64 => int64
use, non_intrinsic :: nf_random, only: randn, randn_vec

    integer, parameter :: r_max = 10

    real, allocatable :: x1(:), x2(:,:), x4(:,:,:,:)
    integer :: i, n, r
    integer(i64) :: c1, cr, c2
    real :: elapsed

    do i=20,30
        n = 2**i
        write(*,'(a,i0,a)') 'testing n=',n,' elements'
        if (allocated(x1)) deallocate(x1)
        if (allocated(x2)) deallocate(x2)
        if (allocated(x4)) deallocate(x4)
        allocate(x1(n), x2(1,n), x4(1,1,1,n))
        write(*,'(a)') 'nf_random::randn'
        call system_clock(c1, cr)
        do r=1,r_max
            x1 = randn(size(x1))
            x2 = randn(size(x2,1), size(x2,2))
            x4 = randn(size(x4,1), size(x4,2), size(x4,3), size(x4,4))
        end do
        call system_clock(c2)
        elapsed = real(max(c2 - c1, 1_i64))/real(cr)
        write(*,'(a,e13.6,a)') '  elapsed: ',elapsed,' seconds'
        write(*,'(a,3e13.6)') '  avg x1/x2/x4: ',sum(x1)/size(x1),sum(x2)/size(x2),sum(x4)/size(x4)
        write(*,'(a,3e13.6)') '  std x1/x2/x4: ',sqrt(sum((x1-sum(x1)/size(x1))**2)/real(size(x1)-1)), &
                                      sqrt(sum((x2-sum(x2)/size(x2))**2)/real(size(x2)-1)), &
                                      sqrt(sum((x4-sum(x4)/size(x4))**2)/real(size(x4)-1))
        write(*,'(a)') 'randn_vec'
        call system_clock(c1, cr)
        do r=1,r_max
            call randn_vec(x1, size(x1))
            call randn_vec(x2, size(x2))
            call randn_vec(x4, size(x4))
        end do
        call system_clock(c2)
        elapsed = real(max(c2 - c1, 1_i64))/real(cr)
        write(*,'(a,e13.6,a)') '  elapsed: ',elapsed,' seconds'
        write(*,'(a,3e13.6)') '  avg x1/x2/x4: ',sum(x1)/size(x1),sum(x2)/size(x2),sum(x4)/size(x4)
        write(*,'(a,3e13.6)') '  std x1/x2/x4: ',sqrt(sum((x1-sum(x1)/size(x1))**2)/real(size(x1)-1)), &
                                      sqrt(sum((x2-sum(x2)/size(x2))**2)/real(size(x2)-1)), &
                                      sqrt(sum((x4-sum(x4)/size(x4))**2)/real(size(x4)-1))
        write(*,'(a)') ''
    end do

end program main

jvdp1 commented 1 year ago

As for guarding against u == 0, why not just do something like this:

For guarding against u==0, I usually do

call random_number(u)
u = 1 - u

because random_number will return a value in the range 0<=u <1

dacarnazzola commented 1 year ago

As for guarding against u == 0, why not just do something like this:
u = max(tiny(u), u)
which is much simpler than the doubly-nested do loop.

That is indeed much simpler; performance should be tested. It may be significantly faster too!

As far as not using assumed size arrays, that is a matter of preference and if this project does not want to use them, then I would agree impure elemental subroutine at least meets the need of reducing code duplication.

dacarnazzola commented 1 year ago

I have went ahead and implemented u(1) = 1.0 - u(1) instead of the while loop to keep u > 0.0. On my machine, this measured ever so slightly faster than u = max(tiny(1.0), u) for a large array. I suspect for a size of 2 there will be no measurable difference.

The procedure is also now impure elemental subroutine as requested. I wasn't sure what you guys like to do with dead code, so I just commented out the unused module functions from nf_random, but left the actual implementations in the submodule alone.

EDIT: Turns out this approach runs rather slower than the previous method on my machine when compiled with gfortran. Using ifort brings it in line with the other methods, but this is a common occurrence I would say. Assumed size arrays seem to generate the fastest code in a lot of cases based on my experience.

milancurcic commented 1 year ago

Great, thank you! You can remove the old functions from the code since the new subroutine will be used instead and we have git to keep history.

milancurcic commented 1 year ago

Great @dacarnazzola let's just rename randn_ele to random_normal (I don't know why I called it randn it was a long time ago, maybe after np.randn), and this is good to go.

dacarnazzola commented 1 year ago

Alright, all fair points. I have updated the fix to only provide one value at a time, keeping no internal state. In response to a couple of questions/comments

Only u(1) needs to be /= 0.0 because log(0.0) is invalid, but cos(0.0) is not.
I believe impure elemental should indicate that this routine is not safe to parallelize.
There is no need for the variable r when only calculating a single value.

If my comment about impure elemental is incorrect, then I wonder why there was ever a requirement on elemental to be pure, which was later lifted to include procedures marked impure. I will note that in general, I have never seen a procedure that compiled to anything better, or executed faster, after being marked pure or elemental, but I would absolutely love to be shown a counter example.

dacarnazzola commented 1 year ago

Great @dacarnazzola let's just rename randn_ele to random_normal (I don't know why I called it randn it was a long time ago, maybe after np.randn), and this is good to go.

Do you want the interface renamed to random_normal, just randn_ele, or both?

milancurcic commented 1 year ago

I will note that in general, I have never seen a procedure that compiled to anything better, or executed faster, after being marked pure or elemental, but I would absolutely love to be shown a counter example.

Same here, but that may change as compilers and hardware improve.

milancurcic commented 1 year ago

Do you want the interface renamed to random_normal, just randn_ele, or both?

Since we now have just one specific procedure, I'd remove the interface altogether.

dacarnazzola commented 1 year ago

Do you want the interface renamed to random_normal, just randn_ele, or both?

Since we now have just one specific procedure, I'd remove the interface altogether.

This has been done. I don't have a lot of experience with submodules, but the compiler wasn't liking the main nf_random module not having an interface block. Since other modules seem ok without a corresponding submodule, I have removed nf_random_submodule.

modern-fortran / neural-fortran

safegaurd Box-Muller normal random number generation against u=0.0 #158