mcg1969 / vecLibFort

Full GNU Fortran compatibility for Apple's vecLib BLAS/LAPACK
Boost Software License 1.0

vecLibFort does not work on PPC #6

Closed tenomoto closed 10 years ago

tenomoto commented 10 years ago

As far as I can tell from what has been mentioned on vecLib again, vecLibFort or dotwrp in C should work. But what I get is a segfault when linked against them. I don't think it is related to Octave. The following program, which calls a function returning complex, causes a segfault when linked against the C wrapper but works fine when linked against the Fortran wrapper. Tests returning float work OK. Changing fcplx to fcplx * does not help. Moved from this comment.

      program main

        complex cdotu,a(1),b(1),w
        external cdotu
        a(1) = cmplx(1e0,1e0)
        b(1) = cmplx(1e0,2e0)
        w = cdotu(1,a,1,b,1)
        if (w .ne. a(1)*b(1)) stop 1

      end
mcg1969 commented 10 years ago

Great, thanks for moving this over. We need to figure out what's different with the Fortran ABI on PowerPC versus Intel. If anyone out there in GitHub land can help it would be much appreciated.

tenomoto commented 10 years ago

I get a bus error on ppc and an incorrect result on Intel (the imaginary part is zero). My limited knowledge of C tells me that local variables do not survive after the call. The issue is whether the two fields of the fcplx structure are copied on return.

C wrapper

typedef struct cplx_ { float r, i; } fcplx;
fcplx addtwoc_(const fcplx *x, const fcplx *y) 
{
  fcplx z;
  z.r = x->r + y->r;
  z.i = x->i + y->i;
  return z;
}

Fortran test program

      program main
        complex x, y, z, addtwoc
        x = (1.0, 2.0)
        y = (2.0, 2.0)
        z = addtwoc(x, y)
        print *, "z =", z, " x+y=", x+y 
      end 
mcg1969 commented 10 years ago

You have to declare addtwoc to be complex as well. For example, complex x, y, z, addtwoc

tenomoto commented 10 years ago

Thanks. I still get the same error: bus error on ppc and 0 for the imaginary part on Intel.

mcg1969 commented 10 years ago

Hmm. I reproduced your error on Mavericks but adding the complex declaration fixed it.

tenomoto commented 10 years ago

I was doing something wrong. Yes, declaring addtwoc to be complex fixes the problem on Mavericks.

tenomoto commented 10 years ago

I found a way out: use C99 complex instead of structs. Returning structs seems to be architecture dependent. On PowerPC, a pointer to the result memory is passed as a hidden first parameter, except for C99 complex types.
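To make the difference concrete, here is a minimal sketch of the two return styles side by side. The names `addtwoc_struct`, `addtwoc_c99`, and `to_c99` are hypothetical and used only for illustration. C99 specifies that a complex type has the same representation as an array of two reals, so both variants hold the same bytes; what may differ between architectures is only how the value travels back to the caller.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Struct-based complex, as in the C wrapper discussed above. */
typedef struct cplx_ { float r, i; } fcplx;

/* Returns a struct: the platform ABI decides whether the result comes
   back in registers or through a hidden pointer argument. */
fcplx addtwoc_struct(const fcplx *x, const fcplx *y)
{
  fcplx z;
  z.r = x->r + y->r;
  z.i = x->i + y->i;
  return z;
}

/* Returns a C99 complex: uses the ABI's convention for complex values. */
float complex addtwoc_c99(const float complex *x, const float complex *y)
{
  return *x + *y;
}

/* Hypothetical helper: copy the two struct fields into a C99 complex.
   The assert guards the assumption that the struct has no padding. */
float complex to_c99(fcplx s)
{
  float complex c;
  assert(sizeof c == sizeof s);
  memcpy(&c, &s, sizeof c);
  return c;
}
```

Called from C, both functions give the same value, because the C compiler applies a matching convention on each side of the call. The mismatch can only show up when the caller is compiled by something else, such as a Fortran compiler expecting a complex return.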

tenomoto commented 10 years ago

A better addtwoc that does not cause a bus error:

#include <complex.h>
float complex addtwoc_(const float complex *x, const float complex *y)
{
  float complex z;
  z = *x + *y;
  return z;
}
mcg1969 commented 10 years ago

Very interesting!

But I'm not sure we should assume C99 compliance, particularly on an architecture this old. Why don't we just reimplement those routines using PowerPC-compatible calls?

I don't want to change the current approach for Intel. If you would like to submit a pull request that implements a PowerPC-specific solution, bracketed by #ifdefs, I'll merge it. Something like this:

#if defined(__ppc__) || defined(__ppc64__)
/* PowerPC implementation */
#else
/* Intel implementation */
#endif

mcg1969 commented 10 years ago

Sorry, I realize I misunderstood the full ramifications here. We can test for the presence of C99 and use C99 complex values if true. That should be fine. But I do think we should at least flag as an error the cases where we don't have C99 and the calls are known to fail.
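A rough sketch of what such a test could look like follows. It assumes that checking `__STDC_VERSION__` is an adequate test for C99 complex support and that `__ppc__` / `__ppc64__` identify PowerPC builds. `HAVE_C99_COMPLEX` and `check_complex_layout` are hypothetical names, and this is not necessarily what vecLibFort ended up doing.

```c
#include <assert.h>

/* Assumption: a C99 (or later) compiler without __STDC_NO_COMPLEX__
   provides <complex.h> and the complex keyword. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L && \
    !defined(__STDC_NO_COMPLEX__)
#include <complex.h>
typedef float complex c_float;
typedef double complex c_double;
#define HAVE_C99_COMPLEX 1
#elif defined(__ppc__) || defined(__ppc64__)
/* Struct returns are known to fail here, so stop the build. */
#error "C99 complex support is required on PowerPC"
#else
/* Fallback for platforms where struct returns are known to work. */
typedef struct { float r, i; } c_float;
typedef struct { double r, i; } c_double;
#define HAVE_C99_COMPLEX 0
#endif

/* Hypothetical sanity check: either variant must occupy two reals. */
void check_complex_layout(void)
{
  assert(sizeof(c_float) == 2 * sizeof(float));
  assert(sizeof(c_double) == 2 * sizeof(double));
}
```

The `#error` branch makes the failure visible at compile time instead of leaving a wrapper that crashes with a bus error at run time.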

On Oct 4, 2014, at 8:43 PM, tenomoto <notifications@github.com> wrote:

I found a way out: use C99 complex instead of structs. Returning structs seem to be architecture dependent (http://concatenative.org/wiki/view/FFI/StructReturns). On PowerPC the pointer to memory is returned as the first parameter except for C99 complex types.


tenomoto commented 10 years ago

I believe that C99 is supported by clang. I looked at a gcc page and found that C99 complex support was added as of gcc-3.0. I have Apple's gcc-4.0 on Leopard.

mcg1969 commented 10 years ago

OK. Just to be safe I am going to wrap it in a typedef with tests to make sure C99 complex is supported.

mcg1969 commented 10 years ago

After doing some research it seems like the standard Leopard compilers, and everything offered since then, support <complex.h> and the complex keyword.

Is that what you're seeing as well?

If so, then I am content to replace my c_float and c_double typedefs with this:

#include <complex.h>
typedef float complex c_float;
typedef double complex c_double;

Can you make that simple change and confirm everything works for you on Leopard? Seems like it works on Mavericks.
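For anyone who wants to try the typedef in isolation, a wrapper returning `c_float` could be sketched roughly as below. The name `cdotu_sketch_` is hypothetical, and a plain loop stands in for the real vecLib call, so this only exercises the return path. It also assumes positive strides for brevity.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

typedef float complex c_float;

/* Hypothetical stand-in for a cdotu wrapper. Fortran passes every
   argument by reference, hence the pointer parameters. */
c_float cdotu_sketch_(const int *n, const c_float *x, const int *incx,
                      const c_float *y, const int *incy)
{
  c_float sum = 0.0f;
  int k;
  assert(n != NULL && x != NULL && y != NULL);
  assert(incx != NULL && incy != NULL);
  /* Unconjugated dot product: sum of x(k) * y(k). */
  for (k = 0; k < *n; ++k)
    sum += x[k * *incx] * y[k * *incy];
  return sum;
}
```

With the inputs from the original test program, a(1) = (1,1) and b(1) = (1,2), the product is (-1,3), which is what the Fortran side compares against a(1)*b(1).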

mcg1969 commented 10 years ago

I've pushed the change already. I've confirmed it works on Snow Leopard, so I figured what the heck. Still, I won't close this until you've verified it on PPC Leopard.

tenomoto commented 10 years ago

I confirmed that the current source works on ppc. Thanks!