romeric / Fastor

A lightweight high performance tensor algebra framework for modern C++
MIT License
752 stars 70 forks source link

Architecture (?) dependent Segmentation fault #139

Closed Bbllaaddee closed 7 months ago

Bbllaaddee commented 3 years ago

TLDR: segfault without -march=native on g++

Hello, it's me again!

I was working with my program, while suddenly it stopped working with strange Segmentation fault. I thought that this was the problem on my part with something like trying to access illegal memory address. So, after some primitive tests, I invoked valgrind, and saw, that the error comes from Fastor for some odd reason.

Here is the simple program that imitates what I do:

#include <iostream>
#include <array>
#include <Fastor/Fastor.h>

using namespace Fastor;

int main()
{
    std::vector<std::array<double,3>> vec_arr(5,{1,2,3});

    for(size_t i=1; i<vec_arr.size(); ++i)
    {
        TensorMap<double,3> arr_tens_map_stat(vec_arr[0].data());
        TensorMap<double,3> arr_tens_map_var(vec_arr[i].data());
        Tensor<double,3> tens = {7,8,9};
        arr_tens_map_stat -= tens;
        arr_tens_map_var += tens;
    }
}

This program compiles with any options. I usually compile with g++ ... -march=native .... But in this case it was turned off, and that (!) was the origin of the error.

So, if I compile it with arch specified, all goes well. Without it, it gives segfault.

I can see the proper error with the following compilation flags: g++ -std=c++17 -Og -g main.cpp. Output of valgrind ./a.out:

==12845== Memcheck, a memory error detector
==12845== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12845== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12845== Command: ./a.out
==12845== 
==12845== 
==12845== Process terminating with default action of signal 11 (SIGSEGV)
==12845==  General Protection Fault
==12845==    at 0x1095D2: _mm_store_pd (emmintrin.h:166)
==12845==    by 0x1095D2: store (simd_vector_double.h:655)
==12845==    by 0x1095D2: trivial_assign_add<Fastor::TensorMap<double, 3>, 1, Fastor::Tensor<double, 3>, 1> (TensorAssignment.h:44)
==12845==    by 0x1095D2: assign_add<Fastor::TensorMap<double, 3>, 1, double, 3> (TensorAssignment.h:254)
==12845==    by 0x1095D2: operator+=<Fastor::Tensor<double, 3>, 1> (TensorInplaceOperators.h:8)
==12845==    by 0x1095D2: main (main.cpp:18)
==12845== 
==12845== HEAP SUMMARY:
==12845==     in use at exit: 120 bytes in 1 blocks
==12845==   total heap usage: 8 allocs, 7 frees, 72,969 bytes allocated
==12845== 
==12845== LEAK SUMMARY:
==12845==    definitely lost: 0 bytes in 0 blocks
==12845==    indirectly lost: 0 bytes in 0 blocks
==12845==      possibly lost: 0 bytes in 0 blocks
==12845==    still reachable: 120 bytes in 1 blocks
==12845==         suppressed: 0 bytes in 0 blocks
==12845== Rerun with --leak-check=full to see details of leaked memory
==12845== 
==12845== For lists of detected and suppressed errors, rerun with: -s
==12845== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Tested with g++-9, g++-10 and clang++-10.

romeric commented 3 years ago

That is a memory alignment issue. Define FASTOR_DONT_ALIGN and rerun your example. The macro does not impact performance and was designed specifically for this purpose.

Bbllaaddee commented 3 years ago

Thanks!

Should it be put in wiki or somewhere, what do you think? Might be useful to have a page of macros user can define and what they do

romeric commented 3 years ago

There is a better solution for this that should come pretty soon. Meanwhile I'll have this documented as it's essentially a bug.

Thanks for reporting

matthiasneuner commented 1 year ago

Is there already a solution for this issue? Currently, working with TensorMaps can be tedious due to those SEGFAULTs.

romeric commented 7 months ago

Yes, the feature is now finally implemented via 652972f. The new implemented feature helps us distinguish between aligned and unaligned data during TensorMap construction and hence avoid segfaults. This is a new feature so if any new issue arises as a result, please report. I am closing this one.

Thank you for your patience!