jbattini / fast-cpp-csv-parser

Automatically exported from code.google.com/p/fast-cpp-csv-parser

segfault for large files #1

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Unzip stop_times.txt.7z
2. Try io::CSVReader<5> in("stop_times.txt");
3. This works; now add an additional line to the file by copying the last line
4. The program breaks with a segfault

What is the expected output? What do you see instead?
The expected output is nothing; instead I see a segfault.

What version of the product are you using? On what operating system?
Ubuntu Linux 12.04, gcc 4.6.3

Please provide any additional information below.
As far as I could debug, the error occurs while leaving scope in the init() 
function.

Original issue reported on code.google.com by michael....@gmail.com on 20 Dec 2012 at 6:05

GoogleCodeExporter commented 8 years ago
Have you added -lpthread as the last parameter to GCC when linking your program? I can only reproduce an std::system_error exception crash when that parameter is omitted; I cannot reproduce a segfault.

If you do not add that parameter, the threading library is not properly initialized and the standard library throws an exception when trying to spawn a thread. The reason this does not occur before duplicating the line is that the original file fits completely into memory (while the modified file is too big), so no asynchronous read is ever performed.

If this does not solve your problem, then:
* Please give a minimal compilable example that illustrates your problem.
* Include the exact input file that actually triggers the bug. (If it's a problem with UTF BOMs or newline encoding, your editor might duplicate the line in a different way than mine.)
* Indicate the exact flags you used while compiling and linking.
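For reference, a typical command line would look like this (the file names are hypothetical; the key point is that -lpthread comes after the source files):

```shell
# gcc 4.6 era flag; on newer compilers use -std=c++11 or later.
g++ -std=c++0x -O2 parse_csv.cpp -o parse_csv -lpthread

# Equivalent and generally preferred: -pthread sets both the
# preprocessor defines and the linker option in one flag.
g++ -std=c++0x -O2 -pthread parse_csv.cpp -o parse_csv
```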

Original comment by strasser...@gmail.com on 26 Dec 2012 at 10:50

GoogleCodeExporter commented 8 years ago
As far as I have checked, the missing -lpthread was indeed the reason.

I got a segfault in one case (release, -O3) and a system_error ("operation not permitted") in the other (debug, -g3).

Thanks a lot for the quick reply!

Original comment by michael....@gmail.com on 26 Dec 2012 at 11:40

GoogleCodeExporter commented 8 years ago
When linking statically, gcc seems to drop pthreads regardless of whether -lpthread is specified. To force gcc to include pthreads, you have to add

-Wl,-whole-archive /usr/lib/i386-linux-gnu/libpthread.a -Wl,-no-whole-archive
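A full command line using this might look as follows (the file names are hypothetical, and the libpthread.a path is the i386 one quoted above; it will differ on other architectures):

```shell
g++ -std=c++0x -O2 -static parse_csv.cpp -o parse_csv \
    -Wl,-whole-archive /usr/lib/i386-linux-gnu/libpthread.a -Wl,-no-whole-archive
```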

Original comment by michael....@gmail.com on 26 Dec 2012 at 5:04

GoogleCodeExporter commented 8 years ago
Another way to "fix" this is to get rid of the multithreading altogether. There 
are two calls to std::async in the header. 

    bytes_read = std::async(std::launch::async, [=]()->int{ 
        return std::fread(buffer + 2*block_len, 1, block_len, file); 
    });

Replace both of them with

    bytes_read = std::async([=]()->int{ 
        return std::fread(buffer + 2*block_len, 1, block_len, file); 
    });

(i.e., drop the std::launch::async). Without an explicit policy the implementation may choose between asynchronous and deferred execution, and the current GCC chooses deferred, so the call runs synchronously.

Original comment by strasser...@gmail.com on 26 Dec 2012 at 5:45

GoogleCodeExporter commented 8 years ago
Never mind. Your CSV parsing library is by far the fastest and fanciest I've 
yet seen. It's definitely the future of CSV parsing, so do not even think of 
dropping multithreading support just because of some touchy compiler ;)

Original comment by michael....@gmail.com on 26 Dec 2012 at 5:58