We tried to use csv-parser on a large dataset (8 million lines, 9.9 GB). However, when we loop over all rows and execute `row[column_name].get<std::string>()`, we get the following error message:
```
==245==ERROR: AddressSanitizer: heap-use-after-free on address 0x621003c37248 at pc 0x56492659e7ee bp 0x7ffe476e2f20 sp 0x7ffe476e2f10
READ of size 8 at 0x621003c37248 thread T0
    #0 0x56492659e7ed in csv::internals::CSVFieldList::operator[](unsigned long) const /mwe/includes/csv_reader.h:7635
    #1 0x56492659f298 in csv::CSVRow::get_field(unsigned long) const /mwe/includes/csv_reader.h:7694
    #2 0x56492659ea9d in csv::CSVRow::operator[](unsigned long) const /mwe/includes/csv_reader.h:7656
    #3 0x56492659ebea in csv::CSVRow::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const /mwe/includes/csv_reader.h:7672
    #4 0x5649265927c2 in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /mwe/src/main.cpp:27
    #5 0x564926592eb0 in main /mwe/src/main.cpp:36
    #6 0x7f66a36d0d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
    #7 0x7f66a36d0e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
    #8 0x564926591dc4 in _start (/mwe/build/csvMWE+0x7dc4)

0x621003c37248 is located 328 bytes inside of 4096-byte region [0x621003c37100,0x621003c38100)
freed by thread T107 here:
    #0 0x7f66a3cb722f in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:172

previously allocated by thread T107 here:
    #0 0x7f66a3cb61c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99

Thread T107 created by T0 here:
    #0 0x7f66a3c58685 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x7f66a3ab2388 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc388)
    #2 0x56492659d23d in csv::CSVReader::read_row(csv::CSVRow&) /mwe/includes/csv_reader.h:7536
    #3 0x56492659e70a in csv::CSVReader::iterator::operator++() /mwe/includes/csv_reader.h:7605
    #4 0x5649265928ad in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /mwe/src/main.cpp:25
    #5 0x564926592eb0 in main /mwe/src/main.cpp:36
    #6 0x7f66a36d0d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)

SUMMARY: AddressSanitizer: heap-use-after-free /mwe/includes/csv_reader.h:7635 in csv::internals::CSVFieldList::operator[](unsigned long) const
Shadow bytes around the buggy address:
  0x0c428077edf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c428077ee20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c428077ee40: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0c428077ee50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c428077ee90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
  Shadow gap: cc
==245==ABORTING
```
The problem goes away when we add `std::this_thread::sleep_for(std::chrono::nanoseconds(1));` inside the same loop.
For reproducibility, I have put an MWE here: https://drive.google.com/file/d/1M_PJLlhxs8JTmIGEcDNCBAeBqxqmdNBC/view?usp=drive_link
Just extract it and run `docker build . --tag=mwe`, then `docker run -it mwe`, and inside the container `./runAndBuild.sh`.