vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License
864 stars 144 forks source link

Replace the includes of Windows.h with windows.h (#204) #235

Closed ludovicdelfau closed 3 weeks ago

ludovicdelfau commented 1 month ago

When cross-compiling with MinGW, the compiler is unable to find the include of Windows.h because windows.h is provided instead. Replacing the includes of Windows.h with windows.h solves the issue and still works on Windows because the filesystem is case insensitive. This is what is already done in include/external/mio.hpp.

ludovicdelfau commented 3 weeks ago

When I run the tests on my computer, I only get a crash randomly and even on the master branch. I was only able to produce the random crash in Release build.

If I activate the AddressSanitizer:

I get randomly this trace:

=================================================================
==5080==ERROR: AddressSanitizer: heap-use-after-free on address 0x11cef91e0168 at pc 0x7ff690f332dd bp 0x004db296dd10 sp 0x004db296dd18
READ of size 8 at 0x11cef91e0168 thread T0
    #0 0x7ff690f332dc in csv::internals::CSVFieldList::operator[] csv-parser\include\internal\csv_row.cpp:14
    #1 0x7ff690f332dc in csv::CSVRow::get_field(unsigned __int64) const csv-parser\include\internal\csv_row.cpp:73
    #2 0x7ff690f31391 in csv::CSVRow::operator[](unsigned __int64) const csv-parser\include\internal\csv_row.cpp:35
    #3 0x7ff690f3127b in csv::CSVRow::operator[](class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> const &) const csv-parser\include\internal\csv_row.cpp:51
    #4 0x7ff690e7b0b5 in <lambda_fa95be0ed0cf378516cdea9c93e188d7>::operator() csv-parser\tests\test_csv_iterator.cpp:126
    #5 0x7ff690e781aa in std::_Max_element_unchecked<csv::CSVReader::iterator,<lambda_fa95be0ed0cf378516cdea9c93e188d7> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xutility:6771
    #6 0x7ff690e7893f in std::max_element<csv::CSVReader::iterator,<lambda_fa95be0ed0cf378516cdea9c93e188d7> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xutility:6783
    #7 0x7ff690e82469 in CATCH2_INTERNAL_TEST_11 csv-parser\tests\test_csv_iterator.cpp:129
    #8 0x7ff690f5b2a6 in Catch::`anonymous namespace'::TestInvokerAsFunction::invoke csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_test_registry.cpp:58
    #9 0x7ff690f665fe in Catch::TestCaseHandle::invoke csv-parser.build\_deps\catch2-src\src\catch2\catch_test_case_info.hpp:116
    #10 0x7ff690f665fe in Catch::RunContext::invokeActiveTestCase(void) csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_run_context.cpp:561
    #11 0x7ff690f6738c in Catch::RunContext::runCurrentTest(class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> &, class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> &) csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_run_context.cpp:524
    #12 0x7ff690f67a56 in Catch::RunContext::runTest(class Catch::TestCaseHandle const &) csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_run_context.cpp:239
    #13 0x7ff690ff801e in Catch::`anonymous namespace'::TestGroup::execute csv-parser.build\_deps\catch2-src\src\catch2\catch_session.cpp:111
    #14 0x7ff690ff9233 in Catch::Session::runInternal(void) csv-parser.build\_deps\catch2-src\src\catch2\catch_session.cpp:333
    #15 0x7ff690ff8c98 in Catch::Session::run(void) csv-parser.build\_deps\catch2-src\src\catch2\catch_session.cpp:264
    #16 0x7ff690fef56f in Catch::Session::run csv-parser.build\_deps\catch2-src\src\catch2\catch_session.hpp:41
    #17 0x7ff690fef56f in main csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_main.cpp:36
    #18 0x7ff690fed5d3 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #19 0x7ff690fed5d3 in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #20 0x7ffa7a287343  (C:\Windows\System32\KERNEL32.DLL+0x180017343)
    #21 0x7ffa7a4826b0  (C:\Windows\SYSTEM32\ntdll.dll+0x1800526b0)

0x11cef91e0168 is located 744 bytes inside of 752-byte region [0x11cef91dfe80,0x11cef91e0170)
freed by thread T11 here:
    #0 0x7ff690fecac3 in operator delete(void *, unsigned __int64) D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_win_delete_scalar_size_thunk.cpp:41
    #1 0x7ff690e474d2 in std::_Deallocate<16>(void *, unsigned __int64) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xmemory:258
    #2 0x7ff690f32230 in std::allocator<csv::internals::RawCSVField *>::deallocate C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xmemory:950
    #3 0x7ff690f32230 in std::vector<struct csv::internals::RawCSVField *, class std::allocator<struct csv::internals::RawCSVField *>>::_Change_array(struct csv::internals::RawCSVField **const, unsigned __int64, unsigned __int64) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:2030
    #4 0x7ff690f2fa89 in std::vector<struct csv::internals::RawCSVField *, class std::allocator<struct csv::internals::RawCSVField *>>::_Emplace_reallocate<struct csv::internals::RawCSVField *const &>(struct csv::internals::RawCSVField **const, struct csv::internals::RawCSVField *const &) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:850
    #5 0x7ff690f32f81 in std::vector<csv::internals::RawCSVField *,std::allocator<csv::internals::RawCSVField *> >::_Emplace_one_at_back C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:779
    #6 0x7ff690f32f81 in std::vector<csv::internals::RawCSVField *,std::allocator<csv::internals::RawCSVField *> >::push_back C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:867
    #7 0x7ff690f32f81 in csv::internals::CSVFieldList::allocate(void) csv-parser\include\internal\csv_row.cpp:19
    #8 0x7ff690f42f84 in csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned __int64 &>(unsigned int &&, unsigned __int64 &) csv-parser\include\internal\csv_row.hpp:89
    #9 0x7ff690f48153 in csv::internals::IBasicCSVParser::push_field(void) csv-parser\include\internal\basic_csv_parser.cpp:108
    #10 0x7ff690f479b6 in csv::internals::IBasicCSVParser::parse(void) csv-parser\include\internal\basic_csv_parser.cpp:135
    #11 0x7ff690f47425 in csv::internals::MmapParser::next(unsigned __int64) csv-parser\include\internal\basic_csv_parser.cpp:253
    #12 0x7ff690f3ef63 in csv::CSVReader::read_csv(unsigned __int64) csv-parser\include\internal\csv_reader.cpp:246
    #13 0x7ff690e9112f in std::invoke C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\type_traits:1721
    #14 0x7ff690e9112f in std::thread::_Invoke<class std::tuple<bool (__cdecl csv::CSVReader::*)(unsigned __int64), class csv::CSVReader *, unsigned __int64>, 0, 1, 2>(void *) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\thread:60
    #15 0x7ffa77c01bb1  (C:\Windows\System32\ucrtbase.dll+0x180021bb1)
    #16 0x7ffa01ba48fe in __asan::AsanThread::ThreadStart(unsigned __int64) D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_thread.cpp:299
    #17 0x7ffa7a287343  (C:\Windows\System32\KERNEL32.DLL+0x180017343)
    #18 0x7ffa7a4826b0  (C:\Windows\SYSTEM32\ntdll.dll+0x1800526b0)

previously allocated by thread T11 here:
    #0 0x7ff690feca35 in operator new(unsigned __int64) D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_win_new_scalar_thunk.cpp:40
    #1 0x7ff690e46ac1 in std::_Default_allocate_traits::_Allocate C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xmemory:101
    #2 0x7ff690e46ac1 in std::_Allocate<16, struct std::_Default_allocate_traits>(unsigned __int64) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xmemory:225
    #3 0x7ff690f2f90f in std::allocator<csv::internals::RawCSVField *>::allocate C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xmemory:955
    #4 0x7ff690f2f90f in std::_Allocate_at_least_helper C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xmemory:2186
    #5 0x7ff690f2f90f in std::vector<struct csv::internals::RawCSVField *, class std::allocator<struct csv::internals::RawCSVField *>>::_Emplace_reallocate<struct csv::internals::RawCSVField *const &>(struct csv::internals::RawCSVField **const, struct csv::internals::RawCSVField *const &) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:825
    #6 0x7ff690f32f81 in std::vector<csv::internals::RawCSVField *,std::allocator<csv::internals::RawCSVField *> >::_Emplace_one_at_back C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:779
    #7 0x7ff690f32f81 in std::vector<csv::internals::RawCSVField *,std::allocator<csv::internals::RawCSVField *> >::push_back C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\vector:867
    #8 0x7ff690f32f81 in csv::internals::CSVFieldList::allocate(void) csv-parser\include\internal\csv_row.cpp:19
    #9 0x7ff690f42f84 in csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned __int64 &>(unsigned int &&, unsigned __int64 &) csv-parser\include\internal\csv_row.hpp:89
    #10 0x7ff690f48153 in csv::internals::IBasicCSVParser::push_field(void) csv-parser\include\internal\basic_csv_parser.cpp:108
    #11 0x7ff690f479b6 in csv::internals::IBasicCSVParser::parse(void) csv-parser\include\internal\basic_csv_parser.cpp:135
    #12 0x7ff690f47425 in csv::internals::MmapParser::next(unsigned __int64) csv-parser\include\internal\basic_csv_parser.cpp:253
    #13 0x7ff690f3ef63 in csv::CSVReader::read_csv(unsigned __int64) csv-parser\include\internal\csv_reader.cpp:246
    #14 0x7ff690e9112f in std::invoke C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\type_traits:1721
    #15 0x7ff690e9112f in std::thread::_Invoke<class std::tuple<bool (__cdecl csv::CSVReader::*)(unsigned __int64), class csv::CSVReader *, unsigned __int64>, 0, 1, 2>(void *) C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\thread:60
    #16 0x7ffa77c01bb1  (C:\Windows\System32\ucrtbase.dll+0x180021bb1)
    #17 0x7ffa01ba48fe in __asan::AsanThread::ThreadStart(unsigned __int64) D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_thread.cpp:299
    #18 0x7ffa7a287343  (C:\Windows\System32\KERNEL32.DLL+0x180017343)
    #19 0x7ffa7a4826b0  (C:\Windows\SYSTEM32\ntdll.dll+0x1800526b0)

Thread T11 created by T0 here:
    #0 0x7ffa01ba6607 in __asan_wrap_CreateThread D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_win.cpp:167
    #1 0x7ffa77c01896  (C:\Windows\System32\ucrtbase.dll+0x180021896)
    #2 0x7ff690f3f431 in std::thread::{ctor} C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\thread:93
    #3 0x7ff690f3f431 in csv::CSVReader::read_row(class csv::CSVRow &) csv-parser\include\internal\csv_reader.cpp:286
    #4 0x7ff690f398a1 in csv::CSVReader::iterator::operator++(void) csv-parser\include\internal\csv_reader_iterator.cpp:47
    #5 0x7ff690e780ea in std::_Max_element_unchecked<csv::CSVReader::iterator,<lambda_fa95be0ed0cf378516cdea9c93e188d7> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xutility:6770
    #6 0x7ff690e7893f in std::max_element<csv::CSVReader::iterator,<lambda_fa95be0ed0cf378516cdea9c93e188d7> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\include\xutility:6783
    #7 0x7ff690e82469 in CATCH2_INTERNAL_TEST_11 csv-parser\tests\test_csv_iterator.cpp:129
    #8 0x7ff690f5b2a6 in Catch::`anonymous namespace'::TestInvokerAsFunction::invoke csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_test_registry.cpp:58
    #9 0x7ff690f665fe in Catch::TestCaseHandle::invoke csv-parser.build\_deps\catch2-src\src\catch2\catch_test_case_info.hpp:116
    #10 0x7ff690f665fe in Catch::RunContext::invokeActiveTestCase(void) csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_run_context.cpp:561
    #11 0x7ff690f6738c in Catch::RunContext::runCurrentTest(class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> &, class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> &) csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_run_context.cpp:524
    #12 0x7ff690f67a56 in Catch::RunContext::runTest(class Catch::TestCaseHandle const &) csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_run_context.cpp:239
    #13 0x7ff690ff801e in Catch::`anonymous namespace'::TestGroup::execute csv-parser.build\_deps\catch2-src\src\catch2\catch_session.cpp:111
    #14 0x7ff690ff9233 in Catch::Session::runInternal(void) csv-parser.build\_deps\catch2-src\src\catch2\catch_session.cpp:333
    #15 0x7ff690ff8c98 in Catch::Session::run(void) csv-parser.build\_deps\catch2-src\src\catch2\catch_session.cpp:264
    #16 0x7ff690fef56f in Catch::Session::run csv-parser.build\_deps\catch2-src\src\catch2\catch_session.hpp:41
    #17 0x7ff690fef56f in main csv-parser.build\_deps\catch2-src\src\catch2\internal\catch_main.cpp:36
    #18 0x7ff690fed5d3 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #19 0x7ff690fed5d3 in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #20 0x7ffa7a287343  (C:\Windows\System32\KERNEL32.DLL+0x180017343)
    #21 0x7ffa7a4826b0  (C:\Windows\SYSTEM32\ntdll.dll+0x1800526b0)

SUMMARY: AddressSanitizer: heap-use-after-free csv-parser\include\internal\csv_row.cpp:14 in csv::internals::CSVFieldList::operator[]
Shadow bytes around the buggy address:
  0x03dad83bbfd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x03dad83bbfe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x03dad83bbff0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x03dad83bc000: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x03dad83bc010: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x03dad83bc020: fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]fa fa
  0x03dad83bc030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x03dad83bc040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x03dad83bc050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x03dad83bc060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x03dad83bc070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==5080==ABORTING

My analysis is the following:

The simplest solution is to add a join in csv::CSVReader::read_row(class csv::CSVRow &) csv-parser\include\internal\csv_reader.cpp:286 like it is done in csv::CSVReader::begin() csv-parser\include\internal\csv_reader_iterator.cpp:12 but maybe a better solution exist.

Could you check if you are OK with this is analysis? Also, could it be possible to rerun the job to see if it is random?

vincentlaucsb commented 3 weeks ago

Looks like possibly the same issue as reported in #217.

Your analysis sounds very plausible. In fact, I wrote this note when I was writing this class...

image

I overlooked the fact that std::vector, like you said, completely re-allocates items when the capacity is exceeded. While I don't directly store pointers to the contents of the vector, I can see how in rare instances, a race condition can occur where one thread tries to access a vector that is in the middle of re-shuffling its contents in memory (due to a writing thread calling push_back()).

I believe that using a container that doesn't work by continually moving its contents in memory, e.g. a linked list or std::list would be a better solution. I also believe any performance degradation would be very minor.

I will look into fixing this when I have time, but hopefully it shouldn't take long as no algorithmic changes have to be made.

vincentlaucsb commented 3 weeks ago

Thanks for your change. I've found a solution to the memory access issues by replacing std::vector with std::deque and using std::unique_ptr over manual memory manipulation.