CsvQuery throws an OutOfMemoryException even though the files are not that big (e.g. a 50 MB file).
The problem is the conversion of text from Scintilla to C#. Right now the rather inefficient Encoding.GetString(byte[]) is used, which first requires copying the memory from N++ into a byte[] (which in turn makes N++ first reorganize its memory so it's linear). At the peak the text therefore exists at least three times: in N++, in the byte[], and in the resulting UTF-16 string.
Crash dump: OutOfMemory.txt
Possible ways to fix it:
Chunked reading - read the text in smaller chunks. Easy to implement; just remember that a UTF-8 char might span the border between two chunks.
Unmanaged reading using pointers - Scintilla can give us a pointer to the text instead of copying it into a byte[]. However, that's a pointer into UTF-8 data, which C# (or Windows) doesn't have any good functions to read, as far as I can tell. Pro: probably much faster. Con: I have not managed to find a way to actually do it without copying the data several times (ideally a single copy from the Scintilla pointer to the C# String's internal wide-char data should be enough).
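
A rough sketch of the chunked option, to show how the chunk-border problem could be handled. It assumes the chunks arrive as plain byte ranges; here a byte[] stands in for whatever Scintilla call would deliver each chunk. ChunkedUtf8 and Decode are made-up names for illustration, not existing CsvQuery code. The point is that a System.Text.Decoder is stateful: it holds on to an incomplete trailing UTF-8 sequence and finishes it when the next chunk arrives.

```csharp
using System;
using System.Text;

static class ChunkedUtf8
{
    // Hypothetical helper: decodes UTF-8 bytes chunk by chunk.
    // 'source' stands in for data that would really be fetched from
    // Scintilla one chunk at a time.
    public static string Decode(byte[] source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // The decoder remembers incomplete byte sequences between calls.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();
        // Reused buffer sized for the worst case of one chunk.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];

        for (int offset = 0; offset < source.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, source.Length - offset);
            // Flush only on the final chunk so a split char is not lost.
            bool lastChunk = offset + count >= source.Length;
            int written = decoder.GetChars(source, offset, count, chars, 0, lastChunk);
            result.Append(chars, 0, written);
        }
        return result.ToString();
    }
}
```

Only one chunk-sized char[] is alive at a time, so the extra memory on top of the final string stays small regardless of file size.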
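
For the pointer option, one thing that might be worth trying is the Encoding.GetString(byte*, int) overload, which is public from .NET Framework 4.6 onwards (and in .NET Core). This is only a sketch under that assumption: PointerUtf8 and Decode are made-up names, the IntPtr is assumed to be a pointer to contiguous UTF-8 text such as the one Scintilla returns for SCI_GETCHARACTERPOINTER, and the project has to be compiled with unsafe code enabled. This code allocates no managed byte[] itself; whether the runtime copies internally depends on the runtime version, so it would need measuring.

```csharp
using System;
using System.Text;

static class PointerUtf8
{
    // Hypothetical helper: decodes byteCount UTF-8 bytes that start at an
    // unmanaged address, without first copying them into a byte[].
    // Assumes .NET Framework 4.6+ or .NET Core, and /unsafe.
    public static unsafe string Decode(IntPtr text, int byteCount)
    {
        if (text == IntPtr.Zero) throw new ArgumentNullException(nameof(text));
        if (byteCount < 0) throw new ArgumentOutOfRangeException(nameof(byteCount));
        if (byteCount == 0) return string.Empty;

        return Encoding.UTF8.GetString((byte*)text.ToPointer(), byteCount);
    }
}
```

The pointer must stay valid for the duration of the call, so the document must not be modified between fetching the pointer and decoding.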