OutOfMemoryException when parsing large files

CsvQuery throws an OutOfMemoryException even though the files are not that big (e.g. a 50 MB file).

The problem is the conversion of text from Scintilla to C#. Right now the rather inefficient Encoding.GetString(byte[]) is used, which requires first copying the memory from N++ into a byte[] (which in turn forces N++ to first reorganize its memory so it's linear).

Crash dump: OutOfMemory.txt

Possible ways to fix it:

Chunked reading - read the text in smaller chunks. Easy to implement, but remember that a multi-byte UTF-8 character might span the border between two chunks.
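A minimal sketch of the chunked approach, assuming the bytes can be fetched range by range. The `readChunk` delegate is a hypothetical stand-in for whatever call actually pulls a byte range out of the Scintilla buffer; it is not part of CsvQuery. The stateful `System.Text.Decoder` holds on to any incomplete trailing bytes, so a character split across two chunks is completed on the next call:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ChunkedUtf8
{
    // readChunk(offset, length) is a hypothetical callback that returns
    // `length` bytes of the document starting at byte `offset`.
    public static IEnumerable<string> DecodeInChunks(
        Func<int, int, byte[]> readChunk, int totalBytes, int chunkSize = 64 * 1024)
    {
        // A Decoder (unlike Encoding.GetString) remembers partial
        // multi-byte sequences between calls.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];

        for (int offset = 0; offset < totalBytes; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, totalBytes - offset);
            byte[] bytes = readChunk(offset, length);

            // Flush only on the final chunk so that leftover bytes are
            // carried over instead of being turned into replacement chars.
            bool isLast = offset + length >= totalBytes;
            int charCount = decoder.GetChars(bytes, 0, bytes.Length, chars, 0, isLast);

            if (charCount > 0)
                yield return new string(chars, 0, charCount);
        }
    }
}
```

Each yielded piece can be handed to the parser right away, so only one chunk needs to be held as a byte[] at a time. Joining everything with `string.Concat(...)` would reproduce the full text, at the cost of the large allocation again.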

Unmanaged reading using pointers - Scintilla can give us a pointer to the text instead of copying it into a byte[]. However, that's a pointer into UTF-8 data, which C# (or Windows) doesn't have any good functions to read. Pro: Probably much faster. Con: I have not managed to find a way to actually do it without copying the data several times (ideally a single copy from the Scintilla pointer to the C# String's internal wide-char data should be enough).
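One candidate that may be worth testing for the pointer approach is the `Encoding.GetString(byte*, int)` overload (available in .NET Framework 4.6 and later), which decodes straight from unmanaged memory without an intermediate byte[]. The sketch below is an assumption-laden illustration, not a verified fix: `textPointer` and `byteLength` are hypothetical inputs that would have to come from Scintilla, and the code needs to be compiled with unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class PointerUtf8
{
    // textPointer: hypothetical pointer to contiguous UTF-8 text
    // (e.g. as handed out by Scintilla); byteLength: its size in bytes.
    public static unsafe string DecodeFromPointer(IntPtr textPointer, int byteLength)
    {
        if (textPointer == IntPtr.Zero || byteLength == 0)
            return string.Empty;

        // Decodes directly from the unmanaged buffer; no managed byte[]
        // copy of the source text is created.
        return Encoding.UTF8.GetString((byte*)textPointer, byteLength);
    }
}
```

Note that the resulting string still takes roughly two bytes per character, so this removes the extra byte[] copy but not the large string allocation itself.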
