govert / SQLiteForExcel

A lightweight wrapper to give access to the SQLite3 library from VBA.
MIT License
256 stars 81 forks source link

far slower reading data from cursor in Office 365 due to character encoding #11

Open gburns53 opened 4 years ago

gburns53 commented 4 years ago

My company recently updated to office 365 and all my tools got vastly slower using this. Specifically running the DLL api functions to step thru the cursor, read the datatype and convert is many times slower. It's slightly less noticeable when data is only numeric, but I believe it's still slower. I believe this problem is related to the text encoding.

IOW, sqlite db works just as fast, but when you return a lot of data, it takes a lot longer to read it in to an array stepping thru the cursor (so to speak, using the step function). I'm not sure what is going on. I haven't found a fix yet, but it's sad because you can write such amazing solutions using this .. I read all excel tables in automatically to a database in memory as they're added. You'd never even know you basically have an excel db in RAM and I wrote my own sql editor with VBA forms.

govert commented 4 years ago

I think the part that is slower in newer Excel versions is some of the COM object model interactions with Excel. So if you are reading or writing cell-by-cell, that might be slower, and some interactions are much slower. See this answer to a similar question https://github.com/Excel-DNA/ExcelDna/issues/281#issuecomment-565230403 where I said:

There were quite a few performance-related issues in the Excel 2010 -> Excel 2013 time frame, and I think you'll find some COM interop scenarios became much slower.

So I don't think this is related to SQLiteForExcel directly and the VBA side. Maybe you can do some tests where there is no use of the Excel COM object model, just VBA code. I'm also not sure if the debug output in VBA has reliable performance across versions.

gburns53 commented 4 years ago

Thx. I have. I'm talking about reading in to a VBA array. I can write data to a sqlite table every bit as fast, even if it's source is a worksheet. Queries with large result set's are the issue. I haven't isolated which of the 3 (highlighted), or maybe all 3, but the performance is degraded in a loop like below.

I'll do some more investigation, I thought you may have heard something.

Do While retVal = SQLITE_ROW
    r = r + 1
    For c = 0 To cols - 1
        colType = SQLite3ColumnType(stmtHandle, c)
        data(r, c + 1) = ColumnValue(stmtHandle, c, colType)
    Next c
    retVal = SQLite3Step(stmtHandle)
Loop

retVal = SQLite3Finalize(stmtHandle)

On Wed, Dec 18, 2019 at 1:14 AM Govert van Drimmelen < notifications@github.com> wrote:

I think the part that is slower in newer Excel versions is some of the COM object model interactions with Excel. So if you are reading or writing cell-by-cell, that might be slower, and some interactions are much slower. See this answer to a similar question Excel-DNA/ExcelDna#281 (comment) https://github.com/Excel-DNA/ExcelDna/issues/281#issuecomment-565230403 where I said:

There were quite a few performance-related issues in the Excel 2010 -> Excel 2013 time frame, and I think you'll find some COM interop scenarios became much slower.

-

https://social.msdn.microsoft.com/Forums/Lync/en-US/7fec88bd-d0ae-42e8-bf3b-efc7934b5cd6/excel-2013-vba-slow-performance-with-specific-commands?forum=exceldev

https://social.msdn.microsoft.com/Forums/Lync/en-US/6f633bfa-c14d-41d2-a63a-020e0418d5ba/excel-2013-poor-performance-with-rangecopy?forum=exceldev

https://social.msdn.microsoft.com/Forums/Lync/en-US/56cb00ff-9679-4f15-af5b-7b73e3117d8e/excel-2016-vba-performing-poorly-especially-bad-with-applicationfiledialog?forum=exceldev including this comment: "Does your code do any (un)protecting of worksheets or the workbook? If so, please be aware that changing protection has slowed down by a factor of as much as 2000 since the upgrade to Office 2013, due to a change in the password hash algorithm."

So I don't think this is related to SQLiteForExcel directly and the VBA side. Maybe you can do some tests where there is no use of the Excel COM object model, just VBA code. I'm also not sure if the debug output in VBA has reliable performance across versions.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35IRUOOZXPBLMMI6BS3QZG5VPA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHE7OSY#issuecomment-566884171, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35JHLAHCTVBYMPN652LQZG5VPANCNFSM4J4D2ARA .

gburns53 commented 4 years ago

Little quick follow up in testing. I believe SQLite3ColumnText() is processing much slower for some reason.

Do you think changing the character encoding of my sqlite db could help? That's about the only short term solution I can think of.

SQLite3ColumnText() is at least 4x slower than SQLite3ColumnDouble() in Office 365.

If I assume all data is double, result reads seem fairly consistent with Office 2016. If I then assume all data is text: data(r, c + 1) = SQLite3ColumnText(stmtHandle, c). It's about 4x slower returning the same 26k rows in 11 cols. About 4 secs to 1 second

Test case using 26k rows and 11 cols:

Do While retVal = SQLITE_ROW
    r = r + 1
    For c = 0 To cols - 1
        data(r, c + 1) = SQLite3ColumnDouble(stmtHandle, c)
    Next c
    retVal = SQLite3Step(stmtHandle)
Loop

On Wed, Dec 18, 2019 at 4:33 PM Guy gregg.burns@gmail.com wrote:

Thx. I have. I'm talking about reading in to a VBA array. I can write data to a sqlite table every bit as fast, even if it's source is a worksheet. Queries with large result set's are the issue. I haven't isolated which of the 3 (highlighted), or maybe all 3, but the performance is degraded in a loop like below.

I'll do some more investigation, I thought you may have heard something.

Do While retVal = SQLITE_ROW
    r = r + 1
    For c = 0 To cols - 1
        colType = SQLite3ColumnType(stmtHandle, c)
        data(r, c + 1) = ColumnValue(stmtHandle, c, colType)
    Next c
    retVal = SQLite3Step(stmtHandle)
Loop

retVal = SQLite3Finalize(stmtHandle)

On Wed, Dec 18, 2019 at 1:14 AM Govert van Drimmelen < notifications@github.com> wrote:

I think the part that is slower in newer Excel versions is some of the COM object model interactions with Excel. So if you are reading or writing cell-by-cell, that might be slower, and some interactions are much slower. See this answer to a similar question Excel-DNA/ExcelDna#281 (comment) https://github.com/Excel-DNA/ExcelDna/issues/281#issuecomment-565230403 where I said:

There were quite a few performance-related issues in the Excel 2010 -> Excel 2013 time frame, and I think you'll find some COM interop scenarios became much slower.

-

https://social.msdn.microsoft.com/Forums/Lync/en-US/7fec88bd-d0ae-42e8-bf3b-efc7934b5cd6/excel-2013-vba-slow-performance-with-specific-commands?forum=exceldev

https://social.msdn.microsoft.com/Forums/Lync/en-US/6f633bfa-c14d-41d2-a63a-020e0418d5ba/excel-2013-poor-performance-with-rangecopy?forum=exceldev

https://social.msdn.microsoft.com/Forums/Lync/en-US/56cb00ff-9679-4f15-af5b-7b73e3117d8e/excel-2016-vba-performing-poorly-especially-bad-with-applicationfiledialog?forum=exceldev including this comment: "Does your code do any (un)protecting of worksheets or the workbook? If so, please be aware that changing protection has slowed down by a factor of as much as 2000 since the upgrade to Office 2013, due to a change in the password hash algorithm."

So I don't think this is related to SQLiteForExcel directly and the VBA side. Maybe you can do some tests where there is no use of the Excel COM object model, just VBA code. I'm also not sure if the debug output in VBA has reliable performance across versions.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35IRUOOZXPBLMMI6BS3QZG5VPA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHE7OSY#issuecomment-566884171, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35JHLAHCTVBYMPN652LQZG5VPANCNFSM4J4D2ARA .

govert commented 4 years ago

Have you perhaps moved from 32-bit to 64-bit? Maybe that makes such a difference.

I do know anything about the character encoding in the SQLite db - this SQLiteForExcel code assumes it's always UTF8.

I can't really believe the VBA side is the slow part of the call - it's just converting the UTF8 bytes into a VBA string and storing into an array. But the SQLite side didn't change at all. So it doesn't really make sense to me.

gburns53 commented 4 years ago

thx. perhaps some windows update is interfering and making dll calls from vba slower.. I only run 32 bit excel. Ive seen this slow down on many users machine, but I got office 365(and win updates) much later, immediately saw a slow down and traced it to the colvalue read time on big queries.

Last night I tested from my home PC and the same slow down occurred, running office 16. This is a fast PC and the same tools ran in a blink 2 months prior. more like a 16x bottleneck is occurring.

I may have to eventually learn how to write my own dll unfortunately and pass an array back. so one call can return the result.

vba stores string in utf-16. you can create a sqlite db utf-16 in pragma settings before you begin to populate it. so I could eliminate some bottleneck and remove your utf function, however initial testing seems to indicates that will only make a modest improvement.

On Thu, Dec 19, 2019, 12:50 AM Govert van Drimmelen < notifications@github.com> wrote:

Have you perhaps moved from 32-bit to 64-bit? Maybe that makes such a difference.

I do know anything about the character encoding in the SQLite db - this SQLiteForExcel code assumes it's always UTF8.

I can't really believe the VBA side is the slow part of the call - it's just converting the UTF8 bytes into a VBA string and storing into an array. But the SQLite side didn't change at all. So it doesn't really make sense to me.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35JNJRG55OCTJYEUDY3QZMDRVA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHIP52I#issuecomment-567344873, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35MX7MSSD4DIXB5KMGTQZMDRVANCNFSM4J4D2ARA .

govert commented 4 years ago

Storing the data as UTF16 might bloat your SQLite DB, so it's not obvious that there will much of a performance improvement.

The only Windows API called for text columns is MultiByteToWideChar and I haven't yet found anything indicating that it might have changed recently.

You might try to test the performance of StringToUtf8Bytes and Utf8PtrToString pair to get some sense of how fast the text encoding and decoding is done. I'd suspect it is super fast compared to getting the bytes from SQLite.

gburns53 commented 4 years ago

wow. think it has to do with storing in xlsb. it was my very first guess but I thought I had eliminated it as a possibility a week ago. I still think something changed with excel, but anyway I got a tool to run lightening fast again on my home PC so I can figure it out. I did make other excel setting changes at the same time, so we'll see. I'll tell ya but this does look like user error of sorts. We tend to use xlsb at work so I've had tools running your dlls in that format for many months and only recently I noticed issues so I guess I'll tell ya exactly when I know.

thanks for the communication

I am only using this internally, btw

On Thu, Dec 19, 2019, 1:56 PM Govert van Drimmelen notifications@github.com wrote:

Storing the data as UTF16 might bloat your SQLite DB, so it's not obvious that there will much of a performance improvement.

The only Windows API called for text columns is MultiByteToWideChar and I haven't yet found anything indicating that it might have changed recently.

You might try to test the performance of StringToUtf8Bytes and Utf8PtrToString pair to get some sense of how fast the text encoding and decoding is done. I'd suspect it is super fast compared to getting the bytes from SQLite.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35NYCDNPL2FWX6ZWCRDQZO7WVA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHKSQ6A#issuecomment-567617656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35LUROVW7267TZP5PPTQZO7WVANCNFSM4J4D2ARA .

gburns53 commented 4 years ago

Little follow up on why the slow down occurs. This has to do with trusted documents. If the Excel file is opened and not already specified as trusted location and trusted file type (e.g. xlsb), you get the 4x-10x slowdown from your dll calls in a query return.

It occurs across various win home and office installations on various PC s.

Other things can happen, say you have a crash and open a file from MS Office dump directory and it's not specified. The slowdown occurs..

I don't know if this is something new, and I'm not sure how to completely control it for end users, but it's definitely an issue that impacts Sqlite4Excel

just fyi. Thanks again

On Thu, Dec 19, 2019, 6:45 PM Guy gregg.burns@gmail.com wrote:

wow. think it has to do with storing in xlsb. it was my very first guess but I thought I had eliminated it as a possibility a week ago. I still think something changed with excel, but anyway I got a tool to run lightening fast again on my home PC so I can figure it out. I did make other excel setting changes at the same time, so we'll see. I'll tell ya but this does look like user error of sorts. We tend to use xlsb at work so I've had tools running your dlls in that format for many months and only recently I noticed issues so I guess I'll tell ya exactly when I know.

thanks for the communication

I am only using this internally, btw

On Thu, Dec 19, 2019, 1:56 PM Govert van Drimmelen < notifications@github.com> wrote:

Storing the data as UTF16 might bloat your SQLite DB, so it's not obvious that there will much of a performance improvement.

The only Windows API called for text columns is MultiByteToWideChar and I haven't yet found anything indicating that it might have changed recently.

You might try to test the performance of StringToUtf8Bytes and Utf8PtrToString pair to get some sense of how fast the text encoding and decoding is done. I'd suspect it is super fast compared to getting the bytes from SQLite.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35NYCDNPL2FWX6ZWCRDQZO7WVA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHKSQ6A#issuecomment-567617656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35LUROVW7267TZP5PPTQZO7WVANCNFSM4J4D2ARA .

govert commented 4 years ago

Thank you @gburns53 for posting back - that's very interesting and surprising.

gburns53 commented 4 years ago

FYI, you think this is it? As in what i was seeing with slower .dll calls.. when document not trusted. (Even sharepoint can't be a trusted location in most business situations)

https://outflank.nl/blog/2019/04/17/bypassing-amsi-for-vba/

On Tue, Jan 7, 2020 at 3:20 PM Govert van Drimmelen < notifications@github.com> wrote:

Thank you @gburns53 https://github.com/gburns53 for posting back - that's very interesting and surprising.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35NP5Y6PVGFCD3MXZC3Q4TPXPA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIKENYI#issuecomment-571754209, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35IRY3KYV73G64DNZOTQ4TPXPANCNFSM4J4D2ARA .

gburns53 commented 4 years ago

Anyway, if you see a quick and obvious way to remove this AMSI problem via this blog, it would be nice to know. Plenty of users still hit a substantial slow down from sqlite dll calls when opening files from Sharepoint. A return requires a ton of .dll calls to step cursor-column type-utf-16 convert (up to 3 for every data point in a result set * 2 because of your API wrapper)

On Fri, Mar 27, 2020 at 6:30 PM Guy gregg.burns@gmail.com wrote:

FYI, you think this is it? As in what i was seeing with slower .dll calls.. when document not trusted. (Even sharepoint can't be a trusted location in most business situations)

https://outflank.nl/blog/2019/04/17/bypassing-amsi-for-vba/

On Tue, Jan 7, 2020 at 3:20 PM Govert van Drimmelen < notifications@github.com> wrote:

Thank you @gburns53 https://github.com/gburns53 for posting back - that's very interesting and surprising.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/govert/SQLiteForExcel/issues/11?email_source=notifications&email_token=AKIN35NP5Y6PVGFCD3MXZC3Q4TPXPA5CNFSM4J4D2ARKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIKENYI#issuecomment-571754209, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKIN35IRY3KYV73G64DNZOTQ4TPXPANCNFSM4J4D2ARA .

govert commented 4 years ago

The AMSI stuff is interesting - I had not heard of this before.

Are you suggesting the two calls to the Win32 API MultiByteToWideChar for every string is the place where the slowdown happens? I guess you could check this by making your own implementation of Utf8PtrToString which just copies the Utf8 bytes into every second location of the target string. That would not be correct for strings containing non-ANSI characters, but would be fast. You could even check if the character sequence represents and interesting Unicode character, and only call the Windows API in that case - or implement the whole thing in VBA.