mithrandyr / SimplySql

PowerShell module for querying various SQL databases
MIT License
197 stars 31 forks source link

Invoke-SqlQuery returns only records that have a unique key #146

Closed kwein123 closed 3 months ago

kwein123 commented 5 months ago

The following set of SELECT queries should each return 77 rows since there is no WHERE clause in any of the queries. Yet you can see that I get a different number of records back depending on the list of fields I am selecting, as though there was an implicit " | Unique" at the end of the line:

PS V:\Publish\KWeinrich\Storage> (invoke-sqlquery "SELECT * FROM JunctionPointTargets").count  
77
PS V:\Publish\KWeinrich\Storage> (invoke-sqlquery "SELECT Server FROM JunctionPointTargets").count
12
PS V:\Publish\KWeinrich\Storage> (invoke-sqlquery "SELECT Server,Share FROM JunctionPointTargets").count
20
PS V:\Publish\KWeinrich\Storage> (invoke-sqlquery "SELECT Server,Share,DirectoryName FROM JunctionPointTargets").count
77

This table has an index of "Server, Share, DirectoryName, FileName" - but that shouldn't matter for how many rows the above queries return. But there are only 12 unique servers, and only 20 unique Server/Share combinations.

Note: All queries return 77 records, as expected, when run in SQLyog. (I am running the queries against a MySQL 8.0.36 server, using SimplySql 2.0.2.70, Open-MySqlConnection)

mithrandyr commented 5 months ago

@kwein123 I couldn't easily reproduce via SQLite. Can you provide me a sql script that I can use to generate the table with the table so I can try and reproduce in my MySQL instance?

I would also try running the queries and storing the results in a variable and then exploring the variable. You can also replace the ( xyz).count with xyz | measure-object and see if that changes any of the results. I would also try using the -Stream switch on Invoke-SqlQuery and see if that changes the results. -- those additional tests might tell us whether this is something connected to how PowerShell itself is functioning or whether its something that DB is doing. However, I can't see how there is anything going on in the module itself (no code that would automatically do the DISTINCT function against the results).

kwein123 commented 5 months ago

With -Stream, I get all 77 records every time. I don't understand the documentation: "If the -Stream switch is used, only the first result set is returned and the output is a PSObject for each row in the result set." In which cases do I need to use -Stream? I've never used it before. Piping to measure-object still returns the reduced # of records. If this is still unclear, I'll work up an example table.

mithrandyr commented 5 months ago

@kwein123 -Stream forces the results into a PSObject instead of a DataRow -- this is helpful if you are exporting to CSV or something and do not want the extra hidden fields on a datarow object.

I think I need an example script so I can load and do some deeper testing on my side.

OilyEscuelademarco commented 4 months ago

I don't understand the problem but can confirm, with -Stream I get all records and without it, I don't. Thank you @kwein123 you have saved my sanity.

mithrandyr commented 4 months ago

@OilyEscuelademarco or @kwein123 -- can one of you provide a sample code that reproduces this issue? This will need to include the schema of the data in the database and enough data to produce the issue. I've tried multiple times but never been able to reproduce. Thanks.

kwein123 commented 3 months ago

So sorry it's taken so long to get back to you on this. I wrote this to create the MySQL table:

CREATE TABLE JunctionPointTargets2 ( SERVER VARCHAR(25), SHARE VARCHAR(25), DirectoryName VARCHAR(25), FileName VARCHAR(25), Comments TEXT, PRIMARY KEY (SERVER, SHARE, DirectoryName, FileName) );

INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('A', 'A', 'A', 'A', 'A');
INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('B', 'B', 'B', 'B', 'B');
INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('C', 'C', 'C', 'C', 'C');
INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('C', 'X', 'X', 'X', 'X');

then this for my PS code:

Import-Module SimplySQL
Open-MySQLConnection -ConnectionString "server=v2626umcth940.rtord.epa.gov;uid=ORDDashUpdate;pwd=UpdateDashORD;database=ORDFiles;SslMode=None;"
(Invoke-SqlQuery "SELECT * FROM JunctionPointTargets2").Count
(Invoke-SqlQuery "SELECT Server FROM JunctionPointTargets2").Count
(Invoke-SqlQuery -Stream "SELECT Server FROM JunctionPointTargets2").Count

Which should return a count of 4 each time, correct? But here's what I get:

PS V:\Publish\KWeinrich\Storage> .\Test-SQLQuery.ps1
4
3
4

PS V:\Publish\KWeinrich\Storage> get-module

ModuleType Version    PreRelease Name                                ExportedCommands
---------- -------    ---------- ----                                ----------------
Manifest   7.0.0.0               Microsoft.PowerShell.Management     {Add-Content, Clear-Content, Clear-Item, Clear-It…
Script     2.3.5                 PSReadLine                          {Get-PSReadLineKeyHandler, Get-PSReadLineOption, …
Binary     2.0.2.70              SimplySQL                           {Clear-SqlMessage, Close-SqlConnection, Complete-…

I'm using PowerShell 7.4.2

mithrandyr commented 3 months ago

@kwein123 -- so I tried to reproduce and I'm getting 4 each time. Can you also provide me the SQL to create the table? what happens if you just do Invoke-SqlQuery "SELECT server FROM JunctionPointTargets2" by it self? do you get 4 objects back or only 3?

FYI -- here is the full script I've used previously to try and reproduce -- can you try running this in your MySql database and see the results?

# Create the table
Invoke-SqlUpdate -Query "CREATE TABLE testForSimplySql (First text, Second int)"

#Add 6 rows
Invoke-SqlUpdate -Query "INSERT INTO testForSimplySql (First, Second) VALUES ('alpha', 1)"
Invoke-SqlUpdate -Query "INSERT INTO testForSimplySql (First, Second) VALUES ('beta', 2)"
Invoke-SqlUpdate -Query "INSERT INTO testForSimplySql (First, Second) VALUES ('delta', 3)"
Invoke-SqlUpdate -Query "INSERT INTO testForSimplySql (First, Second) VALUES ('gamma', 4)"
Invoke-SqlUpdate -Query "INSERT INTO testForSimplySql (First, Second) VALUES ('alpha', 5)"
Invoke-SqlUpdate -Query "INSERT INTO testForSimplySql (First, Second) VALUES ('epsilon', 6)"

#Verify the data, should be 6 rows
Invoke-SqlQuery -Query "SELECT * FROM testForSimplySql"

#Query just for the column with duplicate data
Invoke-SqlQuery -Query "SELECT First FROM testForSimplySql"

Invoke-SqlQuery -Query "SELECT First FROM testForSimplySql" | measure # should return 6 for count
@(Invoke-SqlQuery -Query "SELECT First FROM testForSimplySql").count # should return 6

Invoke-SqlQuery -Query "SELECT DISTINCT First FROM testForSimplySql" | measure # should return 5 for count

This is really confusing... because there is no reason for (Invoke-SqlQuery "SELECT Server FROM JunctionPointTargets2").Count to return a different count -- there is no code in SimplySql specific to that.. and the fact that the -Stream has it return the right information is fascinating....

Something else you might try, installing a version of SimplySql prior to version 2.0

# you might need to use the -Force switch...
Install-Module -Name SimplySql -RequiredVersion 1.9.1

Then import that version and try again. If the issue reproduces with both versions, then this suggests that this is somehow environmental.

You can also try Windows PowerShell 5.1

Just trying to think of different permutations that might allow us to hone in on what is causing this behavior.

kwein123 commented 3 months ago

Hmmm. I did include the MySQL to create the table, but I'm not very good with markdown. I'll try again here:

CREATE TABLE JunctionPointTargets2 ( SERVER VARCHAR(25), SHARE VARCHAR(25), DirectoryName VARCHAR(25), FileName VARCHAR(25), Comments TEXT, PRIMARY KEY (SERVER, SHARE, DirectoryName, FileName) );

INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('A', 'A', 'A', 'A', 'A');
INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('B', 'B', 'B', 'B', 'B');
INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('C', 'C', 'C', 'C', 'C');
INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('C', 'X', 'X', 'X', 'X');

I agree, this is bizarre behavior. I'm so glad that @OilyEscuelademarco had the same problem, or I'd be doubting my sanity! ;-)

In your example above, your table doesn't have a primary key. I wonder if that's a driver of this problem.

I'll try some more permutations.

kwein123 commented 3 months ago

Interesting: PowerShell 5.1.22621.2506 returns all "4s". But 7.4.2 returns 4/3/4.

kwein123 commented 3 months ago

Your code (with no primary key) runs the same with both PS versions, giving the results you expected.

mithrandyr commented 3 months ago

@kwein123 -- hrm this suggest that there might be something going on with your environment. a couple of suggestions.

Trying running the code from a different computer using PS 7.4.2 and see if you get the correct or incorrect results.

When you launch PS7, you might want to launch it with the -noprofile switch to try and get as clean a PS7 session as possible.

kwein123 commented 3 months ago

I tried running with -noprofile - no improvement. I have been testing on my Win 11 VM. At your suggestion, I used a Windows 2019 Server VM. No improvement there, either. I failed to note previously that my PowerShell 5.1 test that provided the correct results has SimplySql 1.9.0, not 2.0.2.70. When I run PowerShell 5.1 on the server, it has SimplySQL 2.0.2.70 installed. And it fails.

So I installed 1.9.1 in my PS 7 environment. If I require that version, the code works. To summarize: PS 5.1 with SS 1.9.0 - works. But with SS 2.0.2.70 - fails. PS 7.4.2 with SS 1.9.1 - works. But with SS 2.0.2.70 - fails

I have never played with this before - is there a way for me to install multiple 1.X or 2.X SimplySQL versions in parallel and test those? Or I need to install/uninstall, and iterate through versions?

mithrandyr commented 3 months ago

@kwein123 - you can install multiple versions side by side -- the Install-Module does this out of the box. however when you use Import-Module it will always load the latest version... you can change that by look at the parameters for Import-Module as there is a way to target a specific version.

Can you also try create a new mysql database and trying the logic against there? I'm very curious why my instance of MySql (granted, I'm using the MariaDB engine) can't reproduce the issue. So I'm wondering if I'm not reproducing the conditions the same in my testing (despite running the same/similar powershell commands).

If you can put together a complete script (so assume that there is no database at all, just a MySql Server.. then in the script provide all the needed commands to create the database, create the table(s) and load data into the table(s) and query to get the incorrect results) -- then I can continue testing and try to isolate what is happening.

mithrandyr commented 3 months ago

@kwein123 -- In case, you are interested -- here is how to walk through the code to see what is happening (and why I'm completely baffled by your experience).

  1. Invoke-SqlQuery
    • Line 51: this is what engages the underlying engine and interacts with the database
    • Line 57: this is what returns the results
  2. SimplySql Engine
    • Line 21: this is what retrieves the provider specific engine
    • Provider Engines are derived from a base class that provides some functionalities and allows overriding.
  3. MySqlProvider
    • Line 29: this overrides the default GetDataSet function
    • you can see that this doesn't do much at all, executes the command to fill a dataset and then returns an array of rows.
kwein123 commented 3 months ago
Import-Module SimplySQL -RequiredVersion 2.0.2.70
#Import-Module SimplySQL -RequiredVersion 1.9.1

Invoke-SqlUpdate -Query "CREATE DATABASE IF NOT EXISTS MyTester"
Invoke-SqlUpdate -Query "USE MyTester"
Invoke-SqlUpdate -Query "DROP TABLE IF EXISTS JunctionPointTargets2"
Invoke-SqlUpdate -Query "CREATE TABLE JunctionPointTargets2 ( SERVER VARCHAR(25), SHARE VARCHAR(25), DirectoryName VARCHAR(25), FileName VARCHAR(25), Comments TEXT, PRIMARY KEY (SERVER, SHARE, DirectoryName, FileName) )"

# Insert records into the JunctionPointTargets2 table

Invoke-SqlUpdate -Query "INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('A', 'A', 'A', 'A', 'A')"
Invoke-SqlUpdate -Query "INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('B', 'B', 'B', 'B', 'B')"
Invoke-SqlUpdate -Query "INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('C', 'C', 'C', 'C', 'C')"
Invoke-SqlUpdate -Query "INSERT INTO JunctionPointTargets2 (SERVER, SHARE, DirectoryName, FileName, Comments) VALUES ('C', 'X', 'X', 'X', 'X')"

"Every select should return 4"
(Invoke-SqlQuery "SELECT * FROM JunctionPointTargets2").Count
(Invoke-SqlQuery "SELECT Server FROM JunctionPointTargets2").Count
(Invoke-SqlQuery -Stream "SELECT Server FROM JunctionPointTargets2").Count

Does that code work for you to create the environment needed to replicate the problem? It works for me. That is, if I require the 1.9.X version of SimplySQL, I get 3 "4s" (correct). If I require 2.0.2.70, I don't.

mithrandyr commented 3 months ago

@kwein123 - good news, i've been able to reproduce your experience using your sql . however my original attempt to reproduce still isn't giving an issue.. going to keep digging.

mithrandyr commented 3 months ago

OK -- if you create your table without the primary key, you will get back the proper count.

-- Table with Primary Key
CREATE TABLE test1 ( SERVER VARCHAR(25), SHARE VARCHAR(25), DirectoryName VARCHAR(25), FileName VARCHAR(25), Comments TEXT, PRIMARY KEY (SERVER, SHARE, DirectoryName, FileName) );
-- Will return only 3 rows for 'Invoke-SqlQuery "SELECT Server FROM test1"'

-- Table with Primary Key
CREATE TABLE test2 ( SERVER VARCHAR(25), SHARE VARCHAR(25), DirectoryName VARCHAR(25), FileName VARCHAR(25), Comments TEXT);
-- Will return 4 rows for 'Invoke-SqlQuery "SELECT Server FROM test2"'

So this seems to be something that the MySqlConnector may be doing -- I'll continue to dig into.

mithrandyr commented 3 months ago

I was hoping there was a solution that didn't involve removing the primary key.. no luck -- I've posted to the github for MySqlConnector hoping someone has a suggestion.

Basically, when using the MySqlConnector if you query a table that has a primary key and only leverage columns from the primary key, then it will return a distinct on those columns... almost as if there is an optimization that is query the index directly, instead of the table... So, the first -- drop your primary key (uggh) or query more/all columns. @OilyEscuelademarco , @kwein123

kwein123 commented 3 months ago

Adding fields to the query works, but so does the -Stream option you suggested, so that's the most straightforward for now.

Until MySqlConnector stops "optimizing" my query to return what it THINKS I want instead of what I asked for ;-)

bgrainger commented 3 months ago

Until MySqlConnector stops "optimizing" my query to return what it THINKS I want instead of what I asked for ;-)

MySqlConnector is an ADO.NET layer that sends the SQL to the database and returns the results. It doesn't do any "optimizing" of the query.

The results coming back from ExecuteReader are exactly what you would expect: https://github.com/mysql-net/MySqlConnector/discussions/1484#discussioncomment-9775835.

mithrandyr commented 3 months ago

@kwein123 - So I was incorrect, MySqlConnector is not 'optimizing' @bgrainger However, there is something weird with DataTable.Load(dr) when the DataReader is referencing a table with a primary key... that primarykey information is getting incorporated into the DataTable and preventing all the data from being loaded. I updated my issue on your side with code that demonstrates this.

@kwein123 -- I'm updated my own code to no longer use DataTable.Load(dataReader), this will avoid the issue. Expect an updated version to drop shortly.

mithrandyr commented 3 months ago

@kwein123 -- published, closing this issue.

kwein123 commented 2 months ago

Thanks so much for your persistence on this!