mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution

fast_executemany stops after x inserts #1053

Closed CaptainCsaba closed 2 years ago

CaptainCsaba commented 2 years ago

Environment

Issue

Hello. I have a stored procedure for inserting one object into my "RankingsScrapedCompany" table. I am calling it on the Python side with fast_executemany set to True. Inserting around 1500 rows never goes through; for the same dataset, only around 900 - 1100 rows end up inserted. Since the data is ordered alphabetically, I can see that the missing rows all come after a certain point: everything is inserted in order up to that point, and nothing is inserted after it. I have read somewhere that there is a cap of 2100 parameters or so. I just want some clarification on what I need to change in my code's logic to make sure everything gets inserted.

The Table and stored procedure:

CREATE TABLE RankingsScrapedCompany
(
    Id INT IDENTITY PRIMARY KEY,
    Ticker NVARCHAR(25) NOT NULL,
    Name NVARCHAR(255) NOT NULL,
    ISIN NVARCHAR(15),
    SubSectorCode INT,
    MarketSegmentCode NVARCHAR(15),
    IsSuspended BIT,
    [Index] NVARCHAR(30),
    Sector NVARCHAR(255),
    SubSector NVARCHAR(255),
    LastChanged DATETIME NOT NULL
)

CREATE PROCEDURE spInsertScrapedCompany
(
    @Ticker NVARCHAR(25),
    @Name NVARCHAR(255),
    @ISIN NVARCHAR(15),
    @SubSectorCode INT,
    @MarketSegmentCode NVARCHAR(15),
    @IsSuspended BIT,
    @Index NVARCHAR(30),
    @Sector NVARCHAR(255),
    @SubSector NVARCHAR(255)
)
AS
    SET NOCOUNT ON;

    INSERT INTO RankingsScrapedCompany (
        Ticker,
        Name,
        ISIN,
        SubSectorCode,
        MarketSegmentCode,
        IsSuspended,
        [Index],
        Sector,
        SubSector,
        LastChanged
    )
    VALUES (    
        @Ticker,
        @Name,
        @ISIN,
        @SubSectorCode,
        @MarketSegmentCode,
        @IsSuspended,
        @Index,
        @Sector,
        @SubSector,
        GETDATE()
    )

    SELECT SCOPE_IDENTITY()

GO

The python part where I do the inserts. param_values is a list of tuples with the data I am inserting:

query = "EXEC spInsertScrapedCompany @Ticker=?, @Name=?, @ISIN=?, @SubSectorCode=?, @MarketSegmentCode=?, @IsSuspended=?, @Index=?, @Sector=?, @SubSector=?"
with self._create_connection() as connection:
    with connection.cursor() as cursor:
        cursor.fast_executemany = True
        result = cursor.executemany(query, param_values)

v-chojas commented 2 years ago

The 2100 parameter limit is for a single parameter row; the number of rows is not limited in that way. Does the row that isn't inserted have values that won't fit into the table (e.g. string fields too long, or perhaps a non-digit in an integer column)? Is there an error returned? You can post an ODBC trace for further analysis.

CaptainCsaba commented 2 years ago

As of now I can only say that the behavior is strange. I dropped and recreated the table and inserted the same 1400 rows 5 times in a row. Every time the inserted amount differed by a lot: first it managed 765 rows, then 569, 582, 719 and 622. I was stopping the code in VS Code and using the debugging window to make sure I was inserting the same data stored in a variable. I then checked the row where the inserts stopped each time and found nothing out of the ordinary; in fact, a row where one run stopped was inserted fine by another run, while other runs stopped right there. In no case were data types, nullability, overflows, or character limits a problem. No error message was shown to me, and this part is not inside a try-except block. It's quite hard to get an ODBC trace due to company laptop limitations, but I'll do my best.

Edit: After each bulk insert I went ahead and manually inserted the next row, the one that should have been inserted if the tool hadn't stopped. There were no issues and the row was inserted without problems. This definitely seems like a bug to me.

CaptainCsaba commented 2 years ago

I have tried inserting the same 1400 rows, but now in batches of 100, using the same code as above. This way everything went through. Could the problem be that the server can't handle more than some number of queries at once?
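
(For illustration, a minimal sketch of that kind of batching, reusing the query, param_values and self._create_connection names from the earlier snippet; the batch size of 100 is the one mentioned above:)

batch_size = 100
with self._create_connection() as connection:
    with connection.cursor() as cursor:
        cursor.fast_executemany = True
        # send the rows in fixed-size chunks instead of one large executemany call
        for start in range(0, len(param_values), batch_size):
            cursor.executemany(query, param_values[start:start + batch_size])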

v-chojas commented 2 years ago

Could you try putting a SET NOCOUNT ON before your inserts? I suspect this could be the result of not consuming resultsets the server generates for inserts: https://github.com/mkleehammer/pyodbc/issues/262 https://github.com/mkleehammer/pyodbc/issues/277
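
(For illustration, one way to follow that suggestion from the Python side would be to prepend SET NOCOUNT ON to the batch that executemany sends, reusing the query from the earlier snippet; as the follow-up below notes, the SELECT SCOPE_IDENTITY() in the procedure would still return a result set per row:)

query = ("SET NOCOUNT ON; "
         "EXEC spInsertScrapedCompany @Ticker=?, @Name=?, @ISIN=?, @SubSectorCode=?, "
         "@MarketSegmentCode=?, @IsSuspended=?, @Index=?, @Sector=?, @SubSector=?")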

CaptainCsaba commented 2 years ago

It's at the beginning of the stored procedure I call, as shown above. I call the same procedure 1400 times with fast_executemany. Does putting it outside, before running all of them, make a difference?

v-chojas commented 2 years ago

The SELECT SCOPE_IDENTITY() will still generate a resultset. Put a loop to consume the resultsets after your execution. (See discussion in the issues I linked above.)
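
(For illustration, a minimal sketch of such a consume loop, reusing the names from the earlier snippet; cursor.nextset() steps through and discards the remaining result sets, here one SELECT SCOPE_IDENTITY() result per inserted row:)

with self._create_connection() as connection:
    with connection.cursor() as cursor:
        cursor.fast_executemany = True
        cursor.executemany(query, param_values)
        # drain every result set generated by the procedure calls
        while cursor.nextset():
            pass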

gordthompson commented 2 years ago

duplicate of #665