nickgrasley / ml-record-linking

Code for linking census records

record_db get_records fails for large number of input indices #12

Closed benbusath closed 4 years ago

benbusath commented 4 years ago

Calling get_records throws the following SQL error for queries of about 50,000 people or more:

message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

Two possible fixes: read the records in chunk-wise, or create a temporary table holding the desired index values and merge against it.

benbusath commented 4 years ago

I added a chunksize feature that breaks the uids list into chunks and runs a separate SQL query for each chunk. However, this method is very slow and still throws the resource error, even for very small chunk sizes. I'm going to try uploading the indices to a temporary SQL table and merging onto that instead.
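For reference, the chunked approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in record_db: it uses Python's built-in sqlite3 module so it runs without a SQL Server instance, and the `records` table, its `uid` and `name` columns, and both function names are hypothetical. It will not reproduce the resource error, which came from SQL Server.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence[int], chunksize: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be positive")
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]


def get_records_chunked(
    conn: sqlite3.Connection, uids: Sequence[int], chunksize: int = 500
) -> List[Tuple]:
    """Fetch rows for `uids` with one parameterized IN (...) query per chunk.

    Hypothetical schema: a table `records` with columns `uid` and `name`.
    """
    rows: List[Tuple] = []
    for chunk in chunked(uids, chunksize):
        # One "?" placeholder per uid keeps the values out of the SQL text.
        placeholders = ",".join("?" for _ in chunk)
        query = f"SELECT uid, name FROM records WHERE uid IN ({placeholders})"
        rows.extend(conn.execute(query, list(chunk)).fetchall())
    return rows
```

Each chunk costs a full round trip and a separate query plan, which is consistent with the slowness reported above.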

benbusath commented 4 years ago

Completed: get_records now retrieves records by merging against a temporary table of the requested indices.
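The general shape of a temporary-table merge can be sketched as below. This is an assumption-laden illustration rather than the actual record_db code: it uses sqlite3 instead of SQL Server over ODBC so that it is self-contained, and the `records` table, the `wanted_uids` temporary table, the column names, and the function name are all hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def get_records_via_temp_table(
    conn: sqlite3.Connection, uids: Iterable[int]
) -> List[Tuple]:
    """Fetch rows for `uids` by joining against a temporary table.

    Hypothetical schema: a table `records` with columns `uid` and `name`.
    """
    # Start from a clean temporary table in case an earlier call failed midway.
    conn.execute("DROP TABLE IF EXISTS temp.wanted_uids")
    conn.execute("CREATE TEMP TABLE wanted_uids (uid INTEGER PRIMARY KEY)")

    # Upload the requested indices; duplicate uids are silently skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_uids (uid) VALUES (?)",
        ((uid,) for uid in uids),
    )

    # A single join replaces the very long IN (...) list.
    rows = conn.execute(
        "SELECT r.uid, r.name "
        "FROM records AS r "
        "JOIN wanted_uids AS w ON r.uid = w.uid "
        "ORDER BY r.uid"
    ).fetchall()

    conn.execute("DROP TABLE wanted_uids")
    return rows
```

The query text stays the same size no matter how many indices are requested, so the planner sees a plain two-table join instead of tens of thousands of literals.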