rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics
https://rapidfuzz.github.io/RapidFuzz/
MIT License

sqlalchemy 2.0.4 error when using process.extract #311

Closed blacksteel1288 closed 1 year ago

blacksteel1288 commented 1 year ago

Hi,

When SQLAlchemy 2.0.4 is installed in my venv and I run my process.extract call, I see this error:

  File "lib/sqlalchemy/cyextension/resultproxy.pyx", line 67, in sqlalchemy.cyextension.resultproxy.BaseRow.__getitem__
TypeError: tuple indices must be integers or slices, not str

However, if I roll back SQLAlchemy to version 1.4.46, there is no error and everything works fine.

My process.extract statement looks like this:

for data in filtered_data:
    result = process.extract(data['name'], data_df.name, scorer=fuzz.WRatio, score_cutoff=score_limit)

It's inside a loop over filtered_data, where each 'data' value is a row returned by a SQLAlchemy query, and data_df is a pandas DataFrame populated by a different SQLAlchemy query, e.g.

data_df = pd.DataFrame(some__other_query, columns=['id', 'name'])

maxbachmann commented 1 year ago

Do you have a minimal reproducible example I can just copy and run?

blacksteel1288 commented 1 year ago

Apologies. In the process of creating a smaller running example, I answered my own question.

It turns out this is a SQLAlchemy migration issue that just happened to surface in this process.extract call, and it doesn't have anything to do with rapidfuzz.

With SQLAlchemy 2.0.4, you need to use this format:

for data in filtered_data:
    result = process.extract(data.name, data_df.name, scorer=fuzz.WRatio, score_cutoff=score_limit)

i.e. data['name'] must change to data.name, because rows in SQLAlchemy 2.0 behave like named tuples and no longer accept string keys.
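For anyone who wants to see the difference without a database, here is a minimal sketch. It assumes that a plain collections.namedtuple is a fair stand-in for a SQLAlchemy 2.0 row. The Row type, the sample values, and the helper function names are all hypothetical, and rapidfuzz is left out to keep the snippet standard-library only.

```python
from collections import namedtuple

# Hypothetical stand-in for a SQLAlchemy 2.0 row, which behaves like a named tuple.
Row = namedtuple("Row", ["id", "name"])

# Hypothetical sample data playing the role of the query result.
filtered_data = [Row(1, "apple"), Row(2, "banana")]


def name_by_key(row):
    """1.4-style string-key access; tuple-like rows reject it."""
    try:
        return row["name"]
    except TypeError as exc:
        # Return the message instead of raising so the failure can be inspected.
        return f"TypeError: {exc}"


def name_by_attribute(row):
    """Attribute access, the form that works on tuple-like rows."""
    return row.name


def name_by_mapping(row):
    """Dict-style access through an explicit mapping view.

    A namedtuple offers _asdict(); a real SQLAlchemy row exposes
    row._mapping for the same purpose.
    """
    return row._asdict()["name"]


key_results = [name_by_key(r) for r in filtered_data]
attr_results = [name_by_attribute(r) for r in filtered_data]
mapping_results = [name_by_mapping(r) for r in filtered_data]

print(key_results[0])
print(attr_results)
print(mapping_results)
```

With a real SQLAlchemy row, data._mapping['name'] should keep the dict-style spelling if you prefer it over data.name. Calling .mappings() on the result before iterating is another option.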

I hope this helps someone else who runs into the same problem.