mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License

Addressed situation when assign_default_confidence() returns only dataframe with all NaN confidence values #548

Closed hrshdhgd closed 1 month ago

hrshdhgd commented 1 month ago

Ok, so here was the problem:

When the dataframe whose redundant rows had to be filtered out had all NaN values for confidence, the line

https://github.com/mapping-commons/sssom-py/blob/550206721911f711ee678eb1a8da50591649bd04/src/sssom/util.py#L441

returned an empty dataframe as df and the entire source dataframe as nan_df.

Due to this, the following line:

https://github.com/mapping-commons/sssom-py/blob/550206721911f711ee678eb1a8da50591649bd04/src/sssom/util.py#L447

resulted in dfmax = {}, which is of type pandas.Series. Hence the confusion.

The correct way to handle this is simply to add an if statement:

https://github.com/mapping-commons/sssom-py/blob/ffa2109616020f994196cbb827d71bca17192014/src/sssom/util.py#L447-L469
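For readers who don't want to follow the links, here is a rough, simplified sketch of the idea. It is not the actual code in util.py: the names split_by_confidence and keep_max_confidence, the KEY column list, and the merge-based filtering are assumptions made for illustration. The only part taken from the description above is the guard that skips the groupby when no row has a confidence value.

```python
import numpy as np
import pandas as pd

# Assumed key columns for this sketch; the real code may use a different key.
KEY = ["subject_id", "predicate_id", "object_id"]


def split_by_confidence(df: pd.DataFrame):
    """Hypothetical stand-in for assign_default_confidence().

    Returns (rows with a confidence value, rows whose confidence is NaN).
    """
    has_conf = df["confidence"].notna()
    return df[has_conf], df[~has_conf]


def keep_max_confidence(df: pd.DataFrame) -> pd.DataFrame:
    """Keep, per key, only the rows with the highest confidence.

    Rows with NaN confidence are preserved untouched.
    """
    scored, nan_df = split_by_confidence(df)

    # The guard: only aggregate when there is something to aggregate.
    # If every confidence was NaN, `scored` is empty and we skip this step.
    if not scored.empty:
        dfmax = scored.groupby(KEY, as_index=False)["confidence"].max()
        # Inner merge keeps only rows whose confidence equals the group max.
        scored = scored.merge(dfmax, on=KEY + ["confidence"]).drop_duplicates()

    return pd.concat([scored, nan_df]).drop_duplicates().reset_index(drop=True)


# All-NaN input: the case that triggered the bug described above.
all_nan = pd.DataFrame(
    {
        "subject_id": ["a:1", "a:1"],
        "predicate_id": ["skos:exactMatch", "skos:exactMatch"],
        "object_id": ["b:1", "b:2"],
        "confidence": [np.nan, np.nan],
    }
)
print(keep_max_confidence(all_nan))
```

With the guard in place, an all-NaN input simply passes through as nan_df instead of reaching the groupby step.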

I've added an explicit test and it passes. Fixes #546