Closed. matentzn closed this 1 month ago
After the above line, we could do one of the following.

Option 1: raise an error.

    if len(df) == 0:
        raise RuntimeError('No confidence values were found in dataframe. Cannot process.')

Option 2: return an empty dataframe.

    if len(df) == 0:
        return pd.DataFrame()

The second option is what I often do in my own code, but it requires downstream code to check for and handle an empty dataframe, and I doubt that the 9 usages of filter_redundant_rows() are all already set up to deal with that.
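To make the trade-off between the two options concrete, here is a minimal sketch. It is not the sssom-py implementation: the helper name `rows_with_confidence`, the `on_empty` parameter, and the use of `pd.to_numeric` to decide what counts as a usable confidence value are all assumptions made for illustration.

```python
import pandas as pd

CONFIDENCE = "confidence"  # assumed column name, mirroring the constant seen in the log


def rows_with_confidence(df: pd.DataFrame, on_empty: str = "raise") -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose confidence parses as a number.

    on_empty="raise" -> option 1: fail loudly when nothing is left
    on_empty="empty" -> option 2: hand back an empty dataframe
    """
    # Non-numeric entries (empty strings, "n/a", ...) become NaN.
    numeric = pd.to_numeric(df[CONFIDENCE], errors="coerce")
    kept = df[numeric.notna()]
    if len(kept) == 0:
        if on_empty == "raise":
            raise RuntimeError(
                "No confidence values were found in dataframe. Cannot process."
            )
        return pd.DataFrame()
    return kept


# Example input: a confidence column exists, but holds no parseable floats.
example = pd.DataFrame({"subject_id": ["A:1", "A:2"], CONFIDENCE: ["", "n/a"]})
empty_result = rows_with_confidence(example, on_empty="empty")
print(len(empty_result))  # 0
```

With option 2, every caller has to cope with a dataframe that has no rows and no columns, which is the concern raised above. Option 1 surfaces the problem at the point where it occurs.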
Overview
https://github.com/mapping-commons/sssom-py/blob/550206721911f711ee678eb1a8da50591649bd04/src/sssom/util.py#L429
We had the problem that this was failing:
https://github.com/mapping-commons/sssom-py/blob/550206721911f711ee678eb1a8da50591649bd04/src/sssom/util.py#L449
with
AttributeError: 'Series' object has no attribute 'iterrows'
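The error message itself follows from the pandas API: `iterrows()` is defined on `DataFrame` but not on `Series`, so the call fails whenever the preceding step yields a `Series`. The sketch below only illustrates that difference and one defensive check. The helper `require_frame` is hypothetical and is not part of sssom-py.

```python
import pandas as pd


def require_frame(obj, context: str = "groupby result") -> pd.DataFrame:
    """Hypothetical guard: return obj unchanged if it is a DataFrame, else raise."""
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(
            f"Expected a DataFrame for {context}, got {type(obj).__name__}"
        )
    return obj


frame = pd.DataFrame({"subject_id": ["A:1"], "confidence": [0.9]})
series = frame["confidence"]

print(hasattr(frame, "iterrows"))   # True
print(hasattr(series, "iterrows"))  # False

# Iterating is safe once the type has been checked.
for _, row in require_frame(frame).iterrows():
    print(row["subject_id"], row["confidence"])
```

A check like this does not fix the underlying cause, but it turns an opaque `AttributeError` into a message that names the unexpected type.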
Log / traceback
```
python ../scripts/lexmatch-sssom-compare.py extract_unmapped_matches doid gard icd10cm icd10who icd11foundation ncit omim ordo \
  --matches ../mappings/mondo-sources-all-lexical.sssom.tsv \
  --output-dir lexmatch \
  --summary lexmatch/README.md \
  --exclusion reports/doid_term_exclusions.txt --exclusion reports/gard_term_exclusions.txt --exclusion reports/icd10cm_term_exclusions.txt --exclusion reports/icd10who_term_exclusions.txt --exclusion reports/icd11foundation_term_exclusions.txt --exclusion reports/ncit_term_exclusions.txt --exclusion reports/omim_term_exclusions.txt --exclusion reports/ordo_term_exclusions.txt
/usr/local/lib/python3.10/dist-packages/sssom/parsers.py:428: ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. When using the Copy-on-Write mode, such inplace method never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy. For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' instead, to perform the operation inplace on the original object.
  df2[CONFIDENCE].replace(r"^\s*$", np.NaN, regex=True, inplace=True)
/usr/local/lib/python3.10/dist-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df.replace("", np.nan, inplace=True)
/usr/local/lib/python3.10/dist-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df.replace("", np.nan, inplace=True)
/usr/local/lib/python3.10/dist-packages/sssom/util.py:447: FutureWarning: The provided callable is currently using np.maximum.reduce. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string np.maximum.reduce instead.
  dfmax = df.groupby(key, as_index=False)[CONFIDENCE].apply(max).drop_duplicates()
/usr/local/lib/python3.10/dist-packages/sssom/util.py:447: FutureWarning: The provided callable is currently using np.maximum.reduce. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string np.maximum.reduce instead.
  dfmax = df.groupby(key, as_index=False)[CONFIDENCE].apply(max).drop_duplicates()
/usr/local/lib/python3.10/dist-packages/sssom/util.py:447: FutureWarning: The provided callable is currently using np.maximum.reduce. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string np.maximum.reduce instead.
  dfmax = df.groupby(key, as_index=False)[CONFIDENCE].apply(max).drop_duplicates()
/usr/local/lib/python3.10/dist-packages/sssom/util.py:447: FutureWarning: The provided callable is currently using np.maximum.reduce. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string np.maximum.reduce instead.
  dfmax = df.groupby(key, as_index=False)[CONFIDENCE].apply(max).drop_duplicates()
/usr/local/lib/python3.10/dist-packages/sssom/util.py:447: FutureWarning: The provided callable is currently using np.maximum.reduce. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string np.maximum.reduce instead.
  dfmax = df.groupby(key, as_index=False)[CONFIDENCE].apply(max).drop_duplicates()
/usr/local/lib/python3.10/dist-packages/sssom/util.py:447: FutureWarning: The provided callable is currently using np.maximum.reduce. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string np.maximum.reduce instead.
  dfmax = df.groupby(key, as_index=False)[CONFIDENCE].apply(max).drop_duplicates()
Traceback (most recent call last):
  File "/work/src/ontology/../scripts/lexmatch-sssom-compare.py", line 403, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/work/src/ontology/../scripts/lexmatch-sssom-compare.py", line 190, in extract_unmapped_matches
    unmapped_ont_df = get_unmapped_df(
  File "/work/src/ontology/../scripts/lexmatch-sssom-compare.py", line 299, in get_unmapped_df
    filtered_new_df = filter_redundant_rows(new_df)
  File "/usr/local/lib/python3.10/dist-packages/sssom/util.py", line 449, in filter_redundant_rows
    for _, row in dfmax.iterrows():
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'iterrows'
make[1]: *** [mondo-ingest.Makefile:447: lexmatch/README.md] Error 1
rm imports/ro_terms_combined.txt
make[1]: Leaving directory '/work/src/ontology'
make: *** [mondo-ingest.Makefile:333: build-mondo-ingest] Error 2
Command exited with non-zero status 2
```
I patched this case here: https://github.com/monarch-initiative/mondo-ingest/pull/581, basically adding some dummy confidence values to the data frame.
Again, the case was: there was a `confidence` column, but it contained no "legal" float values.
Action items