galipremsagar closed this issue 3 years ago
libcudf has a replace_nulls API, if that helps: https://docs.rapids.ai/api/libcudf/stable/group__transformation__replace.html#gad359a898c2b11e70c3e33720259c5596
@davidwendt I think the desire is to replace a value with a null, not replacing a null with a value.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
From the description it looks like replacing a value with a null already works.
To replace a null with a value, you can use libcudf's replace_nulls:
>>> libcudf.replace.replace_nulls(s._column, 'a')
<cudf.core.column.string.StringColumn object at 0x7fbf8066e710>
[
"a",
"b",
"c",
"a"
]
dtype: object
Thanks @davidwendt, this is what is needed for scalar replace. Apologies for the confusion I created.
Update for column-like replace: we connected offline and decided to use a two-pass approach if there are nulls in to_replace, i.e., replace + replace_nulls.
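To make the two-pass idea concrete, here is a minimal sketch in plain Python rather than cudf or libcudf code. It models a column as a list with None standing in for null, and the helper name replace_two_pass is hypothetical. One assumption beyond what the thread states: the sketch records the original null positions first, so that nulls introduced by the first pass are not filled by the second.

```python
def replace_two_pass(values, to_replace, replacement):
    """Hypothetical helper: column-like replace on a list, with None as null."""
    if len(to_replace) != len(replacement):
        raise ValueError("to_replace and replacement must have the same length")

    # Separate the null target (at most one allowed) from the non-null targets.
    pairs = list(zip(to_replace, replacement))
    null_fills = [new for old, new in pairs if old is None]
    if len(null_fills) > 1:
        raise ValueError("to_replace may contain at most one null")
    mapping = {old: new for old, new in pairs if old is not None}

    # Record the original null positions so that nulls created by pass 1
    # are not filled in pass 2 (an assumption of this sketch).
    was_null = [v is None for v in values]

    # Pass 1: ordinary value replacement (the "replace" step).
    out = [None if v is None else mapping.get(v, v) for v in values]

    # Pass 2: fill the original nulls (the "replace_nulls" step).
    if null_fills:
        fill = null_fills[0]
        out = [fill if null else v for v, null in zip(out, was_null)]
    return out


result = replace_two_pass(["a", "b", None, "c"], ["b", None], ["x", "a"])
```

Whether the real implementation needs the saved null mask depends on how replace and replace_nulls are chained. It only matters when a non-null value is itself replaced with null in the same call.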
Describe the bug
Using the replace API, we can currently replace any value of a column with <NA>, but not vice versa.

Steps/Code to reproduce bug

Expected behavior
Ideally I'd expect libcudf to replace None with 'a'. Is the current behavior due to a design limitation, or would it be possible to relax this requirement on the libcudf side and only error when there are multiple None values in values_to_replace?

Alternatively, we could do a fillna at the Python level when the null_count is 1, and throw when null_count > 1. But this would be a two-pass solution, as we'd be doing fillna + replace.