Closed fsoubelet closed 4 months ago
To consider: maybe increase minor version instead of patch? I don't like when a patch version breaks the backwards compatibility with python/pandas packages like this.
Fully agreed.
@JoschD I suggest we drop 3.8 from our CI matrices as we did for omc3
and go forward with merging this PR and releasing a new version of the package.
The minimum required pandas
version has been out for close to a year and I expect no friction with dependencies resolving.
This PR brings changes to address the two deprecations / warnings which aren't very far away from being actual issues. I have bumped the package version as a patch, but this can be discussed.
Deprecated use of
.applymap
The call to
.applymap
intfs.writer._quote_string_columns
has been changed to.map
(a total equivalent) since.applymap
is deprecated and will be removed in (most likely) the next majorpandas
version.pandas
in the package has been upped to2.1
which is where.map
appeared (and.applymap
was deprecated).Python 3.8
is not supported bypandas 2.1
I have increasedtfs
's minimum required Python version to3.9
. Considering3.8
no longer gets active support and will stop receiving security updates within the second half of the year, I don't think this is too extreme. More thoughts on the timeline below.Deprecated use of
delim_whitespace
inpd.read_csv
when readingThe
delim_whitespace
option is now deprecated. A simple fix was to usesep="\s+"
as suggested in the issued warning. The option has been around for a long time and comes with no version / timeline considerations.FutureWarning from
pd.option_context('mode.use_inf_as_na', True)
The use of the
pd.option_context('mode.use_inf_as_na', True)
context manager intfs.frame.validate
raises aFutureWarning
which recommends to convert instead. It's not clear from the warning when this functionality will stop being supported.To circumvent this, the
.isna()
call is now made after calling.replace([np.inf, -np.inf], np.nan)
. Since.replace
returns a copy we are not modifying the provided dataframe. Comments have been added above the relevant code for developpers.According to the accepted answer in this StackOverflow thread, the performance impact should be quite small: about 0.1s on a dataframe with 1M rows (and most likely faster on newer Python versions since the
.replace
source code uses a lot of native Python). Could be compared with the runtime impact of the currentwith pd.option_context('mode.use_inf_as_na', True)
call, through the impact seems small enought that I did not benchmark.FutureWarning from
.replace([np.inf, -np.inf], np.nan)
The above change now also raises a future warning about object downcasting during the operation. A fix is done by first calling
.infer_dtypes()
on the dataframe to infer types where possible as suggested in the issued warning. This call creates a lazy copy (for copy-on-write behaviour that will come inpandas 3.0
but the following call to.replace()
creates a hard copy which erases all concerns about changing the original dataframe during validation. The.infer_dtypes()
function has been around for a long time and comes with no version / timeline considerations.Suggested Timeline
I suggest we leave this PR as a draft for now. According to the pandas repo,
pandas 3.0
is due for April 1st, 2024 and we will most likely encounter issues with the currenttfs-pandas
code at this time.By then,
Python 3.8
support will be 6 months away from complete end of life and I believe it will be reasonable to simply drop support for it ourselves (which merging this PR and publishing will do).In short: if we want to be conservative with our supported Python versions, we should keep this PR open until just before
pandas 3.0
comes around (~ April), and merge + publish a new version oftfs-pandas
then. Otherwise we can merge and publish sooner.Note on CI
The CI is currently messed up since we have
Python 3.8
in our tests matrix and the package, in this branch, can't install on this version (since we ask forpandas >= 2.1
which does not exist forPython 3.8
).If we're waiting until close to April to merge and drop
Python 3.8
support, we could at the same time remove it from our CI matrices.