pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Replace old string formatting syntax with f-strings #29547

Closed: ShaharNaveh closed this issue 4 years ago

ShaharNaveh commented 5 years ago

Since we no longer support Python 3.5, we can now use f-strings instead of the old .format() (and, of course, the % formatting).
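For reference, here is a small sketch of the three styles side by side. The variable names are illustrative only; the point is that the format specs carry over (%.2f becomes {x:.2f}, %r becomes {x!r}).

```python
name, count, ratio = "pandas", 3, 0.12345

# Old %-interpolation
percent_style = "%s has %d items (%.2f)" % (name, count, ratio)
# str.format()
format_style = "{} has {} items ({:.2f})".format(name, count, ratio)
# f-string (Python 3.6+): expressions are evaluated in place
f_string = f"{name} has {count} items ({ratio:.2f})"

# %r, !r in .format() and !r in an f-string all use repr()
repr_percent = "%r" % name
repr_format = "{!r}".format(name)
repr_fstring = f"{name!r}"
```

All three lines in each group produce the same string; the f-string version simply moves the values into the literal.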

Notes:


To check which files in the pandas directory still need to be fixed:

grep -l -R '%s'  --include=*.{py,pyx} pandas/
grep -l -R '%d' --include=*.{py,pyx} pandas/
grep -l -R '\.format(' --include=*.{py,pyx} pandas/

All of the above can also be combined into a one-liner:

grep -l -R -e '%s' -e '%d' -e '\.format(' --include=*.{py,pyx} pandas/
Tip:

If you want to see the line number of each occurrence, replace -l with -n, for example:

grep -n -R '%s' --include=*.{py,pyx} pandas/
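If GNU grep or bash brace expansion is not available (e.g. on Windows), the same scan can be sketched in Python. This is a rough stand-in for the grep one-liner, not a script that ships with pandas; the function name find_old_formatting is hypothetical.

```python
import re
from pathlib import Path

# Same three patterns as the grep one-liner: '%s', '%d' and '.format('
OLD_STYLE = re.compile(r"%s|%d|\.format\(")


def find_old_formatting(root):
    """Yield (path, line number, stripped line) for each match in .py/.pyx files under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in {".py", ".pyx"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_STYLE.search(line):
                yield path, lineno, line.strip()
```

Run from the repository root, something like `for path, lineno, line in find_old_formatting("pandas"): print(f"{path}:{lineno}: {line}")` gives output similar to grep -n.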

The current list is:


NOTE:

The list may change, since files are constantly being moved or renamed.


Inherited files and commands from this PR.

3vts commented 4 years ago

These are included in #31986

"pandas/tests/scalar/timestamp/test_constructors.py" "pandas/tests/scalar/timestamp/test_rendering.py" "pandas/tests/scalar/timestamp/test_unary_ops.py" "pandas/tests/series/methods/test_nlargest.py" "pandas/tests/series/test_analytics.py" "pandas/tests/series/test_api.py" "pandas/tests/series/test_dtypes.py" "pandas/tests/series/test_ufunc.py"

panjacek commented 4 years ago

I opened my first pull request for this, https://github.com/pandas-dev/pandas/pull/32007, covering pandas/tests/frame/test_to_csv.py

3vts commented 4 years ago

These are included in #32032

"pandas/tests/test_downstream.py" "pandas/tests/test_multilevel.py" "pandas/tests/tools/test_numeric.py" "pandas/tests/tseries/frequencies/test_inference.py" "pandas/tests/tslibs/test_parse_iso8601.py" "pandas/tests/window/moments/test_moments_rolling.py"

pcandoalmeida commented 4 years ago

This file is included in #32029:

smartvinnetou commented 4 years ago

File included in https://github.com/pandas-dev/pandas/pull/32044

3vts commented 4 years ago

These are included in #32034

"pandas/core/arrays/interval.py" "pandas/core/util/hashing.py" "pandas/io/formats/format.py" "pandas/io/formats/html.py" "pandas/io/formats/latex.py" "pandas/io/formats/printing.py" "pandas/io/parsers.py" "pandas/tests/arrays/categorical/test_dtypes.py" "pandas/tests/arrays/categorical/test_operators.py"

drewseibert commented 4 years ago

These are done:

pandas/core/ops/invalid.py pandas/core/ops/methods.py pandas/core/ops/roperator.py

pcandoalmeida commented 4 years ago

Files included in #32063:

raisadz commented 4 years ago

Hi, I want to take scripts/validate_docstrings.py

jancervenka commented 4 years ago

File included in the pull request #32189

MasterNobikan commented 4 years ago

Is this issue resolved? I have been looking at the unchecked files in the list at the top of this thread, and it seems the .format() strings have already been converted.

MasterNobikan commented 4 years ago

pandas/util/_decorators.py is done (the only remaining match is in a comment).

pandas/core/indexes/base.py should be marked off too (again, the match is in a comment).
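grep cannot tell code from comments, so hits like these are false positives. A rough sketch of a comment-aware check using the standard-library tokenize module follows; code_matches is a hypothetical helper, and it still counts matches inside string literals and docstrings, which is usually what you want here.

```python
import io
import re
import tokenize

OLD_STYLE = re.compile(r"%s|%d|\.format\(")


def code_matches(source):
    """Return line numbers where OLD_STYLE matches outside of comments."""
    comment_col = {}  # physical line number -> column where its comment starts
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            comment_col[row] = col
    hits = []
    for lineno, line in enumerate(io.StringIO(source).readlines(), start=1):
        # Only look at the part of the line before the comment (if any)
        code_part = line[: comment_col.get(lineno, len(line))]
        if OLD_STYLE.search(code_part):
            hits.append(lineno)
    return hits
```

Because tokenize only emits COMMENT tokens for real comments, a "#" inside a string literal does not hide the rest of the line.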

smartvinnetou commented 4 years ago

Hi, I started looking at pandas/core/generic.py and quickly realised that changing the string templates from %-interpolated strings to string.Template would require changes in many places where the Substitution and Appender decorators are used.

Do you aim to remove all usage of % string interpolation, in which case this work will be necessary, or are you OK with some use of % interpolation?
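To illustrate why the template syntax and the decorator are coupled, here is a sketch with two hypothetical, simplified docstring decorators: one fills %(name)s placeholders, the other uses string.Template's $name placeholders. Neither is pandas' actual Substitution implementation.

```python
from string import Template


def substitute_percent(**params):
    """Hypothetical %-based docstring decorator (expects %(name)s placeholders)."""
    def decorator(func):
        func.__doc__ = func.__doc__ % params
        return func
    return decorator


def substitute_template(**params):
    """Hypothetical string.Template-based decorator (expects $name placeholders)."""
    def decorator(func):
        func.__doc__ = Template(func.__doc__).substitute(params)
        return func
    return decorator


@substitute_percent(klass="DataFrame")
def head_old():
    """Return the first rows of the %(klass)s."""


@substitute_template(klass="DataFrame")
def head_new():
    """Return the first rows of the $klass."""
```

A %(klass)s template passed through Template.substitute comes back unchanged, so switching syntax means changing every template and every decorator that fills it at the same time.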

ShaharNaveh commented 4 years ago

@smartvinnetou When it comes to Appender and Substitution, we are now trying to replace those with the doc decorator; see https://github.com/pandas-dev/pandas/issues/31942
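A minimal sketch of the idea behind a doc-style decorator, assuming {name} placeholders filled by str.format: templates are concatenated, then formatted with keyword arguments. doc_sketch is a hypothetical, simplified stand-in; the real decorator is discussed in #31942 and does more than this.

```python
def doc_sketch(*templates, **params):
    """Hypothetical stand-in: concatenate docstring templates, then fill
    {name} placeholders with str.format."""
    def decorator(func):
        func.__doc__ = "".join(templates).format(**params)
        return func
    return decorator


_shared_doc = "Return the first n rows of the {klass}. "
# Literal braces must be doubled, as in any str.format template
_extra_doc = "Literal braces are written as {{}}."


@doc_sketch(_shared_doc, _extra_doc, klass="DataFrame")
def head():
    pass
```

The {}-style placeholders use the same syntax as f-strings, which keeps templates consistent with the rest of the converted code.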

smartvinnetou commented 4 years ago

@MomIsBestFriend do you prefer to skip the upgrade of pandas/core/generic.py in this ticket and do it under #31942, or should I replace the Appender and Substitution decorators in generic.py with your new doc decorator under this ticket to remove the old % interpolation?

ShaharNaveh commented 4 years ago

@smartvinnetou Under #31942, in the case of pandas/core/generic.py (if I understood correctly).

sachinh35 commented 4 years ago

Hi, I found quite a lot of files that were either already done or needed no changes, but were still marked as not done. I just wanted to ask whether this issue has been solved and simply not marked, or whether you are still working on it. If it's not solved, I would like to contribute as well. These are some of the files that were done but not marked:

ShaharNaveh commented 4 years ago

@sachinh35 I have updated the list :)

It got hard to keep track

sachinh35 commented 4 years ago

Thanks for updating the list! @MomIsBestFriend

SvoONs commented 4 years ago

I would like to contribute #32939, covering the files under pandas/core/ops/. How should one handle docstrings such as https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops/docstrings.py#L564, which are sometimes imported in other files as well? Should they be wrapped in functions?

proost commented 4 years ago

I changed

OlivierLuG commented 4 years ago

I modified these two files: pandas/_libs/tslibs/timedeltas.pyx pandas/_libs/tslibs/timestamps.pyx

Note that there were no issues in the following ones, so you can mark them as done too: pandas/_libs/tslibs/c_timestamp.pyx pandas/_libs/tslibs/frequencies.pyx pandas/_libs/tslibs/parsing.pyx pandas/_libs/tslibs/period.pyx pandas/_libs/tslibs/strptime.pyx

Note: this is my first PR ever. Let me know if something needs to be improved.

OlivierLuG commented 4 years ago

I went through the thread to update the list and check some files.

Files marked as done without any commit (no need to change anything):

Remaining files to check:

matteosantama commented 4 years ago

I took care of pandas/util/_validators.py. Many of these other files already seem ok to me too.

warden706 commented 4 years ago

Hi Matteo, I'm interested in helping with this effort, but I'm new to git and to contributing to pandas. Would you be able to walk me through the steps? Perhaps we could set up a screen share this week?

Thanks, Andrew

On Tue, May 26, 2020, 3:12 PM Matteo Santamaria notifications@github.com wrote:

I took care of pandas/util/_validators.py. Many of these other files already seem ok to me too.

  • pandas/util/_test_decorators.py
  • pandas/tseries/frequencies.py
  • pandas/tests/util/test_assert_frame_equal.py
  • pandas/tests/tslibs/test_parsing.py
  • pandas/tests/tseries/holiday/test_holiday.py
  • pandas/tests/tseries/holiday/test_calendar.py
  • pandas/tests/tools/test_to_datetime.py
  • pandas/tests/test_strings.py
  • pandas/tests/series/test_repr.py
  • pandas/tests/series/test_datetime_values.py
  • pandas/tests/series/test_constructors.py
  • pandas/tests/series/test_api.py


matteosantama commented 4 years ago

Hey @warden706, I'm actually pretty new here too, so I wouldn't have much to show you. I've found this resource very helpful as I've stumbled around; you should check it out.

MatteoFelici commented 4 years ago

Hi, I'm also pretty new to contributing here. I'm taking care of

I checked these other files and they seem OK to me

DanBasson commented 4 years ago

I'm also new here. I'll take

I have a question about the code change. For instance, in pandas/tests/series/indexing/test_take.py there is this snippet:

msg = "index {} is out of bounds for( axis 0 with)? size 5"
with pytest.raises(IndexError, match=msg.format(10)):
    ser.take([1, 10])

So my suggestion is to replace it with:

msg = lambda x: f"index {x} is out of bounds for( axis 0 with)? size 5"
with pytest.raises(IndexError, match=msg(10)):
    ser.take([1, 10])

Is that good enough?
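Since the value is known at the call site, an inline f-string may be enough; if a reusable helper is wanted, PEP 8 recommends a def over assigning a lambda to a name. A sketch of both, with a hypothetical helper name; pytest.raises(match=...) applies re.search, which the last two lines mimic.

```python
import re

# Option 1: the value is fixed at the call site, so an inline f-string is enough
idx = 10
msg_inline = f"index {idx} is out of bounds for( axis 0 with)? size 5"


# Option 2: a small named helper instead of a lambda bound to a name
def out_of_bounds_msg(idx):
    return f"index {idx} is out of bounds for( axis 0 with)? size 5"


# The optional regex group lets the pattern match both error wordings
long_form = re.search(msg_inline, "index 10 is out of bounds for axis 0 with size 5")
short_form = re.search(msg_inline, "index 10 is out of bounds for size 5")
```

In the test itself this would read `with pytest.raises(IndexError, match=out_of_bounds_msg(10)):`, or the f-string passed directly as match.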

MatteoFelici commented 4 years ago

Hi, I'd like to make a PR, so I'm running the tests, but I'm getting a couple of failures. So I also ran the tests on master. Is it normal that running pytest pandas on an unmodified fork of master returns a couple of failures?

matteosantama commented 4 years ago

Master should generally pass the tests. Make sure you've pulled the latest commits. Which tests are failing?

MatteoFelici commented 4 years ago

@matteosantama I pulled the latest commits, reinstalled the environment and re-ran the tests with pytest pandas. These are the results:


================= short test summary info =================
FAILED pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_s3_roundtrip - ValueError: Invalid timestamp "Ven, 29 Mag 2020 07:59:19 GMT": Unknown string format: Ven, 29 Mag 2020 07:59:19 GMT
FAILED pandas/tests/plotting/test_datetimelike.py::TestTSPlot::test_ts_plot_with_tz['UTC'] - AttributeError: 'numpy.datetime64' object has no attribute 'hour'
================= 2 failed, 87804 passed, 1185 skipped, 1005 xfailed, 5637 warnings in 2437.06s (0:40:37) =================

I noticed that if I run the tests on a single directory only (for example, with pytest pandas/tests/io), there are no failures:

7273 passed, 344 skipped, 53 xfailed, 5584 warnings in 351.76s (0:05:51)

MatteoFelici commented 4 years ago

Since @OlivierLuG's comment, it seems that almost all of the files have either been corrected or were already OK without any modification. I'll try to update the list of "still open" files.

Corrected

No modification needed

Moreover, I think this one is already OK too

Still to check/correct

@DanBasson do you have any update?

DanBasson commented 4 years ago

I keep getting errors and I don't know what they mean. Any help would be appreciated.

MatteoFelici commented 4 years ago

Have you tried to fetch the latest modifications on master? Maybe it will fix some of the failed tests.

DanBasson commented 4 years ago

It didn't help. If someone else wants to take it, feel free.

MatteoFelici commented 4 years ago

I have a question: when we have a situation like this one in pandas/tests/reshape/test_melt.py:

msg = "The following '{Var}' are not present in the DataFrame: {Col}"
...
with pytest.raises(KeyError, match=msg.format(Var="value_vars", Col="\\['C'\\]")):
...
with pytest.raises(KeyError, match=msg.format(Var="id_vars", Col="\\['A'\\]")):
...

and so on, should we turn msg into a function and call it with different values of "Var" and "Col", or is it better to leave it as it is?
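Both options can produce identical patterns. A sketch, with a hypothetical helper name; it assumes Python 3.7+, where re.escape leaves quotes alone and only escapes the brackets, matching the hand-written "\\['C'\\]".

```python
import re

# Option A: keep one template with named fields and fill it per assertion
msg = "The following '{Var}' are not present in the DataFrame: {Col}"
expected_a = msg.format(Var="value_vars", Col="\\['C'\\]")


# Option B: a helper that builds the pattern with an f-string
def missing_cols_msg(var, cols):
    # re.escape turns "['C']" into "\['C'\]" so the brackets match literally
    return f"The following '{var}' are not present in the DataFrame: {re.escape(str(cols))}"


expected_b = missing_cols_msg("value_vars", ["C"])
```

Option A keeps the existing .format() call and is arguably just as readable; option B avoids hand-escaping the column list at every call site.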

WillAyd commented 4 years ago

@MatteoFelici thanks for that updated list. I checked the last few remaining modules you called out and they look OK, so I think we can close this issue.