pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

typos #59640

Closed: musvaage closed this issue 2 weeks ago

musvaage commented 3 weeks ago

Left for someone else to determine the replacement strings:

ethnicsn and explicing

The string ethnicsn is also present in pandas/tests/io/data/stata/stata15.dta.

$ sed -n '1591,1592p' pandas/pandas/tests/io/test_stata.py
Value labels for column ethnicsn are not unique. These cannot be converted to
pandas categoricals.
$ sed -n '111,113p' pandas/pandas/tests/plotting/test_hist_method.py
        # _check_plot_works adds an `ax` kwarg to the method call
        # so we get a warning about an axis being cleared, even
        # though we don't explicing pass one, see GH #13188
$
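Rather than pinning line numbers with `sed -n 'N,Mp'`, a recursive grep can list every occurrence together with its file and line. A minimal sketch, run against a throwaway directory that stands in for the pandas checkout (the directory layout and file contents are illustrative assumptions):

```shell
#!/bin/sh
# Sketch: list every occurrence of the candidate typos as file:line:text,
# instead of hard-coding line numbers for `sed -n 'N,Mp'`.
# The scratch tree below is a stand-in for the pandas checkout (assumption).
set -eu

tree=$(mktemp -d)
mkdir -p "$tree/tests/io" "$tree/tests/plotting"
printf '%s\n' 'Value labels for column ethnicsn are not unique.' \
    > "$tree/tests/io/test_stata.py"
printf '%s\n' "        # though we don't explicing pass one, see GH #13188" \
    > "$tree/tests/plotting/test_hist_method.py"

# -r recurse, -n line numbers, -w whole words, one -e per candidate.
hits=$(grep -rnw -e ethnicsn -e explicing "$tree")
printf '%s\n' "$hits"
rm -rf "$tree"
```

Against the real checkout, the equivalent call would be `grep -rnw -e ethnicsn -e explicing pandas/pandas`.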

I don't know whether there is any point in fixing these:

salaraies and thumnail

$ sed -n '39,42p' pandas/pandas/tests/io/parser/test_network.py
    # test reading compressed urls with various engines and
    # extension inference
    if compression_only == "tar":
        pytest.skip("TODO: Add tar salaraies.csv to pandas/io/parsers/data")
$ sed -n '52p' pandas/web/pandas/index.html
                                            <img class="img-fluid img-thumnail py-5 mx-auto" alt="{{ company.name }}" src="{{ base_url }}{{ company.logo }}"/>
$

The shell script below is authorised for use and/or modification by the repository maintainers.

$ cat typos.sh
#!/bin/sh

sed -i "s/Exceptionn/Exception/g" pandas/pandas/errors/__init__.py
sed -i "s/PERMISSOIN/PERMISSION/g" pandas/pandas/tests/io/xml/test_to_xml.py
sed -i "s/Pre-emptively/Preemptively/g" pandas/pandas/core/internals/construction.py
sed -i "s/accientally/accidentally/g" pandas/pandas/core/dtypes/cast.py
sed -i "s/anonther/another/g" pandas/pandas/tests/extension/base/dtype.py
sed -i "s/behaviorof/behavior of/g" pandas/pandas/core/arraylike.py
sed -i "s/behavivor/behavior/g" pandas/pandas/core/internals/blocks.py
sed -i "s/belows/below/g" pandas/asv_bench/benchmarks/indexing_engines.py
sed -i "s/concatanated/concatenated/g" pandas/pandas/io/formats/style.py
sed -i "s/concatentation/concatenation/g" pandas/pandas/core/reshape/concat.py
sed -i "s/determinint/determining/g" pandas/pandas/core/internals/managers.py
sed -i "s/elswhere/elsewhere/g" pandas/pandas/_libs/tslibs/np_datetime.pxd
sed -i "s/enforrced/enforced/g" pandas/pandas/tests/indexes/datetimes/test_constructors.py
sed -i "s/explicily/explicitly/g" pandas/pandas/core/arrays/string_arrow.py
sed -i "s/githubs/github's/g" pandas/pandas/_version.py
sed -i "s/herely/here/g" pandas/pandas/_libs/tslibs/timestamps.pyx
sed -i "s/horrendeous/horrendous/g" pandas/web/pandas/pdeps/0010-required-pyarrow-dependency.md
sed -i "s/increaes/increases/g" pandas/pandas/tests/frame/test_api.py
sed -i "s/indxed/indexed/g" pandas/pandas/tests/apply/test_numba.py
sed -i "s/inherrently/inherently/g" pandas/pandas/io/formats/style_render.py
sed -i "s/interwined/intertwined/g" pandas/pandas/tests/frame/methods/test_rank.py
sed -i "s/lauout/layout/g" pandas/pandas/tests/plotting/frame/test_frame_subplots.py
sed -i "s/notibly/notably/g" pandas/web/pandas/pdeps/0010-required-pyarrow-dependency.md
sed -i "s/maintaine/maintain/g" pandas/pandas/_typing.py
sed -i "s/mangel/mangle/g" pandas/pandas/tests/test_aggregation.py
sed -i "s/mediam/median/g" pandas/pandas/core/frame.py
sed -i "s/multiplpy/multiply/g" pandas/pandas/tests/indexing/test_indexing.py
sed -i "s/nsmalles/nsmallest/g" pandas/pandas/_typing.py
sed -i "s/n_largest/nlargest/g" pandas/pandas/_typing.py
sed -i "s/permutated/permuted/g" pandas/pandas/io/pytables.py
sed -i "s/pickleable/picklable/g" pandas/pandas/_libs/tslibs/offsets.pyx
sed -i "s/pre-emptive/preemptive/g" pandas/pandas/core/frame.py
sed -i "s/pre-empts/preempts/g" pandas/pandas/tests/extension/base/io.py
sed -i "s/prescibes/prescribes/g" pandas/web/pandas/community/ecosystem.md
sed -i "s/punctuations/punctuation/g" pandas/pandas/core/frame.py
sed -i "s/recognied/recognized/g" pandas/pandas/tests/dtypes/test_inference.py
sed -i "s/reflectd/reflected/g" pandas/pandas/core/arrays/base.py
sed -i "s/rentention/retention/g" pandas/pandas/tests/indexes/datetimes/test_arithmetic.py
sed -i "s/representaton/representation/g" pandas/pandas/_libs/tslibs/timestamps.pyx
sed -i "s/representaton/representation/g" pandas/pandas/_libs/tslibs/nattype.pyx
sed -i "s/requireds/required/g" pandas/pandas/tests/io/parser/test_header.py
sed -i "s/repondents/respondents/g" pandas/web/pandas/community/blog/2019-user-survey.md
sed -i "s/responsibelf or/responsible for/g" pandas/pandas/core/indexes/base.py
sed -i "s/revrse/reverse/g" pandas/pandas/tests/io/formats/style/test_matplotlib.py
sed -i "s/setpember/september/g" pandas/web/pandas/pdeps/0012-compact-and-reversible-JSON-interface.md
sed -i "s/signaure/signature/g" pandas/pandas/core/generic.py
sed -i "s/simultaneouly/simultaneously/g" pandas/pandas/io/formats/style_render.py
$ 
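One caveat with the script above: its patterns are unanchored, so any longer word containing a typo as a substring is rewritten too (`maintained` contains `maintaine`), and re-running the script turns an already-fixed `nsmallest` into `nsmallestt`. Whether the target files actually contain such words hasn't been checked here. A sketch of the difference on a scratch file, assuming GNU sed (`\b` word boundaries are a GNU extension, and a bare `sed -i` as used in `typos.sh` is GNU-specific too; BSD/macOS sed needs `-i ''`):

```shell
#!/bin/sh
# Sketch: unanchored vs. word-anchored substitutions on a scratch file.
# Assumes GNU sed: `\b` is a GNU extension (BSD/macOS sed lacks it).
set -eu

f=$(mktemp)
printf '%s\n' 'to maintaine order' 'order is maintained' \
    'nsmallest and nlargest' > "$f"

# Unanchored, as in typos.sh: "maintained" -> "maintaind" and an
# already-correct "nsmallest" -> "nsmallestt".
plain=$(sed -e 's/maintaine/maintain/g' -e 's/nsmalles/nsmallest/g' "$f")

# Anchored with \b: only the standalone misspellings are touched.
anchored=$(sed -e 's/\bmaintaine\b/maintain/g' \
    -e 's/\bnsmalles\b/nsmallest/g' "$f")

# Dry run: diff the proposed result against the file before using sed -i.
printf '%s\n' "$anchored" | diff "$f" - || true

printf 'plain:\n%s\nanchored:\n%s\n' "$plain" "$anchored"
rm -f "$f"
```

For these two entries the anchored form is idempotent, so running it twice is harmless, and the `diff` line serves as a dry run before switching to `sed -i`.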

These can be added at the maintainers' discretion:

sed -i "s/Jorurnals/Journals/g" pandas/doc/source/user_guide/io.rst
sed -i "s/experimential/experimental/g" pandas/doc/source/user_guide/io.rst
sed -i "s/extremeties/extremities/g" pandas/doc/source/user_guide/style.ipynb
sed -i "s/intrday/intraday/g" pandas/doc/source/user_guide/cookbook.rst
sed -i "s/locallly/locally/g" pandas/doc/source/development/debugging_extensions.rst
sed -i "s/offsetes/offsets/g" pandas/doc/source/whatsnew/v2.0.0.rst
sed -i "s/passinig/passing/g" pandas/doc/source/whatsnew/v1.0.0.rst
sed -i "s/pickleable/picklable/g" pandas/doc/source/whatsnew/v0.21.1.rst
sed -i "s/retension/retention/g" pandas/doc/source/development/contributing_codebase.rst
sed -i "s/transferrable/transferable/g" pandas/doc/source/getting_started/index.rst
sed -i "s/work-arounds/workarounds/g" pandas/doc/source/whatsnew/v0.25.0.rst
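Once the substitutions are applied, a short loop can confirm that none of the misspellings survive anywhere in the tree. Sketch only: the scratch directory stands in for `pandas/doc/source`, and one typo is deliberately left in place so the check has something to report:

```shell
#!/bin/sh
# Sketch: after applying the substitutions, report any misspelling that
# is still present. The scratch directory is a stand-in for
# pandas/doc/source (assumption); "intrday" is left unfixed on purpose.
set -eu

docs=$(mktemp -d)
printf '%s\n' 'Journals and experimental readers' > "$docs/io.rst"
printf '%s\n' 'intrday data' > "$docs/cookbook.rst"

leftover=0
for typo in Jorurnals experimential extremeties intrday locallly offsetes \
    passinig pickleable retension transferrable work-arounds; do
    # -r recurse, -l list matching files, -w whole word; grep exits 1 on no match.
    if files=$(grep -rlw -- "$typo" "$docs"); then
        echo "still present: $typo in $files"
        leftover=$((leftover + 1))
    fi
done
echo "typos remaining: $leftover"
rm -rf "$docs"
```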

cf. #59651

$ grep ethnicsn pandas/pandas/tests/io/test_stata.py
Value labels for column ethnicsn are not unique. These cannot be converted to
$ 

@asishm

As the test failures show, you can't really change the error message without changing the columns in the corresponding stata file.

$ grep ETHNICSN pandas/pandas/tests/io/data/stata/stata15.dta
grep: pandas/pandas/tests/io/data/stata/stata15.dta: binary file matches
$ grep ethnicsn pandas/pandas/tests/io/data/stata/stata15.dta
grep: pandas/pandas/tests/io/data/stata/stata15.dta: binary file matches
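On the binary `.dta` fixture, grep only reports "binary file matches". GNU grep's `-a` (`--text`) and `-o` options print the matched bytes themselves, which shows exactly which spellings the file contains. The sketch below uses a small stand-in file with NUL bytes rather than the real `stata15.dta` (an assumption for illustration):

```shell
#!/bin/sh
# Sketch: see which strings matched inside a binary file.
# A small file containing NUL bytes stands in for stata15.dta (assumption).
# GNU grep: -a treats binary data as text, -o prints only the matched
# part, -i ignores case.
set -eu

f=$(mktemp)
printf 'ethnicsn\000\001ETHNICSN\000\n' > "$f"

# Plain grep would only say "binary file matches"; -a -o shows the matches.
matches=$(grep -a -o -i ethnicsn "$f")
printf '%s\n' "$matches"
rm -f "$f"
```

Against the real fixture, the call would be `grep -a -o -i ethnicsn pandas/pandas/tests/io/data/stata/stata15.dta`.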

Republic of Senegal (République du Sénégal)

ISO 3166-1 code: SEN, SN

Internet domain: .sn

musvaage commented 2 weeks ago

@mroeschke

Of course I can open a draft PR, but I suppose some files might not actually be slated for modification.

mroeschke commented 2 weeks ago

A PR that actually commits the changes from this script would be accepted. From a quick skim, most of those spelling mistakes look OK to correct.

musvaage commented 2 weeks ago

@mroeschke

This also caught my attention:

$ grep -n precated pandas/pandas/tests/config/test_config.py
16:            m.setattr(cf, "_deprecated_options", {})
96:        assert "precated" in cf.describe_option("b", _print_desc=False)
104:        assert "precated" in cf.describe_option("g.h", _print_desc=False)
128:        msg = "'kanban' is deprecated, please refrain from using it."
272:        with tm.assert_produces_warning(FutureWarning, match="deprecated"):
281:        with tm.assert_produces_warning(FutureWarning, match="eprecated.*nifty_ver"):
284:            msg = "Option 'a' has already been defined as deprecated"
299:        with tm.assert_produces_warning(FutureWarning, match="eprecated"):
302:        with tm.assert_produces_warning(FutureWarning, match="eprecated"):
305:        with tm.assert_produces_warning(FutureWarning, match="eprecated"):
$ 
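On the `precated`/`eprecated` hits: `"precated" in ...` is a Python substring test, and `match=` is treated as a regular expression searched anywhere in the warning message (as with `pytest.warns`; assumed here to hold for `tm.assert_produces_warning`). The truncated forms therefore match both `Deprecated` and `deprecated`, which may well be deliberate rather than a typo, though the test authors' intent isn't stated. A quick illustration with grep's substring matching, using made-up messages:

```shell
#!/bin/sh
# Sketch: a truncated pattern matches regardless of the first letter's case.
# The two messages are made-up examples, not actual pandas output.
set -eu

msgs='Option b is Deprecated
nifty_ver is deprecated and will be removed'

# grep -c counts matching lines; a pattern matches anywhere in a line.
both=$(printf '%s\n' "$msgs" | grep -c 'eprecated')
lower_only=$(printf '%s\n' "$msgs" | grep -c 'deprecated')
echo "eprecated: $both, deprecated: $lower_only"
```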
musvaage commented 2 weeks ago

@mattwang44

pandas/tests/config/test_config.py

Is the grep output in my previous comment typo-free?