pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

PERF: pandas' import time #16764

Closed chris-b1 closed 7 years ago

chris-b1 commented 7 years ago

I wouldn't normally be concerned about this, as of course it only happens once, but our import time has gotten quite long, to the point that I notice it hanging my IPython startup.

I don't have a good sense of what would be required to improve this; probably deferring more imports until they are actually needed?

On 0.20.2, with each import in a separate process:

>>> import timeit
>>> timeit.timeit('import pandas', number=1)
1.0524120664765442
>>> quit()

>>> timeit.timeit('import numpy', number=1)
0.1550516492424085
>>> quit()

>>> timeit.timeit('import matplotlib', number=1)
0.24022248792225612

Below is a single process, importing the dependencies first:

>>> import timeit
>>> timeit.timeit('import matplotlib', number=1)
0.2508611853454641
>>> timeit.timeit('import numpy', number=1)
1.2033075643458346e-05
>>> timeit.timeit('import pandas', number=1)
0.840005673777485

jreback commented 7 years ago

do these in independent processes

chris-b1 commented 7 years ago

Oh, right - the matplotlib one already was; I updated the top for numpy.

jreback commented 7 years ago

Or, better yet, import numpy and matplotlib first, then run it.
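
Both suggestions can be combined in a small helper that starts a fresh interpreter for every measurement and optionally imports the dependencies before the timer starts, so only the remaining cost is counted. This is a rough sketch, not something from pandas or timeit: `time_import` and its `preload` parameter are hypothetical names, and standard-library modules are used in the demo so it runs anywhere.

```python
import subprocess
import sys


def time_import(module, preload=()):
    """Return the seconds spent importing `module` in a fresh interpreter.

    Modules listed in `preload` are imported before the timer starts, so
    their cost is excluded from the result.
    """
    lines = ["import time"]
    lines += ["import {}".format(name) for name in preload]
    lines += [
        "start = time.perf_counter()",
        "import {}".format(module),
        "print(time.perf_counter() - start)",
    ]
    result = subprocess.run(
        [sys.executable, "-c", "\n".join(lines)],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    # Use the last line in case the imported module prints something itself.
    return float(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Cold import, then the same import with the module already loaded.
    print(time_import("json"))
    print(time_import("json", preload=("json",)))
```

For the case discussed here the call would look like `time_import("pandas", preload=("numpy", "matplotlib"))`, assuming those packages are installed.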

TomAugspurger commented 7 years ago

We do attempt to import matplotlib at import time. We could delay that with something like:

diff --git a/pandas/plotting/__init__.py b/pandas/plotting/__init__.py
index c3cbedb0f..8f98e297e 100644
--- a/pandas/plotting/__init__.py
+++ b/pandas/plotting/__init__.py
@@ -4,12 +4,6 @@ Plotting api

 # flake8: noqa

-try:  # mpl optional
-    from pandas.plotting import _converter
-    _converter.register()  # needs to override so set_xlim works with str/number
-except ImportError:
-    pass
-
 from pandas.plotting._misc import (scatter_matrix, radviz,
                                    andrews_curves, bootstrap_plot,
                                    parallel_coordinates, lag_plot,
diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py
index 391fa377f..9821c89c4 100644
--- a/pandas/plotting/_core.py
+++ b/pandas/plotting/_core.py
@@ -37,12 +37,7 @@ from pandas.plotting._tools import (_subplots, _flatten, table,
                                     _get_xlim, _set_ticks_props,
                                     format_date_labels)

-
-if _mpl_ge_1_5_0():
-    # Compat with mp 1.5, which uses cycler.
-    import cycler
-    colors = mpl_stylesheet.pop('axes.color_cycle')
-    mpl_stylesheet['axes.prop_cycle'] = cycler.cycler('color', colors)
+_registered = False

 def _get_standard_kind(kind):
@@ -92,6 +87,7 @@ class MPLPlot(object):
                  secondary_y=False, colormap=None,
                  table=False, layout=None, **kwds):

+        self._setup()
         self.data = data
         self.by = by

@@ -175,6 +171,20 @@ class MPLPlot(object):

         self._validate_color_args()

+    def _setup(self):
+        global _registered
+        if not _registered:
+            from pandas.plotting import _converter
+            _converter.register()
+
+            if _mpl_ge_1_5_0():
+                # Compat with mp 1.5, which uses cycler.
+                import cycler
+                colors = mpl_stylesheet.pop('axes.color_cycle')
+                mpl_stylesheet['axes.prop_cycle'] = cycler.cycler('color', colors)
+
+            _registered = True
+
     def _validate_color_args(self):
         if 'color' not in self.kwds and 'colors' in self.kwds:
             warnings.warn(("'colors' is being deprecated. Please use 'color'"

That covers all the .plot methods. We would need a decorator or something similar to cover the plotting functions not attached to NDFrame.

TomAugspurger commented 7 years ago

Looks like get_versions takes up about 25% of the import time for pandas.__init__.py. That could easily be delayed.

TomAugspurger commented 7 years ago

Oh, sorry, I was thinking of show_versions, not get_versions. get_versions would be a bit harder to fix... I did try out https://github.com/pypa/setuptools_scm instead of versioneer, and it worked well. May be worth looking into.

jorisvandenbossche commented 7 years ago

I did a profile with https://github.com/cournape/import-profiler

From a very quick skim:

Not huge, but those two would already remove ca 20% of the import time.

However, I don't see get_versions anywhere in there, so I'm not sure how reliable the results are.

Full output:

In [1]: from import_profiler import profile_import
   ...: 
   ...: with profile_import() as context:
   ...:     # Anything expensive in here
   ...:     import pandas
   ...: 

In [2]: context.print_info()
  cumtime (ms)    intime (ms)  name
         786.7           48.4  pandas
         196.9            3.3  +numpy
           1.8            1.8  ++_globals
           1.5            1.5  ++numpy.__config__
           1.9            1.9  ++version
           1.4            1.3  ++_import_tools
         156.9            0    ++
         156.9            3.4  +++numpy.add_newdocs
         150.9            0.8  ++++numpy.lib
         101.6            0.7  +++++type_check
          96.6            2.9  ++++++numpy.core.numeric
          93.7            1.2  +++++++numpy.core
          19              0.1  ++++++++
          18.9           18.8  +++++++++numpy.core.multiarray
           2.4            0.1  ++++++++
           2.3            2.2  +++++++++numpy.core.umath
          17.5            0    ++++++++
          17.5            2.7  +++++++++numpy.core._internal
           2.5            0.6  ++++++++++numpy.compat
           1              0    +++++++++++
           1              0.9  ++++++++++++numpy.compat._inspect
           6.2            3.4  ++++++++++ctypes
           1.5            1.5  +++++++++++_ctypes
           6              3.4  ++++++++++numerictypes
           2.3            2.2  +++++++++++numbers
           9              0    ++++++++
           8.9            2.2  +++++++++numpy.core.numeric
           6.3            1.5  ++++++++++arrayprint
           4.7            1    +++++++++++fromnumeric
           3.5            0    ++++++++++++
           3.4            3.2  +++++++++++++numpy.core._methods
           2.1            0    ++++++++
           2.1            1.7  +++++++++numpy.core.defchararray
           1.3            0    ++++++++
           1.2            1.1  +++++++++numpy.core.records
          36.4            0.1  ++++++++numpy.testing.nosetester
          36.3            0.5  +++++++++numpy.testing
          22.6            0.7  ++++++++++unittest
           3.2            0.8  +++++++++++result
           2.3            0    ++++++++++++
           2.2            2.1  +++++++++++++unittest.util
           3.7            3.4  +++++++++++case
           4.9            4.8  +++++++++++suite
           6.4            6.2  +++++++++++loader
           3.7            1.1  +++++++++++main
           2.5            0    ++++++++++++
           2.4            1.4  +++++++++++++unittest.runner
          13.1            0    ++++++++++
          13.1            2.1  +++++++++++numpy.testing.decorators
          10.9            2.2  ++++++++++++utils
           5              4.9  +++++++++++++nosetester
           3.5            3.3  +++++++++++++numpy.lib.utils
           4.2            4.1  ++++++ufunclike
          27.3            2.5  +++++index_tricks
          15.7            0    ++++++
          15.6           11.2  +++++++numpy.lib.function_base
           3.9            3.8  ++++++++numpy.lib.twodim_base
           6.8            2.7  ++++++numpy.matrixlib
           3.9            3.8  +++++++defmatrix
           2.2            2.2  ++++++numpy.lib.stride_tricks
           1.4            1.3  +++++nanfunctions
           7.1            1.4  +++++polynomial
           1              1    ++++++numpy.lib.twodim_base
           4.5            0.6  ++++++numpy.linalg
           3.4            1.1  +++++++linalg
           2.1            0    ++++++++numpy.linalg
           1.2            1.1  +++++++++numpy.linalg._umath_linalg
           4.6            1.5  +++++npyio
           1.1            1    ++++++_iotools
           2.6            2.5  +++++financial
           3.5            0    ++
           3.4            0.5  +++numpy.fft
           1.6            0.6  ++++fftpack
          10              0    ++
           9.9            0.6  +++numpy.polynomial
           3.4            1.4  ++++polynomial
           1.1            1    ++++chebyshev
           1              0.9  ++++legendre
           1.1            1    ++++hermite
           1.3            1.1  ++++hermite_e
           1.3            1.1  ++++laguerre
           7              0    ++
           7              1.2  +++numpy.random
           4.7            4.6  ++++mtrand
           2.2            0    ++
           2.2            2.1  +++numpy.ctypeslib
           6.9            0    ++
           6.8            0.6  +++numpy.ma
           4.7            0    ++++
           4.6            4.4  +++++numpy.ma.core
           1.5            0    ++++
           1.5            1.3  +++++numpy.ma.extras
           5.6            1.9  +pytz
           1.4            0.7  ++pytz.lazy
          25.5            0.8  +pandas.compat.numpy
          24.7            1.1  ++pandas.compat
           1.9            1.4  +++distutils.version
          14.6            2    +++http.client
           2              2    ++++http
          10.5            4.3  ++++ssl
           4              4    +++++ipaddress
           1.7            1.7  +++++_ssl
           5.3            0    +++dateutil
           5.2            1.6  ++++dateutil.parser
           2.9            0    +++++
           2.9            0.4  ++++++dateutil.tz
           2.5            1.5  +++++++tz
          26.2            0.4  +pandas._libs
          11.8           10.6  ++tslib
          14              1.9  ++pandas._libs.hashtable
          11.3            6.8  +++pandas._libs.lib
           2.7            2.6  ++++_decimal
          30.6            1.8  +pandas.core.config_init
           4.6            2.3  ++pandas.core.config
           2.1            0.5  +++pandas.io.formats.printing
          22.6            0.4  ++xlsxwriter
          22.2            1    +++workbook
           2.5            0.3  ++++compatibility
           1.7            1.7  +++++fractions
           9.3            5    ++++xlsxwriter.worksheet
           3.1            0.6  +++++drawing
           1.8            1.8  ++++++shape
           4              0.4  ++++xlsxwriter.packager
           1.6            0.3  ++++xlsxwriter.chart_area
           1.3            0    +++++
           1.3            1.2  ++++++xlsxwriter.chart
         369.9            0.4  +pandas.core.api
           9.9            1    ++pandas.core.algorithms
           6              0.6  +++pandas.core.dtypes.cast
           5              0.6  ++++common
           2.2            0    +++++pandas._libs
           2.2            2    ++++++pandas._libs.algos
           1.3            1.3  +++++dtypes
           2.7            0    +++pandas.core
           2.7            0.8  ++++pandas.core.common
           1.4            0.2  +++++pandas.api
           1.2            0.4  ++++++pandas.api.types
          15.4            1.1  ++pandas.core.categorical
          13.6            1.4  +++pandas.core.base
           2.9            0.3  ++++pandas.util._validators
           2.6            0.3  +++++pandas.util
           1.7            0.5  ++++++pandas.core.util.hashing
           7.6            0.8  ++++pandas.core.nanops
           6.7            0.4  +++++bottleneck
           1.7            0    ++++++
           1.7            0.4  +++++++bottleneck.slow
           1              1    ++++++reduce
           1              0.5  ++++++bottleneck.benchmark.bench
           1.7            0    ++++pandas.compat.numpy
           1.7            1.6  +++++pandas.compat.numpy.function
         330.5            6.6  ++pandas.core.groupby
          81.5            0.3  +++pandas.core.index
          81.2            0.6  ++++pandas.core.indexes.api
          27.1            2.8  +++++pandas.core.indexes.base
           4.9            0    ++++++pandas._libs
           2.3            1.8  +++++++pandas._libs.index
           2.6            1.9  +++++++pandas._libs.join
          16.2            0.9  ++++++pandas.core.ops
          15.1            0.6  +++++++pandas.core.computation.expressions
          14.5            0.3  ++++++++pandas.core.computation
          14.1            0.6  +++++++++numexpr
           4.6            4.6  ++++++++++cpuinfo
           4.7            1.2  ++++++++++numexpr.expressions
           3.5            0    +++++++++++numexpr
           3.4            3.3  ++++++++++++numexpr.interpreter
           1.4            1    ++++++++++numexpr.necompiler
           2.1            0.2  ++++++++++numexpr.tests
           1.8            1.7  +++++++++++numexpr.tests.test_numexpr
           2.4            2.2  ++++++pandas.core.strings
           2.2            2    +++++pandas.core.indexes.category
          32.4           32.2  +++++pandas.core.indexes.multi
           1.4            1.3  +++++pandas.core.indexes.interval
           1.6            1.4  +++++pandas.core.indexes.numeric
           1              0.9  +++++pandas.core.indexes.range
          11.5            1    +++++pandas.core.indexes.timedeltas
           6.3            1.8  ++++++pandas.tseries.frequencies
           4.1            2.6  +++++++pandas.tseries.offsets
           1.1            0.7  ++++++++pandas.core.tools.datetimes
           3.5            1    ++++++pandas.core.indexes.datetimelike
           2.2            1.6  +++++++pandas._libs.period
           3              1.2  +++++pandas.core.indexes.period
           1.6            1.4  ++++++pandas.core.indexes.datetimes
         235.6            7.2  +++pandas.core.frame
         161.6            3.7  ++++pandas.core.generic
           1.3            1.1  +++++pandas.core.indexing
           7.1            3.4  +++++pandas.core.internals
           3.4            1.1  ++++++pandas.core.sparse.array
           1.9            1.7  +++++++pandas._libs.sparse
         149.3            1.5  +++++pandas.io.formats.format
         147.2            0.7  ++++++pandas.io.common
           1.3            0.6  +++++++csv
         140.4            0.4  +++++++s3fs
         139.5            0.9  ++++++++core
         128              0.5  +++++++++boto3
         127.5            0.4  ++++++++++boto3.session
         116.9            0.7  +++++++++++botocore.session
          57.3            0.3  ++++++++++++botocore.configloader
           3.1            0    +++++++++++++six.moves
           3.1            3    ++++++++++++++configparser
          53.8            1.7  +++++++++++++botocore.exceptions
          52.1            0    ++++++++++++++botocore.vendored.requests.exceptions
          52              0.6  +++++++++++++++botocore.vendored.requests
          25.6            0.7  ++++++++++++++++packages.urllib3.contrib
          22.7            0.1  +++++++++++++++++botocore.vendored.requests.packages.urllib3
          22.6            0.4  ++++++++++++++++++botocore.vendored.requests.packages
          22.2            0    +++++++++++++++++++
          22.2            0.7  ++++++++++++++++++++botocore.vendored.requests.packages.urllib3
          20.2            0.7  +++++++++++++++++++++connectionpool
           1.1            1.1  ++++++++++++++++++++++exceptions
           3.8            0.4  ++++++++++++++++++++++connection
           3.3            0    +++++++++++++++++++++++util.ssl_
           3.3            0.2  ++++++++++++++++++++++++botocore.vendored.requests.packages.urllib3.util
           1.1            1    +++++++++++++++++++++++++url
          11.1            0.3  ++++++++++++++++++++++request
          10.8            0.3  +++++++++++++++++++++++filepost
           9.8            9.4  ++++++++++++++++++++++++uuid
           1.6            0.8  ++++++++++++++++++++++response
           1.1            0.9  +++++++++++++++++++++poolmanager
           2.2            1.5  +++++++++++++++++botocore.vendored.requests.packages.urllib3.contrib.pyopenssl
          20.8            0    ++++++++++++++++
          20.8            0.7  +++++++++++++++++botocore.vendored.requests.utils
           3.3            0.8  ++++++++++++++++++cgi
           2.5            0.7  +++++++++++++++++++html
           1.7            1.7  ++++++++++++++++++++html.entities
          13.6            0.4  ++++++++++++++++++compat
           4              3.2  +++++++++++++++++++urllib.request
           5.2            0    +++++++++++++++++++http
           5.2            5.1  ++++++++++++++++++++http.cookiejar
           3.3            3.3  +++++++++++++++++++http.cookies
           1.2            1.1  ++++++++++++++++++cookies
           2.3            0.8  ++++++++++++++++models
           1.1            0.5  +++++++++++++++++auth
           2.4            0.4  ++++++++++++++++api
           2.1            0    +++++++++++++++++
           2              1.1  ++++++++++++++++++botocore.vendored.requests.sessions
          12.4            2.2  ++++++++++++botocore.credentials
           8.8            0.8  +++++++++++++botocore.compat
           3.3            0    ++++++++++++++botocore.vendored
           3.2            3.1  +++++++++++++++botocore.vendored.six
           4.2            0.6  ++++++++++++++xml.etree.cElementTree
           3              1.1  +++++++++++++++xml.etree.ElementTree
           1.2            1.1  +++++++++++++botocore.utils
          41.8            1    ++++++++++++botocore.client
          29.6            0    +++++++++++++botocore
          29.6            1    ++++++++++++++botocore.waiter
           9.4            0.8  +++++++++++++++jmespath
           8.6            0.1  ++++++++++++++++jmespath
           8.6            2.1  +++++++++++++++++jmespath.parser
           3.4            0.1  ++++++++++++++++++jmespath
           3.3            1.1  +++++++++++++++++++jmespath.lexer
           2.2            1.3  ++++++++++++++++++++jmespath.exceptions
           2.3            0    ++++++++++++++++++jmespath
           2.3            0.8  +++++++++++++++++++jmespath.visitor
           1.4            0    ++++++++++++++++++++jmespath
           1.4            1.3  +++++++++++++++++++++jmespath.functions
          19.2            0.5  +++++++++++++++botocore.docs.docstring
          18.6            0.4  ++++++++++++++++botocore.docs
          18.1            0.7  +++++++++++++++++botocore.docs.service
           1.9            1.8  ++++++++++++++++++botocore.docs.utils
           3.3            0.5  ++++++++++++++++++botocore.docs.client
           2.1            0.5  +++++++++++++++++++botocore.docs.method
          10.9            0.8  ++++++++++++++++++botocore.docs.bcdoc.restdoc
           7.3            0.8  +++++++++++++++++++botocore.docs.bcdoc.docstringparser
           6.5            4.9  ++++++++++++++++++++html.parser
           1.6            1.5  +++++++++++++++++++++_markupbase
           2.3            2.2  +++++++++++++++++++botocore.docs.bcdoc.style
           1.4            0.9  +++++++++++++botocore.auth
           1.2            1    +++++++++++++botocore.awsrequest
           1.4            1.3  +++++++++++++botocore.hooks
           5.6            0.3  +++++++++++++botocore.args
           1.3            0.6  ++++++++++++++botocore.serialize
           3.3            0.3  ++++++++++++++botocore.config
           3              0.8  +++++++++++++++botocore.endpoint
           1.3            0.3  ++++++++++++++++botocore.response
           2.6            0    ++++++++++++botocore
           2.5            1.4  +++++++++++++botocore.handlers
           1              1    +++++++++++boto3.utils
           8.5            0.7  +++++++++++resources.factory
           6.6            0.4  ++++++++++++action
           4.7            0.4  +++++++++++++boto3.docs.docstring
           4.2            0.2  ++++++++++++++boto3.docs
           4              0.5  +++++++++++++++boto3.docs.service
           3.1            0.5  ++++++++++++++++boto3.docs.resource
           1.1            0.3  +++++++++++++++++boto3.docs.action
           9.9            1.1  +++++++++boto3.s3.transfer
           8.4            0.3  ++++++++++concurrent
           8.1            0.3  +++++++++++concurrent.futures
           1.4            1.3  ++++++++++++concurrent.futures._base
           6              0.7  ++++++++++++concurrent.futures.process
           2.3            0.4  +++++++++++++multiprocessing
           1.9            0    ++++++++++++++
           1.8            0.8  +++++++++++++++multiprocessing.context
           2.9            1    +++++++++++++multiprocessing.connection
           1.3            0    +++++++py.path
           1.3            0.8  ++++++++py
           3              1.9  +++++++py._path.local
          58.5            5.2  ++++pandas.core.series
           5.8            0    +++++pandas.core
           5.8            3.6  ++++++pandas.core.window
           2              1.9  +++++++pandas._libs.window
          46.3            0    +++++pandas.plotting._core
          46.2            0.6  ++++++pandas.plotting
          41              0    +++++++pandas.plotting
          40.9            1.3  ++++++++pandas.plotting._converter
          23.7            0.5  +++++++++matplotlib.units
          23.2            9    ++++++++++matplotlib
           2.5            1.5  +++++++++++distutils.sysconfig
           1              1    ++++++++++++errors
           2.9            2    +++++++++++matplotlib.cbook
           7.1            1.2  +++++++++++matplotlib.rcsetup
           2.3            2.3  ++++++++++++matplotlib.fontconfig_pattern
           2.9            1.5  ++++++++++++matplotlib.colors
           1.3            1.3  +++++++++++++_color_data
          15.3            1.3  +++++++++matplotlib.dates
           1.1            0.9  ++++++++++dateutil.rrule
          12.7            1.8  ++++++++++matplotlib.ticker
          10.9            0    +++++++++++matplotlib
          10.8            8.6  ++++++++++++matplotlib.transforms
           1.1            1    +++++++++++++path
           1.6            0.5  +++++++pandas.plotting._misc
           3              2.8  +++++++pandas.plotting._core
           7.8            0.4  ++++pandas.core.computation.eval
           6.4            3.9  +++++pandas.core.computation.expr
           1.6            0.7  ++++++pandas.core.computation.ops
           4.7            4.4  +++pandas.core.panel
           1.3            0    +++pandas._libs
           1.3            1.2  ++++pandas._libs.groupby
           2.1            1.8  ++pandas.core.panel4d
           8.9            0.9  ++pandas.core.reshape.reshape
           7              0.3  +++pandas.core.sparse.api
           3.2            2.2  ++++pandas.core.sparse.series
           3              2.9  ++++pandas.core.sparse.frame
           1.8            1.6  ++pandas.core.resample
           1.6            0.4  +pandas.stats.api
           1              1    ++pandas.stats.moments
           2.8            0.2  +pandas.core.reshape.api
           1.1            0.8  ++pandas.core.reshape.merge
          29              0.3  +pandas.io.api
           5.9            2.7  ++pandas.io.parsers
           2.6            2    +++pandas._libs.parsers
           3              1.5  ++pandas.io.excel
           1.1            1    +++pandas._libs.json
           5.8            3.5  ++pandas.io.pytables
           2              1.9  +++pandas.core.computation.pytables
           2.4            0.5  ++pandas.io.json
           1.8            1    +++json
           2              1.8  ++pandas.io.stata
           5.3            0.7  ++pandas.io.packers
           3.8            1.1  +++pandas.io.msgpack
          43.9            0.2  +pandas.util._tester
          43.7            6.1  ++pytest
           8.8            1.3  +++_pytest.config
           2.9            0.3  ++++_pytest._code
           2              1.2  +++++code
           2              0.4  ++++_pytest.hookspec
           1.5            0.2  +++++_pytest._pluggy
           1.3            1.1  ++++++_pytest.vendored_packages.pluggy
           2.3            0.5  ++++_pytest.assertion
           1.1            0    +++++_pytest.assertion
           1.1            1    ++++++_pytest.assertion.rewrite
           1.9            0.9  +++_pytest.main
           5.7            1.3  +++_pytest.python
           4.3            0    ++++_pytest
           4.3            1.8  +++++_pytest.fixtures
           1.8            1.1  ++++++py._code.code
           1.2            0.5  +++_pytest.unittest
           2.5            0.9  +++_pytest.capture
           1.4            0.7  ++++py._io.capture
           1              0.4  +++_pytest.tmpdir
          10.1            9.1  +++_pytest.junitxml
           3.3            0.3  +pandas.testing
           3              1.7  ++pandas.util.testing
           1.1            0    +++pandas._libs
           1.1            0.9  ++++pandas._libs.testing
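
One way to cross-check numbers like these, assuming Python 3.7 or later, is the interpreter's built-in `-X importtime` option, which writes per-module self and cumulative times (in microseconds) to stderr. The helper below is a rough sketch; `import_times` is a hypothetical name and the parsing relies on the `import time: self | cumulative | name` line layout that CPython prints.

```python
import subprocess
import sys


def import_times(module):
    """Return (cumulative_us, self_us, name) rows, slowest first."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import {}".format(module)],
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # the header row has column titles instead of numbers
        rows.append((cumulative_us, self_us, parts[2].strip()))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Print the ten most expensive imports by cumulative time.
    for cumulative_us, self_us, name in import_times("json")[:10]:
        print("{:>10} {:>10}  {}".format(cumulative_us, self_us, name))
```

Running `import_times("pandas")` in an environment with pandas installed gives a listing comparable to the one above, measured in a fresh process.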

rockg commented 7 years ago

Also see #7282, but seems like already more attention here.

jorisvandenbossche commented 7 years ago

> Also see #7282, but seems like already more attention here.

It's a slightly different issue: this one is about reducing import time in general, while the other is about a specific case where the import takes many seconds (but numpy also takes seconds to import there, so IMO it's not a pandas-specific issue).