wildlife-dynamics / ecoscope

Conservation data analytics
https://ecoscope.io
BSD 3-Clause "New" or "Revised" License

`import ecoscope` takes too long #142

cisaacstern closed 3 months ago

cisaacstern commented 5 months ago

Currently importing ecoscope takes ~6 seconds.

$ time python3 -c "import ecoscope"
python3 -c "import ecoscope"  6.07s user 0.94s system 80% cpu 8.673 total

If I comment out this line

https://github.com/wildlife-dynamics/ecoscope/blob/b3b9d1543eee5357ef9b2df8a4135d47d40f39d4/ecoscope/__init__.py#L1

that is reduced to about 0.06 seconds (0.03 s of user time):

$ time python3 -c "import ecoscope"
python3 -c "import ecoscope"  0.03s user 0.01s system 73% cpu 0.060 total

IMHO 6 seconds is too long to wait for ecoscope to import. One example of why this matters is test development: if a test session takes a minimum of 6 seconds to initialize, that adds up to a lot of developer time over the course of hundreds or thousands of test invocations.

I am going to see if I can reduce import time and open a PR for review 😃
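
For anyone who wants to reproduce the shell timings above from Python, here is a minimal sketch. The helper name `time_import` is hypothetical and not part of ecoscope, and `json` is only a stand-in module. Each run starts a fresh interpreter so nothing is cached in `sys.modules`; the figure therefore includes interpreter startup, just like `time python3 -c ...`.

```python
import subprocess
import sys
import time


def time_import(module: str, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) to import `module` in a fresh interpreter."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process guarantees a cold import (nothing cached in sys.modules).
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    # The minimum is the least noisy estimate across repeats.
    return min(timings)


if __name__ == "__main__":
    # "json" is a stand-in; use "ecoscope" in an environment where it is installed.
    print(f"import json: {time_import('json'):.3f}s")
```

Running it before and after a change gives a quick comparison without leaving Python.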

cisaacstern commented 5 months ago

A little more detail on import time of the various components:

4 + 0.2 + 0.98 + 0.58 + 1.92 = 7.68 seconds (so these numbers check out)

cisaacstern commented 5 months ago

> I am going to see if I can reduce import time and open a PR for review 😃

Note: for now, to unblock development of #141, I've just commented out these lines:

https://github.com/wildlife-dynamics/ecoscope/blob/818696e5ad69e216c902ac73e180bcd0753588b2/ecoscope/__init__.py#L1-L2

I can revisit a more general fix soon!

cisaacstern commented 5 months ago

Deferring import of mapclassify saves about 1.5s

diff --git a/ecoscope/analysis/speed.py b/ecoscope/analysis/speed.py
index 5873b72..825da98 100644
--- a/ecoscope/analysis/speed.py
+++ b/ecoscope/analysis/speed.py
@@ -1,7 +1,6 @@
 import typing

 import geopandas as gpd
-import mapclassify
 import pandas as pd
 import shapely

@@ -64,7 +63,16 @@ def apply_classification(x, k, cls_method="natural_breaks", multiples=[-2, -1, 1
     multiples : Listlike
         The multiples of the standard deviation to add/subtract from the sample mean to define the bins. defaults=
     """
-
+    import mapclassify
+
+    classification_methods = {
+        "equal_interval": mapclassify.EqualInterval,
+        "natural_breaks": mapclassify.NaturalBreaks,
+        "quantile": mapclassify.Quantiles,
+        "std_mean": mapclassify.StdMean,
+        "max_breaks": mapclassify.MaximumBreaks,
+        "fisher_jenks": mapclassify.FisherJenks,
+    }
     classifier = classification_methods.get(cls_method)
     if not classifier:
         return
@@ -82,12 +90,3 @@ default_speed_colors = [
     "#fc8d59",
     "#d73027",
 ]
-
-classification_methods = {
-    "equal_interval": mapclassify.EqualInterval,
-    "natural_breaks": mapclassify.NaturalBreaks,
-    "quantile": mapclassify.Quantiles,
-    "std_mean": mapclassify.StdMean,
-    "max_breaks": mapclassify.MaximumBreaks,
-    "fisher_jenks": mapclassify.FisherJenks,
-}
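
The pattern in the diff above, reduced to a self-contained sketch. This is an illustration only: the stdlib `statistics` module stands in for mapclassify, and the function name `classify` and its method names are hypothetical. The point is that the import and the lookup table move inside the function, so the cost is paid on first call rather than at package import.

```python
def classify(values, method="median"):
    """Summarize `values` with the named method, importing the dependency on first use."""
    # Deferred import: `statistics` stands in for the expensive mapclassify import.
    import statistics

    # The lookup table lives inside the function because it references the
    # deferred module; it cannot be built at module level any more.
    methods = {
        "mean": statistics.mean,
        "median": statistics.median,
    }
    func = methods.get(method)
    if func is None:
        # Unknown method name: mirror the diff's behavior of returning nothing.
        return None
    return func(values)
```

After the first call, the `import` statement is just a `sys.modules` lookup, so repeated calls do not pay the import cost again. The table is rebuilt on every call, which is cheap here but worth knowing about.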
atmorling commented 4 months ago

I've tested the following:

This reduces the import time to ~2.7 seconds (screenshot).

Deferring the load of ecoscope.analysis.UD and ecoscope.mapping (removing their imports from the relevant __init__.py files) gets this down to ~1.3 seconds. The trade-off is that one would have to import these modules explicitly in addition to import ecoscope.

Deferring the expensive imports isn't exactly PEP 8 Approved™, so I think playing around with lazy imports is worthwhile.
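
One possible way to experiment with lazy imports is the standard library's `importlib.util.LazyLoader`. The sketch below follows the recipe in the importlib documentation; the helper name `lazy_import` is an assumption for illustration, and nothing here is ecoscope's actual approach. The module object is created immediately, but its code only runs on first attribute access.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return a module whose code is executed only on first attribute access."""
    if name in sys.modules:
        # Already imported (lazily or not): reuse it instead of replacing it.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only prepares the module; it does not run its code yet.
    loader.exec_module(module)
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")  # cheap: nothing executed yet
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # first attribute access triggers the real import
```

This keeps imports at the top of the file, which sits better with PEP 8 than function-level imports, but errors in the imported module surface at first use rather than at import time.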

Also learned about tuna for visualizing import profiler outputs. (See the pretty screenshot.)
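
For a text-only look at the same profiler data, here is a rough sketch that runs CPython's `-X importtime` option in a subprocess and lists the slowest imports. The helper name `import_profile` is hypothetical, `json` is a stand-in module, and the parsing assumes the `import time: self | cumulative | name` line layout that CPython writes to stderr.

```python
import subprocess
import sys

PREFIX = "import time:"


def import_profile(module: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` slowest imports as (module name, cumulative microseconds)."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # -X importtime writes its report to stderr, one line per imported module.
    for line in result.stderr.splitlines():
        if not line.startswith(PREFIX):
            continue
        parts = line[len(PREFIX):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # the header line has text instead of numbers
        rows.append((parts[2].strip(), cumulative_us))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # "json" is a stand-in; use "ecoscope" where it is installed.
    for name, microseconds in import_profile("json"):
        print(f"{microseconds / 1e6:8.3f}s  {name}")
```

As far as I understand, tuna reads the same report when it is redirected to a file, so this is a quick first pass rather than a replacement for the visualization.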

cisaacstern commented 4 months ago

amazing progress!