oceanhackweek / jupyter-image

MIT License

Finding out what packages are exploding the build #24

Open abkfenris opened 3 years ago

abkfenris commented 3 years ago

Here's a way to start analyzing what packages are causing the build to explode.

import json
from pathlib import Path
import pandas as pd

pkg_files = Path("/opt/conda/conda-meta/").glob("*.json")

paths = []

for pkg_file in pkg_files:
    with pkg_file.open() as f:
        pkg = json.load(f)
        paths += pkg["paths_data"]["paths"]

df = pd.DataFrame(paths)
df = df.drop(
    [
        "path_type",
        "sha256",
        "sha256_in_prefix",
        "no_link",
        "file_mode",
        "prefix_placeholder",
    ],
    axis=1,
)
df = df.dropna()
df = df.sort_values("size_in_bytes", ascending=False)
df.head(20)

                                                    _path  size_in_bytes
37480   lib/python3.9/site-packages/tensorflow/python/...    271301744.0
25649                                    lib/libavcodec.a    152533588.0
25258                                   lib/libLLVM-11.so    105929424.0
74839   x86_64-conda-linux-gnu/sysroot/usr/lib64/local...     99188496.0
85596                                    lib/librsvg-2.so     97415432.0
85597                                  lib/librsvg-2.so.2     97415432.0
85598                             lib/librsvg-2.so.2.47.0     97415432.0
91707                                   lib/libLLVM-10.so     95685352.0
50318                        lib/libQt5WebEngineCore.so.5     92408776.0
50320                   lib/libQt5WebEngineCore.so.5.12.9     92408776.0
50317                          lib/libQt5WebEngineCore.so     92408776.0
50319                     lib/libQt5WebEngineCore.so.5.12     92408776.0
122433                                         bin/pandoc     76341040.0
25661                                   lib/libavformat.a     47442464.0
33070   site-packages/compliance_checker/tests/data/ma...     42981152.0
40321                                      lib/libgdal.so     35682192.0
40323                               lib/libgdal.so.28.0.1     35682192.0
40322                                   lib/libgdal.so.28     35682192.0
80393                                lib/libclang.so.11.1     35233816.0
80392                                     lib/libclang.so     35233816.0
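The listing above is per file, so the same library can show up several times under different names (for example the three librsvg-2.so entries). A rough sketch of rolling the same metadata up per package is below. It uses only the standard library. The function name package_sizes and the skip_softlinks option are made up for illustration, and it assumes that the repeated entries are recorded with "path_type": "softlink" in the conda-meta JSON, which is worth checking against your own files before trusting the totals.

```python
import json
from collections import defaultdict
from pathlib import Path


def package_sizes(conda_meta, skip_softlinks=True):
    """Return [(package_name, total_bytes), ...] sorted largest first.

    Hypothetical helper: sums size_in_bytes over each package's path entries.
    """
    totals = defaultdict(int)
    for pkg_file in Path(conda_meta).glob("*.json"):
        with pkg_file.open() as f:
            pkg = json.load(f)
        # Some records may have no paths_data; treat them as having no files.
        entries = pkg.get("paths_data", {}).get("paths", [])
        for entry in entries:
            # Assumption: symlinked names carry path_type "softlink" and would
            # otherwise count the same bytes more than once.
            if skip_softlinks and entry.get("path_type") == "softlink":
                continue
            name = pkg.get("name", pkg_file.stem)
            totals[name] += entry.get("size_in_bytes") or 0
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Same metadata directory as in the script above.
    for name, size in package_sizes("/opt/conda/conda-meta")[:20]:
        print(f"{name:<40} {size / 1e6:10.1f} MB")
```

Passing skip_softlinks=False gives totals that match the raw per-file listing, which makes it easy to see how much of a package's apparent size comes from repeated names.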

Caching some of the build with #23

@ocefpaf

abkfenris commented 3 years ago

Removing tensorflow only slims the image down by about another half gigabyte (5.81GB to 5.3GB):

➜ docker images
REPOSITORY             TAG       IMAGE ID       CREATED          SIZE
ohw-no-py-tensorflow   latest    200d425d469f   52 seconds ago   5.3GB
ohw-cache-apt          latest    c646a0031f14   9 hours ago      5.81GB
ohw-cache              latest    274e85773a32   9 hours ago      5.81GB
ohw                    latest    d2014651c42b   10 hours ago     8.27GB

Archive.zip

ocefpaf commented 3 years ago

> causing the build to explode.

What do you mean by exploding? Are we not able to upload the image?

PS: let's remove tensorflow!

abkfenris commented 3 years ago

I meant image size in this case, but I also didn't have permission to upload to Docker Hub.

abkfenris commented 3 years ago

I'll remove tensorflow in #23

abkfenris commented 1 year ago

The no_link column is no longer in the dataframe, so the script should now be:

import json
from pathlib import Path
import pandas as pd

pkg_files = Path("/opt/conda/conda-meta/").glob("*.json")

paths = []

for pkg_file in pkg_files:
    with pkg_file.open() as f:
        pkg = json.load(f)
        paths += pkg["paths_data"]["paths"]

df = pd.DataFrame(paths)
df = df.drop(
    [
        "path_type",
        "sha256",
        "sha256_in_prefix",
        # "no_link",
        "file_mode",
        "prefix_placeholder",
    ],
    axis=1,
)
df = df.dropna()
df = df.sort_values("size_in_bytes", ascending=False)
df

It's also useful to include mamba in the environment so that mamba repoquery can be used to inspect dependencies.
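As a rough sketch of how that could be scripted next to the size analysis, the wrapper below shells out to mamba repoquery from Python. The helper names repoquery_command and run_repoquery are hypothetical. It only relies on the depends and whoneeds subcommands, and it assumes mamba is on PATH inside the image.

```python
import shutil
import subprocess

# Subcommands of `mamba repoquery` used here.
QUERY_KINDS = ("depends", "whoneeds")


def repoquery_command(kind, package):
    """Build the argument list for `mamba repoquery <kind> <package>`."""
    if kind not in QUERY_KINDS:
        raise ValueError(f"kind must be one of {QUERY_KINDS}, got {kind!r}")
    return ["mamba", "repoquery", kind, package]


def run_repoquery(kind, package):
    """Run the query and return mamba's text output (hypothetical helper)."""
    if shutil.which("mamba") is None:
        raise RuntimeError("mamba is not on PATH; add it to the environment first")
    result = subprocess.run(
        repoquery_command(kind, package),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example usage inside the image:
#   print(run_repoquery("whoneeds", "tensorflow"))  # which packages pull it in
#   print(run_repoquery("depends", "tensorflow"))   # what it pulls in
```

Running whoneeds on one of the large packages from the listing is a quick way to see whether it was requested directly or came in as a dependency of something else.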