microsoft / sql-server-language-extensions


Improve Python `restore-packages.cmd` execution time in pipeline + Fix Build break due to numpy 2.0+ #53

Open seantleonard opened 1 week ago

seantleonard commented 1 week ago

Why this change?

  1. Closes #52

    • which highlights how long language-extensions/python/build/windows/restore-packages.cmd takes to run in pipelines (and locally). In pipelines this step takes ~20 minutes to complete. Essentially, expanding the Boost archive is slow on Windows because of the large number of files it contains.
  2. Closes #54

    • which highlights that NumPy 2.0+ isn't supported by the Boost version currently in use, causing the pipeline build to fail for the Python extension.

Background on zip file slowness

Anecdotal data points for :

Boost Recommendation not to use ZIP files:

We recommend downloading boost_1_79_0.7z and using 7-Zip to decompress it. We no longer recommend .zip files for Boost because they are twice as large as the equivalent .7z files. We don't recommend using Windows' built-in decompression as it can be painfully slow for large archives. Ref

Perf improvements:

What is this change

  1. Updates the URL used to fetch Boost 1.79.0 from SourceForge to archives.boost.io.
  2. Downloads a 7z (7-Zip) archive instead of a zip file, because extraction was drastically faster when tested locally with a 7z file on Windows (and the Boost docs recommend using the 7z file).
  3. Outputs timestamps before extracting Boost (which took 20 minutes) to show that extraction was the cause of the slow execution.
  4. Adds timestamps for building Boost so that we can identify its duration in the pipeline.
  5. Hardcodes the NumPy and pandas dependencies for Python to match the hardcoded versions defined in the Linux restore-packages.sh script. Support for NumPy 2.0+ in Boost hasn't made it into an official release yet, only into dev branches. More details in #54.
    • Numpy: 1.22.3
    • Pandas: 1.4.2
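
The intent of the pinning in item 5 can be sanity-checked with a few lines of Python. This is only a sketch, assuming the pins are written as `name==version` strings; the `PINS` list and both helper functions are hypothetical and are not part of the repository's scripts.

```python
# Hypothetical pins mirroring the versions listed above.
PINS = ["numpy==1.22.3", "pandas==1.4.2"]


def parse_pin(pin: str) -> tuple[str, tuple[int, ...]]:
    """Split 'name==1.2.3' into ('name', (1, 2, 3))."""
    name, _, version = pin.partition("==")
    parts = tuple(int(part) for part in version.strip().split("."))
    return name.strip().lower(), parts


def numpy_is_pre_2(pins: list[str]) -> bool:
    """Return True only if a NumPy pin exists and its major version is below 2."""
    for pin in pins:
        name, version = parse_pin(pin)
        if name == "numpy":
            return version[0] < 2
    # Without a NumPy pin the installed version is unconstrained.
    return False


if __name__ == "__main__":
    print(numpy_is_pre_2(PINS))
```

A check like this could run before the build so that an accidental bump to NumPy 2.0+ fails fast instead of surfacing as a Boost compile error.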

7zip command reference

%ARCHIVE_TOOL_PATH% x -y -o"%PACKAGES_ROOT%" "boost_%BOOST_VERSION_IN_UNDERSCORE%.7z"
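
In the command above, `x` extracts with full paths, `-y` answers yes to all prompts, and `-o` sets the output directory (with no space between the switch and the path). The Python sketch below shows how the same argument list is assembled and how the extraction could be timed. It is illustrative only: the function names are hypothetical, and it assumes `BOOST_VERSION_IN_UNDERSCORE` is the dotted version with dots replaced by underscores.

```python
import subprocess
import time


def build_extract_command(tool: str, packages_root: str, boost_version: str) -> list[str]:
    """Assemble the 7-Zip argument list for the Boost archive."""
    # Assumption: "1.79.0" becomes "1_79_0", matching BOOST_VERSION_IN_UNDERSCORE.
    version_in_underscore = boost_version.replace(".", "_")
    archive = f"boost_{version_in_underscore}.7z"
    # x = extract with full paths, -y = assume yes, -o<dir> = output directory.
    return [tool, "x", "-y", f"-o{packages_root}", archive]


def timed_run(command: list[str], runner=subprocess.run) -> float:
    """Run the command and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    runner(command, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Only prints the command; nothing is extracted here.
    print(build_extract_command("7z.exe", r"C:\packages", "1.79.0"))
```

The `runner` parameter exists so that the timing logic can be exercised without 7-Zip installed.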

How was this tested?