which highlights the lengthy execution time of running language-extensions/python/build/windows/restore-packages.cmd in pipelines (and also locally). In pipelines, this step takes ~20 minutes to complete. Essentially, expanding the Boost archive takes a long time on Windows because of the number of files it contains.
Closes #54, which highlights that NumPy v2.0+ isn't supported by the current Boost version, causing a pipeline build failure for the Python extension.
Background on zip file slowness
Boost recommendation not to use ZIP files: "We recommend downloading boost_1_79_0.7z and using 7-Zip to decompress it. We no longer recommend .zip files for Boost because they are twice as large as the equivalent .7z files. We don't recommend using Windows' built-in decompression as it can be painfully slow for large archives." (Ref)
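Perf improvements:
Before: pipelines averaged ~20 minutes of execution time. ![image](https://github.com/microsoft/sql-server-language-extensions/assets/6414189/988292eb-816b-4756-bdad-0b275ba184db)
Now: improved to 5 minutes. ![image](https://github.com/microsoft/sql-server-language-extensions/assets/6414189/2a0f032b-88e9-415c-8b3f-c51bedf1a049)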
What is this change
Updates the URL used to fetch Boost 1.79.0 from SourceForge to archives.boost.io.
Downloads a 7z (7-Zip) archive instead of a zip file, because the performance increase was drastic when testing locally with a 7z file on Windows (and the Boost docs recommend using the 7z file).
Outputs timestamps before extracting Boost (which took 20 minutes) to show that extraction was the cause of the slow execution.
Adds timestamps around building Boost so that we can identify its duration in the pipeline.
Hardcodes the NumPy and pandas dependencies for Python to match the versions hardcoded in the Linux restore-packages.sh script. Support for NumPy 2.0+ in Boost hasn't made it into an official release yet, only into dev branches. More details in #54.
Numpy: 1.22.3
Pandas: 1.4.2
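As a rough illustration of the pinning described above, the sketch below turns the two pinned versions into pip-style requirement strings and checks that the NumPy pin stays below 2.0. The names `PINNED`, `pip_requirements`, and `major_version` are hypothetical helpers for this illustration only; they are not part of restore-packages.cmd or restore-packages.sh, and the scripts' actual pip invocation is not shown here.

```python
# Hypothetical sketch: the versions pinned in this PR, expressed as data.
PINNED = {"numpy": "1.22.3", "pandas": "1.4.2"}


def pip_requirements(pins):
    """Return 'name==version' strings in the form pip accepts."""
    return [f"{name}=={version}" for name, version in pins.items()]


def major_version(version):
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    for requirement in pip_requirements(PINNED):
        print(requirement)
    # NumPy 2.0+ is not supported by the Boost release used here,
    # so the pinned major version must stay below 2.
    assert major_version(PINNED["numpy"]) < 2
```

Keeping the pins in one place like this is only one possible design; the point is that the Windows and Linux scripts should agree on the same two versions.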
7zip command reference
%ARCHIVE_TOOL_PATH% x -y -o"%PACKAGES_ROOT%" "boost_%BOOST_VERSION_IN_UNDERSCORE%.7z"
%ARCHIVE_TOOL_PATH% -> full path to the 7-Zip executable. On the pipeline it is a specific path; local developers will need to update it accordingly.
x -> the 7-Zip command that extracts files from an archive with their full paths.
-y -> automatically answers yes to any prompts (such as overwrite confirmations).
-o -> specifies the output directory where the files will be extracted.
"path" -> the path to the .7z archive that will be extracted.
How was this tested?