Closed Luni-4 closed 4 years ago
Thanks @Luni-4 for the report. I have an idea of why performance decreased.
About 1, I checked its performance by comparing the library with the external command.
Library execution (pure Python):
$ time py7zr x 5.12.6-0-201911111120qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z
real 0m35.086s
user 0m29.616s
sys 0m6.972s
7zip command (C++)
$ time 7zr x 5.12.6-0-201911111120qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=ja_JP.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz (406E3),ASM,AES-NI)
Scanning the drive for archives:
1 file, 214532582 bytes (205 MiB)
Extracting archive: 5.12.6-0-201911111120qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z
--
Path = 5.12.6-0-201911111120qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z
Type = 7z
Physical Size = 214532582
Headers Size = 47567
Method = LZMA2:24 BCJ
Solid = +
Blocks = 2
Everything is Ok
Folders: 279
Files: 3281
Size: 1381335977
Compressed: 214532582
real 0m30.473s
user 0m24.312s
sys 0m5.445s
The penalty is about 5 sec on top of 30 sec, roughly 15%.
You can add '-E <path to 7zip command>' explicitly to use the external command.
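For reference, the 15% figure can be reproduced from the two "real" times quoted above. This is only a small arithmetic sketch; the variable names are made up for illustration.

```python
# Wall-clock ("real") times copied from the two runs above, in seconds.
py7zr_real = 35.086      # time py7zr x ...
sevenzip_real = 30.473   # time 7zr x ...

# Extra time spent by the pure-Python library, absolute and relative.
penalty_seconds = py7zr_real - sevenzip_real
penalty_ratio = penalty_seconds / sevenzip_real

print(f"penalty: {penalty_seconds:.3f} s ({penalty_ratio:.1%} slower than 7zr)")
```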
Thanks for your explanation @miurahr, but I can't understand why that doesn't happen with the win64_msvc2017_64 arch. I mean, if the extractors are the cause of the problem, they should affect the win64_msvc2017_64 arch too, or am I misunderstanding something?
Anyway, this is the performance of the win64_msvc2017_64 arch. As you can see, it took only 50s to complete.
You can now see a detailed log at https://dev.azure.com/miurahr/892bc5d7-3681-4359-bd29-fe00ca91ef10/_apis/build/builds/1581/logs/364, produced by a branch that tries to improve performance by using threading for downloads and multiple processes for extraction, and that adds detailed logging.
Here is part of the log:
2020-02-10T01:33:02.7074789Z ##[section]Starting: Run Aqt (No Base URL Set)
<snip>
2020-02-10T01:33:03.0182117Z [command]C:\hostedtoolcache\windows\Python\3.7.6\x64\python.exe D:\a\1\s\bin\aqt install --outputdir D:\a\1\b/Qt 5.13.2 windows desktop win64_mingw7
2020-02-10T01:33:05.3607477Z 2020-02-10 01:33:05,352 - aqt - INFO - Downloading https://download.qt.io/online/qtsdkrepository/windows_x86/desktop/qt5_5132/qt.qt5.5132.win64_mingw73/5.13.2-0-201910281254qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z...
<snip>
2020-02-10T01:33:16.8026820Z 2020-02-10 01:33:16,457 - aqt - INFO - Extracting qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z..
<snip>
2020-02-10T01:33:19.5726840Z 2020-02-10 01:33:19,494 - aqt - INFO - Downloads are Completed.
<snip>
2020-02-10T01:57:47.1813899Z 2020-02-10 01:57:47,180 - aqt - INFO - Extraction qtbase-Windows-Windows_10-Mingw73-Windows-Windows_10-X86_64.7z done
2020-02-10T01:57:47.1832914Z 2020-02-10 01:57:47,180 - aqt - INFO - Finished installatio
While the qtbase package was being extracted, 12 packages finished downloading and started extracting, and 17 packages were extracted.
Extracting qtbase takes just 30 sec on my notebook PC, but it takes 24 min on Azure Pipelines. The other processes mostly work fine, but extraction of qtbase (200MB) and qt3d (100MB) takes a very long time. Is it a lack of CPU time on the cloud service?
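The 24 min figure follows from the two aqt timestamps in the log excerpt above (the "Extracting qtbase..." line and the "Extraction qtbase... done" line). A minimal sketch of that calculation, with illustrative variable names:

```python
from datetime import datetime

# aqt's own log timestamps use a comma before the milliseconds.
FMT = "%Y-%m-%d %H:%M:%S,%f"

# Copied from the "Extracting qtbase..." and "Extraction qtbase... done" lines.
started = datetime.strptime("2020-02-10 01:33:16,457", FMT)
finished = datetime.strptime("2020-02-10 01:57:47,180", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"qtbase extraction took {int(minutes)} min {seconds:.1f} s")
```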
Thanks for your tests @miurahr!
Hmm, that's pretty strange indeed. Could you please make a new release of #87 on test.pypi so that I can test your changes using GitHub Actions? Let's see if the problem persists there too.
In the meantime, I've downloaded and extracted qtbase and qt3d using GitHub Actions, running curl -LO $LINK as the first command and then 7z x $ZIP. They took 30s and 17s respectively, as you can see from here
You can use the '[-E | --external <7zip command>]' option for aqtinstall. If it works for you, the problem is in the extraction library or related code. When the option is specified, aqtinstall launches the '7z x' command through Python's 'subprocess.run()'.
Unfortunately there is a bug in the '--external' option. I'm working on it now.
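As a rough illustration of what launching '7z x' through 'subprocess.run()' looks like, here is a minimal sketch. It is not aqtinstall's actual code: the function names `build_extract_command` and `extract_with_external` are hypothetical, and the `-y` and `-o<dir>` switches are standard 7-Zip options added here as an assumption.

```python
import subprocess


def build_extract_command(seven_zip, archive, output_dir):
    """Build the argument list for an external '7z x' call (hypothetical helper)."""
    # "x" extracts with full paths, "-y" answers yes to prompts,
    # and "-o<dir>" (no space after -o) selects the output directory.
    return [seven_zip, "x", "-y", f"-o{output_dir}", archive]


def extract_with_external(seven_zip, archive, output_dir):
    """Run the external extractor and raise CalledProcessError if it fails."""
    command = build_extract_command(seven_zip, archive, output_dir)
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
```

For example, `extract_with_external("7z", "qtbase.7z", "Qt")` would run `7z x -y -oQt qtbase.7z`, assuming a `7z` binary is on the PATH.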
The root cause is a deadlock. When extracting mingw packages, all Python processes go into the 'Sleep' state. I found the following description in the Python manual:
https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor
Calling Executor or Future methods from a callable submitted to a ProcessPoolExecutor will result in deadlock.
The py7zr extraction library uses 'concurrent.futures' to speed up extraction with multiple threads when the archive is compressed in a way that supports concurrency, which happens with large archives. As a result, aqt runs the extractor inside a concurrent.futures context, which calls py7zr, which in turn calls futures.
To solve the problem, we need to rethink the concurrency design.
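One possible direction, sketched under assumptions and not necessarily what aqtinstall ended up doing: submit only the downloads to a pool and call the extractor from the submitting thread, so that any executor the extractor uses internally is never nested inside a pool worker. The names `install_all`, `download`, and `extract` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def install_all(packages, download, extract, max_workers=4):
    """Download packages concurrently, then extract each archive in the caller's thread."""
    extracted = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Only downloads are submitted to the pool.
        jobs = [pool.submit(download, package) for package in packages]
        for job in as_completed(jobs):
            archive = job.result()
            # extract() runs here, outside any pool worker, so it is free to
            # use its own executor internally without nesting executors.
            extracted.append(extract(archive))
    return extracted
```

The trade-off in this sketch is that extraction is serialized across archives; it relies on the extractor's own internal concurrency for speed.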
Great! It takes only 2 minutes now! Thanks a lot @miurahr! :)
Out of curiosity, would it be possible to reduce this time to 31s as the msvc build?
Out of curiosity, would it be possible to reduce this time to 31s as the msvc build?
Unfortunately, the msvc downloads total 90MB, but the mingw downloads total 209MB in Qt 5.15.0 and more than 300MB in Qt 5.12.
This means mingw download and extraction take two to three times longer than for the msvc packages.
The issue here is solved.
Oh, I see. Thanks for your explanation! :)
The win64_mingw73 arch takes a lot of time to be installed. As you can see from here, it takes more than 17 minutes. Before it took only 1 minute, as you can see from here. If I use win64_msvc2017_64 instead, the problem doesn't occur. Thanks in advance for your help! :)