Open TheAssassin opened 6 years ago
You are right, the code is not optimized for speed in any way. What is interesting is that you seem to experience major slowdowns while I didn't notice these. But anyway, you are right this could be improved. As could the overall performance of linuxdeployqt
, since we are processing the same files over and over again. I think this is where the major performance bottleneck is. Thing is, I don't have the time to optimize this, since the end result is working at the moment, I can live with it.
If someone wants to really improve performance, I'd suggest to open a new branch and essentially do a major refactoring/rewrite, ensuring that each file that is deployed into the AppDir is only operated on once. But please don't attempt to do this in the main
branch, which is following the macdeployqt
program logic.
As for the regex, I think I found it by trial and error.
I could imagine a system like routing algorithms use them. Basically, you maintain two lists of "nodes" in a map. One contains the nodes that need to be regarded, the other one the which have been "visited" already (and hence don't need to be worked on any more). This is pretty efficient in my experience, and I could imagine implementing such a system here, too. However, we really need to refactor the code for this. Maybe I can work on something in one the the next sprints?
As for the regex, I think I found it by trial and error.
What regex are you talking about?
Regex: Must have had the wrong ticket in mind; nevermind.
Hi guys. This spent ~15 minuts for me for one application(please see #248). But my project contains more than one application... So, common time will be increased up to 1-2 hour, as minimum.. It is very very bad.. ((
Is macdeployqt soo slow too?
@denis-shienkov the tools can't really be compared. All I can tell is macdeployqt doesn't use this copyright copying stuff. Also, it's a Debian/Ubuntu specific feature.
))))))
oops, double shot )))
Many thanks, this new '-no-copy-copyright-files' option helped, it decreases time from ~15 minutes up to ~15 seconds (in my case).
Wow. I wonder why the impact of this can go from virtually non-existent to huuuuge. Are you running your builds on a system that has been set up long time ago, and has seen many apt-get
operations? (I am trying to find out why the impact is so large on your system but virtually negligible on Travis CI.)
CC @azubieta. I know you maintain your own patched linuxdeployqt because of these issues. Please verify whether you can use the upstream linuxdeployqt with this flag instead.
@probonopd I already told you it's dpkg-query
being super slow...
...and be sure to fulfill all requirements of licenses such as the GPL...
I already told you it's dpkg-query being super slow...
...on some systems but apparently not on others (=the one I happen to use)
The one you use is a live ISO without any customizations, no PPAs, and running from RAM.
We need to fix the architectural issues first in order to fix issues like these. According to you, linuxdeployqt touches the same file more than once, etc...
Correct
While reviewing linuxdeployqt, I could confirm that
copyCopyrightFile()
is one of the performance bottlenecks (takes 4-5 seconds, measuring the time of a few calls while watching linuxdeployqt's output with-verbose=3
).Now, to understand my idea, I'd like to quickly summarize how
copyCopyrightFile()
works. The function is called on every file copying process, so it should be as efficient as possible, given the amount of files that need to be copied into an AppDir.The function first performs some, let's say, "platform checks", i.e., whether
dpkg
anddpkg-query
are available on the system. These checks are performed on every call. I am not sure how much time those calls consume, but I'd say we can safely assume that if they're unavailable on the first call, this wouldn't change during the rest of the runtime, and vice versa. Therefore, I'd only perform these checks on the first call, and would then skip them (pretty much like "lazy initialization", I'd add a global static variable to keep this state).Next, the app queries the package that contains the file we're looking for a copyright file for, and then lists that package's contents (a list of paths). It then filters this lists with a list of given patterns (which I don't know about how it was compiled, @probonopd please add a comment about it, as far as I can see, it's been either you looking into a bunch of packets and where they stored this kind of data, or there's a Debian maintainer standard or document somewhere on the internet specifying those locations). These files are then copied into the AppDir.
I didn't get to profiling this with a sufficient granularity to see which of these calls is causing the performance bottleneck on my computer, but I can definitely blame
copyCopyrightFile()
. Since RAM is cheap nowadays, I want to suggest a method that optimizes subsequent calls tocopyCopyrightFile()
when files have been packaged in the same packages. I would like to save all paths of packages whose copyright files have been copied already in a suitable data structure (i.e., the files returned by the second query todpkg
). Whenever a file is contained in this data structure already, we can safely assume that its copyright files have been copied already.Of course, that will increase the memory footprint a bit, but I could imagine that this saves quite some amount of calls to
dpkg(-query)
. For the beginning, a simple sorted list combined with a binary search should do, allowing for quick lookups. We could further improve this by sorting the data e.g., into a tree, simulating the file system hierarchy, although I could imagine that the files we copy are in similar directories, hence requiring a binary search in the leafs, too, so I don't think that such a tree would have any positive impact on the lookup durations.If you want some numbers comparing before and after results, I can start with some benchmarking, to e able to see whether there's any impact on the overall performance.
And no, I own an SSD, it's probably not an IO bottleneck on my end causing
dpkg(-query)
to be this slow.