mixxxdj / mixxx

Mixxx is Free DJ software that gives you everything you need to perform live mixes.
http://mixxx.org
Other
4.51k stars 1.28k forks source link

(win) very long start when network shares are unavailable #8201

Closed mixxxbot closed 2 years ago

mixxxbot commented 2 years ago

Reported by: sblaisot Date: 2015-08-20T17:00:05Z Status: Fix Released Importance: Undecided Launchpad Issue: lp1487138


Under Windows, using 1.12-beta1, mixxx startup takes a very long time (>30 seconds) when you have network shares mounted (drive letter assigned) that are unavailable at the time you launch mixxx (which happens if you put your laptop on sleep at home with a network share mounted and start it when you arrive at a gig for example)

Step to reproduce :

  1. assign a drive letter to a network share
  2. Change the IP address of your network card to be in another network without routing capabilities (simply shutting down the interface won't work, you must have some network but not the possibility to access network share)
  3. launch mixxx
mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-20T17:00:59Z


It's like Mixxx somehow try to scan all drives when launching.

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-20T17:04:31Z


Also tested on windows 10, Mixxx takes a little more time to launch that normal when mounted network shares are unavailable, but far less than under windows seven.

Can anyone test under linux/macos with an NFS share mounted that is no more accessible (using soft mount of course) when Mixxx launch ?

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-22T10:25:46Z


I just tested on windows 10 and couldn't reproduce. It seems that this problem is only shown on windows XP to seven.

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T17:58:49Z


ok, the time is taken here : https://github.com/mixxxdj/mixxx/blob/1.12/src/mixxx.cpp#L327

I managed to reproduce it with 52 seconds taken by this sole line.

adding some debug print before/after gives me in the log :
Debug [Main]: "19:52:23.803" Start progress:  32 
Debug [Main]: Available QtSQL drivers: ("QSQLITE") 
Debug [Main]: DB status: "C:/Users/Sébastien/AppData/Local/Mixxx/mixxxdb.sqlite" = true 
Debug [Main]: SchemaManager::upgradeToSchemaVersion already at version 24 
Debug [Main]: TrackDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection" 
Debug [Main]: CrateDAO::initialize() 
Debug [Main]: CueDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection" 
Debug [Main]: DirectoryDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection" 
Debug [Main]: LibraryHashDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection" 
Debug [Main]: Default quick links: ("D:/Documents/Sébastien/music/soirées/", "D:/Documents/Sébastien/music/", "C:/Users/Sébastien/Downloads/", "D:/Documents/Sébastien/Desktop/", "D:/Documents/Sébastien/Mes Documents/") 
Debug [Main]: Appending Quick Link:  "soirées" --- "D:/Documents/Sébastien/music/soirées/" 
Debug [Main]: Appending Quick Link:  "music" --- "D:/Documents/Sébastien/music/" 
Debug [Main]: Appending Quick Link:  "Downloads" --- "C:/Users/Sébastien/Downloads/" 
Debug [Main]: Appending Quick Link:  "Desktop" --- "D:/Documents/Sébastien/Desktop/" 
Debug [Main]: Appending Quick Link:  "Mes Documents" --- "D:/Documents/Sébastien/Mes Documents/" 
Debug [Main]: Committing transaction on "qt_sql_default_connection" result: true 
Debug [Main]: Traktor Library Location=[ "C:/Users/Sébastien/collection.nml" ] 
Debug [Main]: Checking for access to user's library directories: 
Debug [Main]: Checking for access to "D:\Documents\Sébastien\music\soirées" : true 
Debug [Main]: "19:53:15.791" Start progress:  33 

I have to dig a bit more inside this block but it seems that somewhere in it there is a drive scan that happens and waits for a system timeout.

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T20:48:47Z


the time is spent here : https://github.com/mixxxdj/mixxx/blob/1.12/src/library/library.cpp#L65

continuing to dig...

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-08-23T21:02:06Z


The left-hand pane includes a "browse" item that enumerates all of the devices on the system. If there are network drives, it will scan them. Perhaps we should lazily populate that list and not enumerate the devices until that part of the tree is twirled down? And then it could populate them in the background, or show them one at a time, making it more clear to the user what's taking so long.

I considered suggesting that we skip network devices, but I could imagine a radio station setup that uses a NAS to store music, and we'd want to support that. So finding some way to do this work in the background seems like the best option.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-08-23T21:02:38Z


(by scan, I don't mean it will scan all the directories on the remote device, I believe it just sees that it exists. But something in that process is taking a long time)

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T21:20:45Z


I'm trying to find the line of code that takes a long time.

It seems that QDir::drives() is not reponsible.

It's probably drive.canonicalPath() of drive.filePath() that is taking extra delay here. So with a little change, I can find a way to avoid trying to read to the unavailable drive while populating the left-hand pane.

on the other hand, I also have to check what happens if you click in the left-hand pane on an unavailable drive to be sure we don't hang mixxx. I'll report here what I find.

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T21:21:09Z


sorry, I meant drive.canonicalPath() OR drive.filePath()

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T21:27:45Z


If I click in the left-hand pane on a drive that is not accessible, I sometimes managed to hang mixxx interface (waveform doesn't move anymore, vu-meter hang, nothing move), but the playback continue without any problem.

the interface becomes responsive again after some seconds. When you are in the middle of a mixx, this seconds can appear very long ;)
mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T21:52:48Z


ok, finally got it.

the delay is taken here : https://github.com/mixxxdj/mixxx/blob/1.12/src/library/browse/browsefeature.cpp#L70

and the responsible is drive.canonicalPath()

on my system (win seven); the two functions return the same thing : X:/

only the first call to drive.canonicalPath() takes a long time (waiting for the system timeout). If I add a second call, this one is fast.

I'm considering changing drive.canonicalPath() for drive.filePath(). This, if a drive hang mixxx interface for a few seconds after a user clicked on it, he knew which drive if having access problems.

what do you think ?

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T22:06:13Z


I made a try.

replacing drive.canonicalPath() with drive.filePath() make mixxx launch much more rapidly. but on the other hand, this delays the first faulty drive acces to when expanding the "peripheral" item on the left-hand pane, which totally freezes mixxx interface for some loooong seconds. playback continue, but the interface is totally unresponsive. After system timeout, it comes back.

so I think it's better to have a long time at startup than during a mix...

the real bug is you have a system with a unaccessible drive. What about setting a 1 second timeout on drive.canonicalPath() and skipping the drive that is causing this timeout with an warning message in the log (something like "drive X:/ is not accessible") ? I mean not showing it in the peripheral list of the left-hand pane.

We should also have some timeout like this when browsing with the left-hand pane to avoid interface freeze.

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-23T23:16:01Z


ok, finally, no matter what time it gets on launch and if launching tries to access unavailable shares or not, you will have sometimes interface freezes when expanding the "drives" part of the left pane when network shares are unavailable.

So I made a PR to have a quicker launch, which should not have any side effect on the interface freeze when expanding drives, but will accelerate launching time.

https://github.com/mixxxdj/mixxx/pull/686

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-08-24T01:15:26Z


The problem is that canonicalpath and filepath are not interchangeable. By forcibly using the filepath there could be new issues with symbolic links and ".." in the filename. Or, we have to allow for more time to do testing to discover if filepath is a suitable replacement. But the best solution would be to come up with a more robust fix that still uses canonicalpath() and does the work in a worker thread.

Since this is more of a bug with the user's system and not Mixxx, I think we should target master instead of 1.12 for working around this issue. But for 1.12 we should definitely print out an error in the log if a drive is broken (can we trap the error somehow?), and we should not try to access that drive again for that session if possible.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-08-24T01:16:17Z


If the user is seeing slow startup, we'll be able to look at the log and see that they have a broken network drive, and tell them to fix it.

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-24T05:37:00Z


Owen, there can not be any problem with .. and symlinks :

1 We are in an ifdef WINDOWS so there is no symlink 2 we only apply canonicalPath/filePath to foreach drive in QDir::drives(), so we only work on a list of drive letters

so we are in a special case here where filePath and canonicalPath are exactly the same.

a more robust fix should target master, but a quick fix for 1.12 is a good option I think.

We cannot trap and error, because there is no error, only a long time to access broken drive. The only way to trap the error is to add some sort of timeout. Even if we don't populate the "devices" rootItem with the broken drive, we rebuild the drives list on each item expansion in onLazyChildExpandation:300

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-24T05:53:48Z


and last but not least, in this portion of code, canonicalPath is only used for display purposes. Used only as the first parameter of treeItem constructor. So there is indeed absolutely no problem using something else.

Currently, the comment on https://github.com/mixxxdj/mixxx/blob/1.12/src/library/browse/browsefeature.cpp#L72 is inaccurate : canonicalPath returns C:/ and not C: so canonicalPath is exactly the same than filePath in this particular situation.

using filePath and removing trailing / is IMHO the best way to show the user a list of X: drive letters.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-08-24T14:58:11Z


Although the string is only for display, I believe the display string is used to determine what filename to load. But even so, the ifdef that applies this only to windows makes this safer. I just have a couple questions on the PR

mixxxbot commented 2 years ago

Commented by: sblaisot Date: 2015-08-24T17:04:31Z


Owen,

this display string is not used to determine what filename to load, only for left-hand pane item name as far as I understand.

copying https://github.com/mixxxdj/mixxx/blob/1.12/src/library/browse/browsefeature.cpp#L95 // Just a word about how the TreeItem objects are used for the BrowseFeature: // The constructor has 4 arguments: // 1. argument represents the folder name shown in the sidebar later on // 2. argument represents the folder path which MUST end with '/' // 3. argument is the library feature itself // 4. the parent TreeItem object

we are talking here about the 1st argument only.

mixxxbot commented 2 years ago

Commented by: ywwg Date: 2015-08-24T17:21:53Z


Ah I was thinking of the library table view, where the "location" item is used as the canonical location of the file and is not just a display value.

mixxxbot commented 2 years ago

Issue closed with status Fix Released.