sandialabs / sceptre-phenix

phenix is an orchestration tool and GUI for Sandia's minimega platform
https://sandialabs.github.io/sceptre-docs/
GNU General Public License v3.0

Calls to /disks slow #123

Open jacdavi opened 1 year ago

jacdavi commented 1 year ago

We've been noticing calls to /disks can be pretty slow (20+ seconds).

The slowness seems to come mostly from the large number of files and directories that get walked. In one example we have 1,520 files and 1,727 directories. Of note, 1,687 of those directories have miniccc_responses in their path.

The main code that handles this is files.getAllFiles. I've tried a number of solutions that fix the issue, but I'm not sure which is best.

  1. Limit the recursion depth to the base directory and one level of subdirectories
  2. Only search subdirectories for the experiment, not other subdirectories
    • The code is set up to ignore directories that are for other experiments. However, if phenix loses track of an experiment, but its files still exist, they will get included. That seems to be happening in this example
  3. Explicitly exclude miniccc_responses directories from search

I don't love the idea of hardcoding an exception for miniccc_responses, but that's the best option I've come up with that nearly guarantees a disk isn't missed. Though I'm not sure how users typically arrange their disks. All of ours are in the base directory.

jacdavi commented 1 year ago

@activeshadow I can make a PR for one of the above solutions (or something else), but would be interested in your thoughts first. Thanks!

activeshadow commented 1 year ago

@jacdavi since phenix is pretty much only wired to work with minimega at this point, I don't mind hard coding it to skip over miniccc_responses. I think the typical use case is to have all images in the base directory, but the code in question to check for image files deeper into subdirectories was added by @eric-c-wood, who is a prominent user of phenix, so I would suggest leaving that in place unless he chimes in otherwise.

Long story short, let's just hard-code skipping miniccc_responses directories for now.

jacdavi commented 1 year ago

Actually, maybe skipping miniccc_responses isn't a general enough solution. I know Arthur had this problem before, and I recall he had a container file system with ~5000 directories. Skipping miniccc_responses wouldn't help in that case.

eric-c-wood commented 1 year ago

@jacdavi @activeshadow Wow, 20+ seconds is way too slow, especially when feeding a UI. For our use cases, we really only need to enumerate disk images in the minimega files directory and disk images that exist in {minimega files directory}/{experiment directory}/"files". In addition, there is a need to enumerate disk images defined in a topology that may exist outside the minimega files directory path, but that appears to be handled by the getTopologyFiles function.

Would a solution that first enumerates disk images in the minimega files directory combined with an enumeration of disk images in {minimega files directory}/{experiment directory}/"files" solve the 20+ second response time for most use cases? Would that be too restrictive for other use cases?