merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0
423 stars 144 forks source link

[FEATURE REQUEST] Implement lazy loading of imports for faster help output (import anvio: slow due to due I/O usage) #1924

Open nick-youngblut opened 2 years ago

nick-youngblut commented 2 years ago

Short description of the problem

Just listing the script help for any jobs is taking 2-3 minutes on my machine. For example, I timed a couple of -h jobs (3x replicates):

anvi-display-pan -h  8.00s user 10.39s system 11% cpu 2:43.39 total
anvi-display-pan -h  7.10s user 7.40s system 8% cpu 2:43.55 total
anvi-display-pan -h  7.53s user 8.39s system 9% cpu 2:39.69 total

anvi-pan-genome -h  5.60s user 10.26s system 10% cpu 2:25.04 total
anvi-pan-genome -h  4.50s user 6.89s system 7% cpu 2:28.83 total
anvi-pan-genome -h  5.60s user 10.07s system 11% cpu 2:14.58 total

The jobs are on disk-sleep for most of the time, which seems to be due to high I/O load just to list the help docs.

As a counter example, diamond help with the same conda env on the same NFS:

diamond help  0.00s user 0.01s system 16% cpu 0.039 total
diamond help  0.00s user 0.01s system 10% cpu 0.057 total
diamond help  0.01s user 0.00s system 17% cpu 0.036 total

I profiled the code, and the bottlenecks seem to be import anvio and import anvio.panops.

I've been looking at the anvio code, but I can't seem to find why import anvio and import anvio.panops require so much I/O.

It would be great if the anvio scripts use more lazy loading, or if the I/O load during import could be reduced.

anvi'o version

$ anvi-self-test --version
Anvi'o .......................................: hope (v7.1)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2

System info

Ubuntu 18.04.6, 80 threads, 1 Tb mem

nick-youngblut commented 2 years ago

According to a cProfile run on import anvio, there are 3 major bottlenecks:

Early on:

       65    0.003    0.000    0.008    0.000 <frozen importlib._bootstrap_external>:1067(_path_hooks)
      932    0.003    0.000    0.011    0.000 <frozen importlib._bootstrap_external>:1080(_path_importer_cache)
      577    0.006    0.000    0.247    0.000 <frozen importlib._bootstrap_external>:1117(_get_spec)
      577    0.002    0.000    0.249    0.000 <frozen importlib._bootstrap_external>:1149(find_spec)
       66    0.001    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:1196(__init__)
      528    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1202(<genexpr>)
        8    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:1216(find_loader)
      556    0.004    0.000    0.013    0.000 <frozen importlib._bootstrap_external>:1228(_get_spec)
      890    0.023    0.000    0.240    0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
       66    0.001    0.000    0.015    0.000 <frozen importlib._bootstrap_external>:1281(_fill_cache)
       66    0.001    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:1322(path_hook_for_FileFinder)
      938    0.009    0.000    0.023    0.000 <frozen importlib._bootstrap_external>:263(cache_from_source)
      556    0.004    0.000    0.017    0.000 <frozen importlib._bootstrap_external>:361(_get_cached)
      469    0.002    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:393(_check_name_wrapper)
      890    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:41(_relax_case)
        8    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:413(_find_module_shim)
      469    0.024    0.000    0.029    0.000 <frozen importlib._bootstrap_external>:430(_validate_bytecode_header)
      469    0.005    0.000    0.196    0.000 <frozen importlib._bootstrap_external>:485(_compile_bytecode)
      938    0.003    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:52(_r_long)
      556    0.006    0.000    0.008    0.000 <frozen importlib._bootstrap_external>:524(spec_from_file_location)
     5026    0.012    0.000    0.029    0.000 <frozen importlib._bootstrap_external>:57(_path_join)
     5026    0.009    0.000    0.014    0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
      938    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:63(_path_split)
      469    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:669(create_module)
    469/1    0.005    0.000   67.098   67.098 <frozen importlib._bootstrap_external>:672(exec_module)
      469    0.019    0.000   28.857    0.062 <frozen importlib._bootstrap_external>:743(get_code)
     2578    0.003    0.000    0.169    0.000 <frozen importlib._bootstrap_external>:75(_path_stat)
      469    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:800(__init__)
      469    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:825(get_filename)
      469   28.574    0.061   28.589    0.061 <frozen importlib._bootstrap_external>:830(get_data)
      469    0.001    0.000    0.008    0.000 <frozen importlib._bootstrap_external>:840(path_stats)
     1219    0.003    0.000    0.121    0.000 <frozen importlib._bootstrap_external>:85(_path_is_mode_type)
       87    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:908(__init__)
    87/70    0.001    0.000   37.407    0.534 <frozen importlib._bootstrap_external>:919(create_module)
    87/83    0.000    0.000    0.142    0.002 <frozen importlib._bootstrap_external>:927(exec_module)
     1089    0.002    0.000    0.120    0.000 <frozen importlib._bootstrap_external>:94(_path_isfile)
      130    0.000    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:99(_path_isdir)

Close to the end:

     8175    0.003    0.000    0.003    0.000 {built-in method builtins.min}
      469    0.000    0.000    0.388    0.001 {built-in method builtins.next}
     6344    0.001    0.000    0.001    0.000 {built-in method builtins.ord}
      286    0.001    0.000    0.001    0.000 {built-in method builtins.repr}
        9    0.000    0.000    0.000    0.000 {built-in method builtins.round}
     5218    0.004    0.000    0.005    0.000 {built-in method builtins.setattr}
      234    0.003    0.000    0.039    0.000 {built-in method builtins.sorted}
       30    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
      769    0.000    0.000    0.000    0.000 {built-in method builtins.vars}
        1    0.000    0.000    0.000    0.000 {built-in method fcntl.ioctl}
      959    0.002    0.000    0.002    0.000 {built-in method from_bytes}
        4    0.000    0.000    0.000    0.000 {built-in method fromkeys}
        1    0.000    0.000    0.000    0.000 {built-in method fromtimestamp}
      609   18.073    0.030   18.073    0.030 {built-in method io.open}
        1    0.000    0.000    0.000    0.000 {built-in method maketrans}
      469    0.188    0.000    0.188    0.000 {built-in method marshal.loads}
        1    0.000    0.000    0.000    0.000 {built-in method math.exp}
        2    0.000    0.000    0.000    0.000 {built-in method math.log}
        1    0.000    0.000    0.000    0.000 {built-in method math.sqrt}
       30    0.000    0.000    0.000    0.000 {built-in method numpy.arange}
      191    0.001    0.000    0.001    0.000 {built-in method numpy.array}

Even closer to the end:

      398    0.000    0.000    0.000    0.000 {method '__contains__' of 'frozenset' objects}
        1    0.000    0.000    0.000    0.000 {method '__enter__' of '_thread.lock' objects}
        1    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.lock' objects}
      873    0.001    0.000    0.002    0.000 {method '__reduce_ex__' of 'object' objects}
     1545    0.000    0.000    0.000    0.000 {method '__subclasses__' of 'type' objects}
     1534    0.000    0.000    0.000    0.000 {method '__subclasshook__' of 'object' objects}
       16    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.RLock' objects}
        2    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.lock' objects}
     4234    0.001    0.000    0.001    0.000 {method 'add' of 'set' objects}
       30    0.000    0.000    0.000    0.000 {method 'any' of 'numpy.generic' objects}
    30040    0.005    0.000    0.005    0.000 {method 'append' of 'list' objects}
       30    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
       82    0.000    0.000    0.000    0.000 {method 'bit_length' of 'int' objects}
        9    0.000    0.000    0.000    0.000 {method 'cast' of 'memoryview' objects}
       29    0.000    0.000    0.000    0.000 {method 'clear' of 'dict' objects}
      595   17.977    0.030   17.977    0.030 {method 'close' of '_io.BufferedReader' objects}
        2    0.000    0.000    0.000    0.000 {method 'copy' of 'collections.OrderedDict' objects}
       11    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 {method 'copy' of 'mappingproxy' objects}
       60    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 {method 'count' of 'list' objects}
meren commented 2 years ago

Yes, our import hierarchy is horrible, and getting worse and worse. While I know we need to do something about this, I'm not sure where to start TBH. Any suggestions are most welcome. Where are these exactly occurring for instance, is there a way to find more about them?

(...)
      469   28.574    0.061   28.589    0.061 <frozen importlib._bootstrap_external>:830(get_data)
(...)
      609   18.073    0.030   18.073    0.030 {built-in method io.open}
(...)
      595   17.977    0.030   17.977    0.030 {method 'close' of '_io.BufferedReader' objects}
(...)
nick-youngblut commented 2 years ago

I don't have a lot of experience with profiling code, but I believe that you could iteratively look through the code. The I/O load has substantially dropped on our NFS today, so now it only takes ~10-20 secs instead of 2-3 minutes. Still, it would be great to at least get the script help without requiring import anvio ... but you have a special argparser wrapped up in the anvio package.

meren commented 2 years ago

Holy crap. At UChicago NSF, which is notoriously slow as it servers tens of thousands of cores, import anvio takes a few seconds. I wonder if this is about v7.1. Regardless, someone with more experience working with profiling of imports should certainly take a look into this for sure :/ Thank you, Nick.

nick-youngblut commented 2 years ago

Holy crap. At UChicago NSF, which is notoriously slow as it servers tens of thousands of cores

Yeah I/O loads on our system can get really high when 1000's of cluster jobs are reading/writing to the same file system