sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

`import sgkit` takes ~1.5s #931

Open benjeffery opened 2 years ago

benjeffery commented 2 years ago

For tools using the CLI this amount of delay feels excessive. Around 1s of this time is performing imports. Here's the import flame graph: (0.1s on xarray.tutorial?!?)

[Image: sgkit-import-profile]
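For readers who want comparable per-module numbers without a flame graph tool, CPython's `-X importtime` flag prints self and cumulative import times to stderr. The sketch below is not how the graph above was produced; `import_times` is a hypothetical helper name, and `json` is used only as a stand-in so the example runs without sgkit installed.

```python
import subprocess
import sys


def import_times(module: str) -> list[tuple[str, int, int]]:
    """Return (name, self_us, cumulative_us) rows, slowest cumulative first.

    Runs a fresh interpreter so that nothing is already cached in sys.modules.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us = int(parts[0])
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # header row: "self [us] | cumulative | imported package"
        rows.append((parts[2].strip(), self_us, cumulative_us))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Replace "json" with "sgkit" in an environment where sgkit is installed.
    for name, self_us, cumulative_us in import_times("json")[:10]:
        print(f"{cumulative_us / 1000:8.1f} ms  {name}")
```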

I assume some of these could be imported only when they are needed - although given that many of the imports are referenced in typing specifications that might not be possible.

I'm not sure yet what the remaining 0.5s is - a cProfile callgraph is pretty useless on import but I think there is a way around that by profiling the individual sgkit files.
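One possible way to profile a single module's import in isolation is to wrap `importlib.import_module` in a `cProfile.Profile` and sort by cumulative time. This is a sketch only: `profile_import` is a hypothetical helper, and `json` stands in for an individual sgkit module.

```python
import cProfile
import importlib
import io
import pstats


def profile_import(module_name: str, top: int = 10) -> str:
    """Profile one import and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    importlib.import_module(module_name)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Modules already present in sys.modules return almost instantly,
    # so run this in a fresh interpreter for meaningful numbers.
    print(profile_import("json"))
```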

tomwhite commented 2 years ago

I agree that deferring some of the imports could help.

I think numba compilation is still a significant part of this, see #363, although we now have numba caching on so it's faster the second time. Not sure if we could defer this until it's needed too.
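The idea of deferring compilation until first use can be illustrated with a library-agnostic wrapper. Nothing here is numba's API: `compile_on_first_call` is a hypothetical helper and `fake_compiler` is a stand-in for whatever expensive decorator would otherwise run at import time.

```python
import functools
from typing import Callable

compile_calls: list[str] = []  # records when the stand-in "compiler" runs


def fake_compiler(func: Callable) -> Callable:
    """Stand-in for an expensive compile step (assumption, not a real JIT)."""
    compile_calls.append(func.__name__)
    return func


def compile_on_first_call(compiler: Callable[[Callable], Callable]):
    """Decorator factory: apply `compiler` lazily, on the first call only."""

    def decorator(func: Callable) -> Callable:
        compiled = None

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal compiled
            if compiled is None:
                compiled = compiler(func)  # cost is paid here, not at import
            return compiled(*args, **kwargs)

        return wrapper

    return decorator


@compile_on_first_call(fake_compiler)
def add(a, b):
    return a + b
```

With this pattern, importing the module only defines the wrapper; the compile cost moves to the first call of `add`.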

benjeffery commented 2 years ago

Thanks @tomwhite, I'll check the numba compilation. I also wonder if PEP484 forward references might help with delaying imports: https://legacy.python.org/dev/peps/pep-0484/#forward-references
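A minimal sketch of how forward references can keep a typing-only import out of the runtime import path: the import sits behind `typing.TYPE_CHECKING` and the annotations are written as strings. `fractions.Fraction` is only a stand-in for a heavy dependency such as pandas, and `halve` is a hypothetical function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; never executed at runtime,
    # so the import cost is not paid when this module is imported.
    from fractions import Fraction  # stand-in for a heavy dependency


def halve(value: "Fraction") -> "Fraction":
    """String annotations are not evaluated when the function is defined."""
    return value / 2
```

One caveat with this approach: anything that resolves annotations at runtime, such as `typing.get_type_hints`, would need the real import to be available.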

tomwhite commented 2 years ago

> I also wonder if PEP484 forward references might help with delaying imports: https://legacy.python.org/dev/peps/pep-0484/#forward-references

It would be great if we could take advantage of forward references.

Do you think xarray itself is doing unnecessary imports - like the tutorial?

benjeffery commented 2 years ago

> Do you think xarray itself is doing unnecessary imports - like the tutorial?

Yes, planning on checking and raising a PR/issue upstream, unless it looks like a rabbit-hole!

benjeffery commented 2 years ago

Also see https://github.com/pydata/xarray/issues/6726, seems they are aware of the issue.

jeromekelleher commented 2 years ago

If we're importing some things just for typing purposes, then I'd be +1 for making the typing less strict.

tomwhite commented 2 years ago

Pandas is not used very much in the codebase, so it might be possible to import it lazily.

Similarly, the distance API is pretty niche so making that lazier would be good too.
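A simple form of lazy import is to move the import statement into the function that needs it, so the cost is paid on first call rather than at package import. In this sketch `statistics` is a stdlib stand-in for pandas, and `column_mean` is a hypothetical function, not part of sgkit.

```python
from typing import Iterable


def column_mean(values: Iterable[float]) -> float:
    """Compute a mean using a dependency that is imported on demand."""
    # Deferred import: runs on the first call, then hits the sys.modules
    # cache on later calls, so the repeated cost is a dictionary lookup.
    import statistics  # stand-in for a rarely used heavy dependency

    return statistics.fmean(values)


if __name__ == "__main__":
    print(column_mean([1.0, 2.0, 3.0]))
```

The trade-off is that an import error for a missing dependency surfaces at call time instead of at import time.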