microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
19.35k stars 1.91k forks

Improve CLI speed with lazy imports #1319

Closed jgbradley1 closed 6 days ago

jgbradley1 commented 4 weeks ago

Description

This is an optional PR to consider. Many imports are executed in the various __init__.py files throughout the library, which makes the CLI slow to start up. The same import overhead also makes the initial import of graphrag slow.
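For readers unfamiliar with the technique, here is a minimal sketch of a lazy (deferred) import. It is not the code from this PR: `run_index` is a hypothetical command handler, and the standard-library `statistics` module stands in for a heavy dependency.

```python
def run_index(values):
    """Hypothetical CLI command handler (illustrative name, not from graphrag)."""
    # Deferred import: the cost is paid only when this command actually runs,
    # not when the CLI module itself is imported (e.g. for --help or tab completion).
    import statistics  # stand-in for a heavy dependency (assumption)

    return statistics.mean(values)


if __name__ == "__main__":
    print(run_index([1, 2, 3]))
```

The trade-off is that the first call to the command pays the import cost instead of process startup, which is usually the right choice for a CLI where most invocations touch only one subcommand.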

Proposed Changes

Following some tips from this article, I profiled the startup time for the entire library with a focus on improving the CLI startup time using the following commands:

python -X importtime -c 'from graphrag.cli.main import *' 2> graphrag-imports.log
tuna graphrag-imports.log
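If tuna is not available, the same log can be inspected with a few lines of Python. This is a rough sketch that assumes the standard `-X importtime` line layout (`import time: <self us> | <cumulative us> | <module>`); the helper name `slowest_imports` is made up for illustration.

```python
def slowest_imports(log_text, top=5):
    """Return the `top` entries with the largest cumulative import time.

    Each entry is a tuple (module_name, self_us, cumulative_us).
    """
    rows = []
    for line in log_text.splitlines():
        if not line.startswith("import time:"):
            continue  # ignore anything else written to stderr
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skip the header row ("self [us] | cumulative | ...")
        rows.append((name, int(self_us), int(cumulative_us)))
    # Sort by cumulative time, slowest first.
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


if __name__ == "__main__":
    import sys

    # Usage: python slowest.py graphrag-imports.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        for name, self_us, cumulative_us in slowest_imports(fh.read()):
            print(f"{cumulative_us:>10} us  {name}")
```

Cumulative time includes everything a module imports in turn, so the top entries tend to be the package roots whose __init__.py files pull in the most.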

The following screenshot shows the startup time for the CLI code when the package is first installed.

graphrag-initial-startup-time

After the library has been imported once (Python caches compiled bytecode to speed up subsequent imports), we can achieve an even faster load time of 0.3 seconds.

graphrag-cached-startup-time

Tab completion with the CLI is also now very responsive.

Additional Notes

This PR closes https://github.com/microsoft/graphrag/pull/1299 as well.

A small patch for Typer path completion was added to the CLI to improve the CLI experience.