hezarai / hezar

The all-in-one AI library for Persian, supporting a wide variety of tasks and modalities!
https://hezarai.github.io/hezar/
Apache License 2.0

`import hezar` is slow! #112

Closed: arxyzan closed this issue 11 months ago

arxyzan commented 11 months ago

`import hezar` is slow for some reason. We have to run a profiler on the imports to find out what's causing this and how we can fix it.

arxyzan commented 11 months ago

Running the import-time profiler on a simple `import hezar` script gives the following:

[Screenshot: import-time profiler output for `import hezar`]
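A per-module breakdown like the one in the screenshot can be reproduced with CPython's built-in `-X importtime` flag (Python 3.7+), which prints one timing line per import to stderr. The helper below is a minimal sketch that runs the import in a subprocess and returns the slowest entries; `profile_imports` is a hypothetical name, and the parsing assumes the standard `import time: self [us] | cumulative | imported package` line format. The standard-library `json` module is used so it runs anywhere.

```python
import subprocess
import sys


def profile_imports(module: str, top: int = 10) -> list[tuple[int, int, str]]:
    """Return the `top` slowest imports triggered by `import <module>`.

    Each entry is (cumulative_us, self_us, package), sorted by cumulative time.
    """
    # -X importtime makes CPython print one timing line per import to stderr.
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            self_us, cumulative_us = int(fields[0]), int(fields[1])
        except ValueError:
            # Skips the header line "self [us] | cumulative | imported package".
            continue
        # Leading spaces in the package column encode nesting depth; drop them.
        rows.append((cumulative_us, self_us, fields[2].strip()))
    rows.sort(reverse=True)
    return rows[:top]


for cumulative_us, self_us, package in profile_imports("json", top=5):
    print(f"{cumulative_us / 1e6:8.3f}s  {package}")
```

With hezar installed, `profile_imports("hezar")` would list the heaviest dependencies pulled in by `import hezar`.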

The only solution I have in mind right now is to remove all imports from hezar's root file (`__init__.py`) so that users have to import from submodules instead:

```python
# Old
from hezar import Model  # time: 1.37s

# New
from hezar.models import Model  # time: 0.77s
```
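Timings like these are only comparable when each statement runs in a fresh interpreter, since a module that is already imported is not executed again. A minimal sketch of such a measurement follows; `time_import_statement` is a hypothetical helper that takes the best of several runs and subtracts the start-up time of an empty interpreter as a baseline, so the result is approximate.

```python
import subprocess
import sys
import time


def time_import_statement(statement: str, repeats: int = 5) -> float:
    """Estimate the seconds `statement` takes in a fresh interpreter."""

    def best_of(code: str) -> float:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # A new process per run, so nothing is already in sys.modules.
            subprocess.run([sys.executable, "-c", code], check=True)
            best = min(best, time.perf_counter() - start)
        return best

    # Remove interpreter start-up cost measured with an empty program.
    baseline = best_of("pass")
    return max(best_of(statement) - baseline, 0.0)


print(f"{time_import_statement('import json'):.4f}s")
```

Assuming hezar is installed, comparing `time_import_statement("from hezar import Model")` with `time_import_statement("from hezar.models import Model")` gives the same kind of comparison as the numbers above.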

Such a change would be a major one and would break backward compatibility with existing code. We would also have to change all root-level imports in our examples, snippets, notebooks, etc.

@pooya-mohammadi What do you think? Note that 1.3 seconds on my high-performance machine might mean over 2 seconds on most machines. If this is a real issue, we have to take action now rather than later.

arxyzan commented 11 months ago

One more thing to note: heavy imports are usually not considered bottlenecks, since all imports run before the app's main operations. A single `from hezar import ...` also makes all subsequent hezar imports super fast, because Python caches imported modules in `sys.modules`.
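The caching effect is easy to observe. In this sketch the standard-library `decimal` module stands in for hezar: in a fresh interpreter, the first `importlib.import_module` call executes the module's code, while the second only looks it up in `sys.modules` and returns the same object. `timed_import` is a hypothetical helper for illustration.

```python
import importlib
import sys
import time


def timed_import(name: str):
    """Import `name` and return (module, elapsed_seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


# In a fresh interpreter this runs decimal's module code and caches it.
first, t_first = timed_import("decimal")
# This one is just a lookup in sys.modules.
second, t_second = timed_import("decimal")

print(first is second)  # True
print(f"first: {t_first * 1e3:.3f} ms, cached: {t_second * 1e3:.3f} ms")
```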

arxyzan commented 11 months ago

A workaround inspired by LangChain looks like this: https://github.com/hezarai/hezar/blob/optimize_imports/hezar/init.py

Importing the main classes such as `Model`, `Trainer`, `Dataset`, etc. from the root is still possible, but doing so raises a warning saying that it is deprecated and will be removed soon:

```
UserWarning: Importing Model from hezar root is deprecated and will be removed soon. Please use `from hezar.models import Model`
```

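The general mechanism behind this kind of workaround is a module-level `__getattr__` (PEP 562), which Python calls only when a name is not found in the module, so the submodule is imported lazily on first access. The following is a minimal sketch of that technique, not the code on the `optimize_imports` branch: `_make_lazy_module`, `mypkg`, and the mapping to standard-library classes are hypothetical, chosen so the example runs without hezar installed. In a real `__init__.py`, one would define `def __getattr__(name)` directly at module level instead of building a module object.

```python
import importlib
import types
import warnings

# Hypothetical public name -> defining submodule. A real package would map
# e.g. "Model" to "hezar.models"; stdlib modules keep the sketch runnable.
_LAZY_IMPORTS = {
    "Decimal": "decimal",
    "Fraction": "fractions",
}


def _make_lazy_module(name: str, lazy_imports: dict[str, str]) -> types.ModuleType:
    """Create a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # PEP 562: only called when normal attribute lookup on the module fails.
        if attr not in lazy_imports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        submodule = lazy_imports[attr]
        warnings.warn(
            f"Importing {attr} from {name} root is deprecated and will be removed soon. "
            f"Please use `from {submodule} import {attr}`",
            UserWarning,
            stacklevel=2,
        )
        value = getattr(importlib.import_module(submodule), attr)
        # Cache on the module so later lookups skip __getattr__ and the warning.
        setattr(module, attr, value)
        return value

    module.__getattr__ = __getattr__
    return module


pkg = _make_lazy_module("mypkg", _LAZY_IMPORTS)
```

Because the resolved object is cached with `setattr`, the warning and the submodule import happen at most once per name, and nothing is imported for names the user never touches.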
arxyzan commented 11 months ago

I tried testing our examples with the new import design and, honestly, I don't think it's worth it. Previously you could access everything with `from hezar import ...`, but now you have to import everything from its own submodule, which is a pain:

```python
# Old
from hezar import (
    CRNNImage2TextConfig,
    CRNNImage2Text,
    TrainerConfig,
    Trainer,
    Dataset,
    Preprocessor,
)

# New
from hezar.models import CRNNImage2TextConfig, CRNNImage2Text
from hezar.preprocessors import Preprocessor
from hezar.data import Dataset
from hezar.trainer import Trainer, TrainerConfig
```