Closed salujajustin closed 3 months ago
Hello @salujajustin, thank you for your interest in our work!
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue
Hey, @salujajustin! I'm here to help you with any bugs, questions, or contributions you might have. Let's squash those bugs together!
To resolve the errors encountered while running the Quickstart example for the project-lighter package, ensure that your configuration file is correctly set up and that you are using the correct command to run the experiment. Here are the steps to follow:
Ensure the Configuration File is Correct:
Make sure your cifar10.yaml file includes both the trainer and system sections as shown below:
trainer:
  _target_: pytorch_lightning.Trainer
  max_epochs: 100

system:
  _target_: lighter.LighterSystem
  batch_size: 512

  model:
    _target_: torchvision.models.resnet18
    num_classes: 10

  criterion:
    _target_: torch.nn.CrossEntropyLoss

  optimizer:
    _target_: torch.optim.Adam
    params: "$@system#model.parameters()"
    lr: 0.001

  datasets:
    train:
      _target_: torchvision.datasets.CIFAR10
      download: True
      root: .datasets
      train: True
      transform:
        _target_: torchvision.transforms.Compose
        transforms:
          - _target_: torchvision.transforms.ToTensor
          - _target_: torchvision.transforms.Normalize
            mean: [0.5, 0.5, 0.5]
            std: [0.5, 0.5, 0.5]
Run the Command Correctly: Use the following command to run the experiment:
lighter fit --config_file cifar10.yaml
Ensure Proper Configuration Parsing:
The parse_config function in lighter/utils/runner.py ensures that the configuration file is correctly parsed and merged. Make sure that your configuration file is specified correctly and that any additional arguments are provided as needed. The function checks for the presence of the config_file key and raises a ValueError if it is not specified.
def parse_config(**kwargs) -> ConfigParser:
    if "config_file" not in kwargs:
        raise ValueError("--config_file not specified. Exiting.")
    parser = ConfigParser()
    parser.read_config(kwargs.pop("config_file"))
    parser.update(pairs=kwargs)
    project = parser.get("project", None)
    if project is not None:
        import_module_from_path("project", project)
    return parser
Run the Trainer Method:
The run_trainer_method function ensures that the trainer and system sections are correctly parsed and used. It sets the random seed, parses the configuration, and runs the specified Trainer method.
def run_trainer_method(method: Dict, **kwargs: Any):
    seed_everything()
    parser = parse_config(**kwargs)
    trainer = parser.get_parsed_content("trainer")
    system = parser.get_parsed_content("system")
    config = parser.get()
    config.pop("_meta_")
    system.save_hyperparameters(config)
    if trainer.logger is not None:
        trainer.logger.log_hyperparams(config)
    if not hasattr(trainer, method):
        raise ValueError(f"Trainer has no method named {method}.")
    getattr(trainer, method)(system)
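The last two steps of the function quoted above (check that the method exists, then call it by name) are plain Python dynamic dispatch. A minimal, self-contained sketch of that pattern follows. DummyTrainer and run_method are hypothetical stand-ins made up for illustration; they are not part of lighter or PyTorch Lightning.

```python
class DummyTrainer:
    """Hypothetical stand-in for a Trainer object (illustration only)."""

    def fit(self, system):
        return f"fit called with {system}"


def run_method(trainer, method: str, system):
    # Fail early with a readable message if the method name is unknown.
    if not hasattr(trainer, method):
        raise ValueError(f"Trainer has no method named {method}.")
    # Look the method up by its name and call it with the system.
    return getattr(trainer, method)(system)


print(run_method(DummyTrainer(), "fit", "my_system"))
```

With this pattern, a command such as lighter fit only needs to pass the string "fit" through; any method the trainer object exposes can be selected the same way.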
By ensuring that both the trainer and system sections are present in your YAML configuration file, using the correct command, and ensuring proper configuration parsing, you should be able to resolve the ValueError: run ID 'run' doesn't exist in the config file and KeyError: "id='trainer' is not found in the config resolver" errors [1][2][3].
Did you define a trainer in the config in addition to system? If that's the issue, I'll add better error handling to make it more obvious.
I see the issue here. The docs have a missing trainer in the cifar10 example. I'll fix it over the weekend. @salujajustin I think you should follow the dosubot instructions 1 and 2 and hopefully this works.
@ibro45 We should use this as a chance to make better errors for our "reserved" keys.
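For illustration, an early check along these lines could look like the sketch below. It is not lighter's actual code: the function name check_reserved_keys and the RESERVED_KEYS tuple are hypothetical, and the set of required keys (trainer and system) is assumed from this thread.

```python
# Assumed from this thread: both sections must exist at the top level.
RESERVED_KEYS = ("trainer", "system")


def check_reserved_keys(config: dict, required=RESERVED_KEYS) -> None:
    """Raise a descriptive error if a required top-level section is absent."""
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError(
            "Config is missing required top-level section(s): "
            + ", ".join(missing)
        )


# A config with only `system` (like the old docs example) fails with a clear message.
try:
    check_reserved_keys({"system": {"batch_size": 512}})
except ValueError as err:
    print(err)
```

Running such a check right after the config is read would surface the missing trainer section directly, instead of a KeyError from the config resolver.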
Yes, I was confused by the final YAML under "Running this experiment with Lighter": it said "We just combine the Trainer and LighterSystem into a single YAML and run the command in the terminal as shown", but it was the same as the LighterSystem-only YAML config. Regardless, before posting, I did try to combine them (by defining a trainer in addition to the system) in the same way dosubot has suggested. However, I received the same ValueError as in the screenshot above, so I thought this was more of a systemic issue. (Going to use just Python 3.10 moving forward here, if that's ok.)
Although this time, when running it with the --pre version, I got a new error:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0rc2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "/home/justin/micromamba/envs/lighter-pre-p10/bin/lighter", line 5, in <module>
from lighter.utils.cli import interface
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/lighter/__init__.py", line 5, in <module>
_setup_logging()
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/lighter/utils/logging.py", line 72, in _setup_logging
rich.traceback.install(show_locals=False, width=120, suppress=[__import__(name) for name in SUPPRESSED_MODULES])
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/lighter/utils/logging.py", line 72, in <listcomp>
rich.traceback.install(show_locals=False, width=120, suppress=[__import__(name) for name in SUPPRESSED_MODULES])
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/monai/__init__.py", line 40, in <module>
from .utils.module import load_submodules # noqa: E402
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/monai/utils/__init__.py", line 18, in <module>
from .deprecate_utils import DeprecatedError, deprecated, deprecated_arg, deprecated_arg_default
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/monai/utils/deprecate_utils.py", line 22, in <module>
from monai.utils.module import version_leq
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/monai/utils/module.py", line 30, in <module>
import torch
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/torch/__init__.py", line 1477, in <module>
from .functional import * # noqa: F403
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/torch/functional.py", line 9, in <module>
import torch.nn.functional as F
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
from .modules import * # noqa: F403
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
from .transformer import TransformerEncoder, TransformerDecoder, \
File "/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
/home/justin/micromamba/envs/lighter-pre-p10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
AttributeError: `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.
I'm guessing I shouldn't need to touch the code mentioned in dosubot instructions 3 & 4 right?
I don't think this is a lighter issue; I think this error has to do with the environment (NumPy 2.0 seems to be a pre-release, so downgrade it to numpy<2).
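To confirm which NumPy ended up in the environment without importing it (the import is what triggers the failure above), a small check like the following can help. This is only a sketch: major_version and installed_numpy_needs_downgrade are hypothetical helper names, and it relies solely on the standard-library importlib.metadata.

```python
import re
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.0rc2'."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1))


def installed_numpy_needs_downgrade() -> bool:
    """True if the installed NumPy is 2.x or newer; False if older or absent."""
    try:
        version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return False
    return major_version(version) >= 2


if installed_numpy_needs_downgrade():
    print('NumPy >= 2 detected; consider: pip install "numpy<2"')
```

The suggested pip command matches the advice in the NumPy error message itself (downgrade to 'numpy<2').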
@ibro45 While this isn't a lighter issue, I think if this is the first experience of people installing it with Python 3.10, that's not good.
I will take a look into this more @salujajustin and see where the issue lies. Our CI should be adapted to keep these checks in place always.
Ah yes, I forgot that all dependencies weren't installed separately but with lighter, sorry
@salujajustin The latest version, pip install project-lighter --pre, where the version is v0.0.1a29, should run the quickstart example now. I've also updated the example on the lighter docs. Keep us updated on how it looks.
Looks like that worked! Thanks again @surajpaib @ibro45 for your help.
Side note: when installing with pip install project-lighter --pre with Python 3.8, I got the version 0.0.2a27, not 0.0.1a29.
@salujajustin Thanks for checking. We've deprecated support for 3.8 now, as it's EOL very soon.
Bug Report
When trying to reproduce the Quickstart example using pip install project-lighter and pip install project-lighter --pre, I encountered two errors. Both Python 3.8 and 3.10 environments return identical errors. Without the --pre option specified, there is a ValueError: run ID 'run' doesn't exist in the config file. With the --pre option, there is a KeyError: "id='trainer' is not found in the config resolver." See the attached screenshots for more details.
How To Reproduce
Steps to reproduce the behavior:
1. Install with pip install project-lighter or pip install project-lighter --pre.
2. Create a cifar10.yaml config with the provided content.
3. Run lighter fit --config_file cifar10.yaml.
Code sample
Step 1:
Step 2:
or:
Step 3:
Step 4:
Environment
Screenshots
The following is the output for the lighter package from pip install project-lighter:
The following is the output for the lighter package from pip install project-lighter --pre:
Expected behavior
The training should start without any errors when running lighter fit --config_file cifar10.yaml
Additional context
None.