Closed PicoCentauri closed 3 weeks ago
I don't think this is what we want for #194. I constantly see very bad error messages when not using --debug
. It's not just the ASE reader. The issue is that the error wrapper that we have cuts off all the stack trace and shows only the last line (or few lines). These changes are good, but they can't fix the underlying issue. I can show more examples of terrible errors if needed
Can you give another example?
This PR is improving the error message you saw. What would you like to have for example for this error?
Sure:
$ mtt train options.yaml
[2024-06-10 20:48:47][INFO] - This log is also available in 'outputs/2024-06-10/20-48-47/train.log'.
[2024-06-10 20:48:48][INFO] - Random seed of this run is 2490997340
[2024-06-10 20:48:48][INFO] - Setting up training set
[2024-06-10 20:48:48][WARNING] - No Forces found in section 'energy'. Continue without forces!
[2024-06-10 20:48:48][WARNING] - No Stress found in section 'energy'. Continue without stress!
[2024-06-10 20:48:48][INFO] - Setting up test set
[2024-06-10 20:48:48][INFO] - Setting up validation set
[2024-06-10 20:48:48][INFO] - Setting up model
The error below most likely originates from an architecture. If you think this is a bug, please contact its maintainer (see the architecture's documentation).
'NoneType' object does not support item assignment
mtt train options.yaml
[2024-06-10 20:52:07][INFO] - This log is also available in 'outputs/2024-06-10/20-52-07/train.log'.
[2024-06-10 20:52:08][INFO] - Random seed of this run is 2960232964
[2024-06-10 20:52:08][INFO] - Setting up training set
Interpolation key 'systems.read_from' not found
full_key: [0].targets.energy.read_from
object_type=dict
The point is that IMO cutting out the stack trace is a bad choice, at least at this point in the development of metatrain. Sure, if we become famous with thousands of users and we have 20 developers, then we can have a custom 1-line error for everything. But for now, I would keep the stack trace
I mean this looks like some issues inside the bpnn.
I disagree. If you are not a developer giving a traceback doesn't really help. I mean how often do you read the traceback of a compiler error and you have no idea what is going on.
But, I am open to discuss.
$ mtt train options.yaml
[2024-06-11 07:37:51][INFO] - This log is also available in 'outputs/2024-06-11/07-37-51/train.log'.
'<' not supported between instances of 'str' and 'int'
Why don't we just remove the custom handling of errors for now? These errors are crap
We could take inspiration on sphinx, and save the whole tracback to a file, and show the path to the file in the user-facing error message. Something like
Unexpected error occurred. This most likely originates from the 'XXX' architecture.
If you think this is a bug, please contact its maintainer (see the architecture's documentation),
and include the full log at /tmp/nqfiouabspvhqbfod/error.log
We could also try to include the last line of the error, although this won't be helpful most of the time as we see
Also a good idea. However, we already have log file which contains the same as the stdout/stderr and opening another might be too much? We can however to an output log and an error log, where the latter contains the error including the traceback...
After talking to @frostedoyster, I suggest the following
logging.error(
f"{traceback.format_exc()}\n"
"If the error message below is unclear, please help us improve it by "
"opening an issue at https://github.com/lab-cosmo/metatrain/issues. "
"Thank you!\n\n"
f"{err}")
This will log the error message to stdout and file on the bottom which is directly visible to the user; Above an info helping us improving the message and above the traceback. This would look like for the current issue.
[2024-06-11 13:46:04][INFO] - This log is also available in 'outputs/2024-06-11/13-46-04/train.log'.
[2024-06-11 13:46:04][INFO] - Random seed of this run is 951776256
[2024-06-11 13:46:04][INFO] - Setting up training set
[2024-06-11 13:46:05][ERROR] - Traceback (most recent call last):
File "/Users/philiploche/repos/lab-cosmo/metatrain/.tox/tests/lib/python3.12/site-packages/metatrain/__main__.py", line 119, in main
train_model(**args.__dict__)
File "/Users/philiploche/repos/lab-cosmo/metatrain/.tox/tests/lib/python3.12/site-packages/metatrain/cli/train.py", line 189, in train_model
train_targets, target_info_dictionary = read_targets(
^^^^^^^^^^^^^
File "/Users/philiploche/repos/lab-cosmo/metatrain/.tox/tests/lib/python3.12/site-packages/metatrain/utils/data/readers/readers.py", line 192, in read_targets
blocks = read_energy(
^^^^^^^^^^^^
File "/Users/philiploche/repos/lab-cosmo/metatrain/.tox/tests/lib/python3.12/site-packages/metatrain/utils/data/readers/readers.py", line 51, in read_energy
return _base_reader(
^^^^^^^^^^^^^
File "/Users/philiploche/repos/lab-cosmo/metatrain/.tox/tests/lib/python3.12/site-packages/metatrain/utils/data/readers/readers.py", line 33, in _base_reader
return reader(filename, dtype=dtype, **reader_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/philiploche/repos/lab-cosmo/metatrain/.tox/tests/lib/python3.12/site-packages/metatrain/utils/data/readers/targets/ase.py", line 30, in read_energy_ase
raise KeyError(
KeyError: "energy key 'hello' was not found in system 'qm9_reduced_100.xyz' at index 0"
If the error message below is unclear, please help us improve it by opening an issue at https://github.com/lab-cosmo/metatrain/issues.Thank you!
"energy key 'hello' was not found in system 'qm9_reduced_100.xyz' at index 0"
What do you think?
Ok, I would save the stack trace to a file (very good idea) and remove it from the logs, where we can have the last line + the suggestion of "if this is unclear, please report it, help improve it and include the full error at
Thanks for your feedback. The current output is
$ mtt train foo
(tests) (base) philiploche@tsf-444-wpa-0-067:~/repos/lab-cosmo/metatrain$ mtt train foo
[2024-06-12 16:08:01][INFO] - This log is also available at '/Users/philiploche/repos/lab-cosmo/metatrain/outputs/2024-06-12/16-08-01/train.log'.
[2024-06-12 16:08:01][ERROR] - If the error message below is unclear, please help us improve it by opening an issue at https://github.com/lab-cosmo/metatrain/issues.
When opening the issue, please include the full traceback log from '/Users/philiploche/repos/lab-cosmo/metatrain/outputs/2024-06-12/16-08-01/error.log'. Thank you!
[Errno 2] No such file or directory: '/Users/philiploche/repos/lab-cosmo/metatrain/foo'
the error.log
file contains the full traceback.
@Luthaf had a good idea adding the architecture_name
to the architecture error. I will open an issue and do this in another PR.
Fixes #194
The default ase error message of unknown keys is very unhelpful for us. I improved the error message for ase readers. With these options
The error is
Contributor (creator of pull-request) checklist
📚 Documentation preview 📚: https://metatrain--242.org.readthedocs.build/en/242/