This repository contains the quantum chemical properties, including geometries, natural atomic charges, and Wiberg bond orders, of the 108k transition metal complexes (TMCs) in the latest version of the tmQM dataset. This collection includes all 30 transition metals across the 3d, 4d, and 5d series, combined with more than 30k different ligands.
The tmQM dataset contains organometallic, bioinorganic, and Werner complexes. Structures were extracted from the 2024 release of the Cambridge Structural Database (CSD) with a series of filters, yielding mononuclear TMCs with charges of -1, 0, or +1. Electronic structure properties, including the energy, dipole moment, polarizability, and HOMO–LUMO gap, were all computed for the closed-shell singlet state. Two levels of theory were used: GFN2-xTB (geometries) and DFT (single-point properties).
The 2024 version of the tmQM dataset is an extension of the original tmQM dataset reported in this article: The tmQM Dataset - Quantum Geometries and Properties of 86k Transition Metal Complexes. tmQM has also been used to build a 60k graph dataset (tmQMg) and a 30k ligand library (tmQMg-L), both derived from NBO analysis.
The purpose of tmQM is to provide the scientific community with a reliable source for developing and testing machine learning models for the exploration of the TMC chemical space.
The tmQM dataset is also available for download from the UiO Computational Catalysis Group website.
The 2024 release of the CSD contains structural data for over 1.3M chemical compounds, of which nearly 0.5M include transition metals. However, not all 0.5M transition metal-containing structures are suited for tmQM. TMCs were thus selected and curated using these filters:
Chemical composition filter: Mononuclear TMCs including any of the 30 transition metals bonded to any of these elements: B, C, Si, N, P, As, O, S, Se, F, Cl, Br, and I, and including at least one C atom.
Geometry filters (I): Non-polymeric and with 3D coordinates available, excluding disordered structures.
Geometry filters (II): Only the heaviest fragment containing a transition metal was kept, which excludes co-crystallizing molecules (e.g., solvents and counterions).
Electronic structure filters: Neutral and ±1 charged TMCs with an even number of electrons.
Curation filter (I): Only the TMCs for which both the xTB and DFT calculations converged.
Curation filter (II): The 7% of TMCs with the largest deviation of the xTB geometry relative to the CSD structure were excluded (deviations normalized by R-factor and number of atoms).
Curation filter (III): xTB-optimized geometries without any H atom, or with C atoms with missing Hs, were excluded using the filter developed by Ulissi and Blau in this article.
Curation filter (IV): xTB-optimized geometries with missing Hs, as well as dissociated, isolated ligands, were excluded using a C-focused multi-radii geometric filter.
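As an illustration only, the parts of the chemical composition and electronic structure filters that depend solely on the element list and the total charge can be sketched in a few lines of Python. This is not the code used to build tmQM. The function name `passes_basic_filters` and its inputs are hypothetical, La is assumed to be the first 5d metal (the text above does not say whether La or Lu is used), and the metal-ligand bonding criterion is not checked because it requires connectivity information.

```python
# Atomic numbers of the 30 transition metals (3d, 4d, 5d series).
# Assumption: La (57) is counted as the first 5d metal; replace it with
# Lu (71) if the dataset defines the 5d series that way.
TRANSITION_METALS = (
    set(range(21, 31)) | set(range(39, 49)) | {57} | set(range(72, 81))
)
CARBON = 6
ALLOWED_CHARGES = {-1, 0, 1}


def passes_basic_filters(atomic_numbers, charge):
    """Check mononuclearity, presence of C, charge, and electron parity.

    atomic_numbers: list of atomic numbers, one per atom in the complex.
    charge: total charge of the complex (integer).
    """
    # Mononuclear: exactly one transition metal atom.
    n_metals = sum(1 for z in atomic_numbers if z in TRANSITION_METALS)
    if n_metals != 1:
        return False
    # At least one C atom.
    if CARBON not in atomic_numbers:
        return False
    # Neutral or singly charged.
    if charge not in ALLOWED_CHARGES:
        return False
    # Even number of electrons, as needed for a closed-shell singlet.
    n_electrons = sum(atomic_numbers) - charge
    return n_electrons % 2 == 0


# Ferrocene, Fe(C5H5)2: one Fe, ten C, ten H.
ferrocene = [26] + [6] * 10 + [1] * 10
print(passes_basic_filters(ferrocene, 0))  # neutral ferrocene
print(passes_basic_filters(ferrocene, 1))  # ferrocenium cation
```

Neutral ferrocene has 96 electrons and passes these checks, whereas the ferrocenium cation has 95 electrons and is rejected by the parity test.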