uiocompcat / tmQM

tmQM dataset files
MIT License

tmQM: 2024 release

This repository contains the quantum chemical properties, including geometries, natural atomic charges, and Wiberg bond orders, of the 108k transition metal complexes (TMCs) in the latest version of the tmQM dataset. The collection covers all 30 transition metals of the 3d, 4d, and 5d series, combined with more than 30k different ligands.


The tmQM dataset contains organometallic, bioinorganic, and Werner complexes. Structures were extracted from the 2024 release of the Cambridge Structural Database (CSD) with a series of filters, yielding mononuclear TMCs with a total charge of -1, 0, or +1. Electronic structure properties, including the energy, dipole moment, polarizability, and HOMO–LUMO gap, were computed for the closed-shell singlet state. Two levels of theory were used: GFN2-xTB for the geometries and DFT for the single-point properties.

The 2024 version of the tmQM dataset extends the original tmQM dataset reported in this article: The tmQM Dataset - Quantum Geometries and Properties of 86k Transition Metal Complexes. tmQM has also been used to build a 60k graph dataset (tmQMg) and a 30k ligand library (tmQMg-L), both derived from NBO analysis.

The purpose of tmQM is to provide the scientific community with a reliable source for developing and testing machine learning models for the exploration of the TMC chemical space.

The tmQM dataset is also available for download from the UiO Computational Catalysis Group website.

tmQM data files

tmQM/tmQM_X1.xyz.gz, tmQM/tmQM_X2.xyz.gz and tmQM/tmQM_X3.xyz.gz
tmQM/tmQM_y.csv
tmQM/tmQM_X.q
tmQM/tmQM_X1.BO, tmQM/tmQM_X2.BO and tmQM/tmQM_X3.BO
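
The geometry files listed above are gzip-compressed XYZ files. As a minimal sketch of how they might be read, the Python snippet below iterates over the structures in one file. It assumes the conventional multi-structure XYZ layout (an atom-count line, a comment line, then one line per atom, with optional blank lines between structures); this layout is an assumption and should be checked against the actual files. The function names iter_xyz_frames and read_xyz_gz are hypothetical helpers, not part of the repository.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# One atom: element symbol followed by Cartesian x, y, z coordinates.
Atom = Tuple[str, float, float, float]


def iter_xyz_frames(lines: Iterable[str]) -> Iterator[Tuple[str, List[Atom]]]:
    """Yield (comment, atoms) for each structure in a multi-structure XYZ stream.

    Assumes the conventional XYZ layout; blank lines between structures are skipped.
    """
    it = iter(lines)
    for line in it:
        if not line.strip():
            continue  # blank separator between structures
        n_atoms = int(line.split()[0])
        try:
            comment = next(it).strip()
            atoms: List[Atom] = []
            for _ in range(n_atoms):
                fields = next(it).split()
                atoms.append(
                    (fields[0], float(fields[1]), float(fields[2]), float(fields[3]))
                )
        except StopIteration:
            # The stream ended in the middle of a structure.
            raise ValueError("truncated XYZ block") from None
        yield comment, atoms


def read_xyz_gz(path: str) -> Iterator[Tuple[str, List[Atom]]]:
    """Stream structures from a gzip-compressed XYZ file without unpacking it to disk."""
    with gzip.open(path, "rt") as handle:
        yield from iter_xyz_frames(handle)
```

For example, `for comment, atoms in read_xyz_gz("tmQM/tmQM_X1.xyz.gz"): ...` would visit the structures one at a time, which keeps memory use low for files of this size. The comment line is returned unparsed because its exact contents are not described here.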

old_tmQM: 2020 release

old_tmQM/old_tmQM_X1.xyz.gz and old_tmQM/old_tmQM_X2.xyz.gz
old_tmQM/old_tmQM_y.csv
old_tmQM/Benchmark2_TPSSh_Opt.xyz

Further technical details

The 2024 release of the CSD contains structural data for over 1.3M chemical compounds, of which nearly 0.5M include transition metals. However, not all of these 0.5M transition metal-containing structures are suitable for tmQM. TMCs were therefore selected and curated using the following filters: