xiaoruiDong / RDMC

Reaction Data and Molecular Conformers (RDMC) is a package for handling reactions, molecules, and conformers, mainly in 3D.
https://xiaoruidong.github.io/RDMC/
MIT License

Interpret multiplicity from reaction SMILES #26

Closed: hwpang closed this 2 years ago

hwpang commented 2 years ago

Interpret the multiplicity from the reaction SMILES for consistency. Open for discussion: how should we handle cases where the reactant multiplicity != the product multiplicity (this occurs for carbenes/nitrenes)?

xiaoruiDong commented 2 years ago

@hwpang, @kspieks Thanks for bringing this up. In my opinion, there needs to be

hwpang commented 2 years ago

@xiaoruiDong Thanks for the comments! I will work on this soon!

alongd commented 2 years ago

Hi, if I may comment: I think the multiplicity is generally conserved across a reaction (such a reaction is called "adiabatic"), except in special cases involving inter-surface crossing.

For example, the multiplicity of C=C + C=C <=> C[CH2] + [CH]=C is 1 for the reactants, and 1 for the products as well: the two products have opposite radical spins (they are entangled), so overall it is 1.

For C1CC1 = [CH2] + C=C, I think the [CH2] species is a singlet carbene, so its multiplicity is also 1.
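
The spin-coupling argument in the two examples above can be made concrete by counting unpaired electrons. The following is a minimal sketch, assuming every unpaired electron in the system (whether it sits on one species or is spread over several) is free to pair with any other: n unpaired electrons then allow a total spin S from n/2 down to 0 (n even) or 1/2 (n odd), i.e. multiplicities 2S + 1. The function `possible_multiplicities` is hypothetical and not part of RDMC, and the electron counts are entered by hand.

```python
from __future__ import annotations


def possible_multiplicities(n_unpaired: int) -> list[int]:
    """Return every spin multiplicity (2S + 1) reachable by n unpaired electrons.

    Assumes all unpaired electrons can pair freely with one another, so the
    total spin S runs from n/2 down to 0 (n even) or 1/2 (n odd) in steps of 1.
    """
    if n_unpaired < 0:
        raise ValueError("number of unpaired electrons must be non-negative")
    lowest = 1 if n_unpaired % 2 == 0 else 2  # singlet or doublet
    highest = n_unpaired + 1                  # all spins parallel (high spin)
    return list(range(lowest, highest + 1, 2))


# Hand-entered unpaired-electron counts for the examples in the comment
examples = {
    "C=C.C=C": 0,        # closed shell
    "C[CH2].[CH]=C": 2,  # one unpaired electron on each radical
    "[CH2]": 2,          # two nonbonding electrons on the carbene carbon
}
for smiles, n in examples.items():
    print(smiles, possible_multiplicities(n))
```

For both C[CH2].[CH]=C and [CH2] this yields [1, 3]: opposite spins give the singlet discussed above, parallel spins give the triplet.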

There are definitely issues with determining the correct multiplicity of a reaction from SMILES, but I don't think the "reactant multiplicity != product multiplicity" case matters for "conventional" (adiabatic) reactions; it is only relevant for inter-surface crossing reactions.

xiaoruiDong commented 2 years ago

@alongd Thanks for the comment. I should have been clearer. The reactions we are considering are multiplicity-conserving reactions. The main concern is what to do when the user doesn't supply the multiplicity and only provides SMILES. Hao-Wei and Kevin suggest parsing it from the SMILES, so the potential issues are:

  1. The molecule RDKit parses from a SMILES may have an undesired multiplicity (at least in some cases). E.g., for [CH2], RDKit always creates a molecule with a multiplicity of 3. More generally, RDKit never tries to pair up unpaired electrons: parsing the SMILES complex C[CH2].[CH]=C gives a complex with a multiplicity of 3.
  2. Because of 1, the generated reactant complex and product complex can easily end up with different multiplicities. E.g., for C=C + C=C <=> C[CH2] + [CH]=C, the reactant C=C.C=C gives a complex with a multiplicity of 1, while the product complex generated by RDKit has a multiplicity of 3. Since they differ, it is ambiguous which multiplicity to use when launching TS jobs. The previous proposal was to use the reactant multiplicity; however, the user may not know which multiplicity is actually being used, and it may not be the one they want. E.g., for [O][O] + C[CH2] = [O]O + C=C, using only the reactant multiplicity gives 4 (from RDKit), but the user may actually want 2.
  3. My proposal: it is always best if the user knows what multiplicity to use. If they don't, we can provide an option that decides whether the larger or the smaller value is used whenever the two sides are inconsistent. There are also cases like [CH3] + [O] = [CH2] + [OH], where both sides are parsed as complexes with a multiplicity of 4 but the multiplicity the user actually wants is 2; we need to decide whether to handle those. So, in my opinion, these potential issues need to be worked out in detail.
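
The high-spin default described in points 1 to 3 amounts to "multiplicity = total radical electrons + 1" for each side of the reaction. The sketch below reproduces the comparisons from the comment. It assumes the per-species radical-electron counts RDKit would assign (with RDKit installed they can be obtained from `rdkit.Chem.Descriptors.NumRadicalElectrons`); here they are hard-coded so the example runs without RDKit. The helper names are hypothetical, and the reaction strings are assumed to be plain `reactants>>products` with no agents.

```python
from __future__ import annotations

# Radical-electron counts RDKit assigns to each SMILES
# (entered by hand so this sketch runs without RDKit installed).
RADICAL_ELECTRONS = {
    "C=C": 0, "C[CH2]": 1, "[CH]=C": 1,
    "[O][O]": 2, "[O]O": 1,
    "[CH3]": 1, "[O]": 2, "[CH2]": 2, "[OH]": 1,
}


def high_spin_multiplicity(complex_smiles: str) -> int:
    """Multiplicity with all unpaired electrons parallel: 2S + 1 = n + 1."""
    n = sum(RADICAL_ELECTRONS[s] for s in complex_smiles.split("."))
    return n + 1


def compare_sides(reaction_smiles: str) -> tuple[int, int]:
    """Return (reactant, product) multiplicities for 'reactants>>products'."""
    reactants, products = reaction_smiles.split(">>")
    return high_spin_multiplicity(reactants), high_spin_multiplicity(products)


for rxn in ["C=C.C=C>>C[CH2].[CH]=C",
            "[O][O].C[CH2]>>[O]O.C=C",
            "[CH3].[O]>>[CH2].[OH]"]:
    r, p = compare_sides(rxn)
    status = "consistent" if r == p else "INCONSISTENT"
    print(f"{rxn}: reactants {r}, products {p} ({status})")
```

The first two reactions come out inconsistent (1 vs 3 and 4 vs 2), while the third is consistent at 4 even though the multiplicity the user wants is 2, which is why agreement between the two sides alone does not guarantee the intended value.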

alongd commented 2 years ago

Hi, thanks for the detailed explanation, and sorry for barging in... When users define species, can they specify the multiplicity, or can they only give SMILES/InChI? You are right that cases like (3) are problematic. For the [CH2] + [OH] = [CH3] + [O] reaction we get different products for the different multiplicities (singlet or triplet [O]). Since both reactions are feasible, there's no real way to know what was originally meant from just SMILES. So perhaps there should be a way to specify in advance the multiplicity of species, especially ones like [O] or [CH2]?
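
Specifying species multiplicities in advance could look like a table of per-species overrides that take precedence over the RDKit-style default. This is a minimal sketch, not RDMC's interface: the names `DEFAULT_MULTIPLICITY` and `complex_multiplicity` are hypothetical, the defaults are hand-entered as radical electrons + 1, and the fragments are assumed to couple high-spin (the unpaired electrons, m − 1 per species, are summed).

```python
from __future__ import annotations

# RDKit-style defaults (radical electrons + 1), entered by hand.
DEFAULT_MULTIPLICITY = {"[CH2]": 3, "[OH]": 2, "[CH3]": 2, "[O]": 3}


def complex_multiplicity(species: list[str],
                         overrides: dict[str, int] | None = None) -> int:
    """Combine species multiplicities, preferring user-specified values.

    Assumes high-spin coupling between fragments: the unpaired electrons
    (m - 1 per species) are summed and the complex multiplicity is sum + 1.
    """
    overrides = overrides or {}
    unpaired = 0
    for s in species:
        m = overrides[s] if s in overrides else DEFAULT_MULTIPLICITY[s]
        unpaired += m - 1
    return unpaired + 1


singlet = {"[CH2]": 1, "[O]": 1}  # user declares singlet carbene / singlet O
print(complex_multiplicity(["[CH2]", "[OH]"], singlet),
      complex_multiplicity(["[CH3]", "[O]"], singlet))
print(complex_multiplicity(["[CH2]", "[OH]"]),
      complex_multiplicity(["[CH3]", "[O]"]))
```

With the singlet overrides both sides of [CH2] + [OH] = [CH3] + [O] come out as 2; with the defaults (triplet [CH2] and [O]) both come out as 4, matching the two distinct channels described in the comment.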

hwpang commented 2 years ago

@xiaoruiDong @alongd Thanks for the valuable discussion! We have changed the code as Xiaorui suggested: the user can specify the multiplicity of the reaction. If it is not specified, the multiplicity is interpreted from the reaction SMILES. If the interpreted multiplicities of the reactants and the products are inconsistent, the option use_smaller_multiplicity decides whether the smaller or the larger one is used.
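
The selection order described in this comment can be sketched as a small function. Only the option name use_smaller_multiplicity comes from the thread; the function name `resolve_multiplicity`, its signature, and the default `use_smaller_multiplicity=False` are assumptions for illustration, and RDMC's actual code may differ.

```python
from __future__ import annotations


def resolve_multiplicity(reactant_mult: int, product_mult: int,
                         multiplicity: int | None = None,
                         use_smaller_multiplicity: bool = False) -> int:
    """Pick the multiplicity used for TS jobs (hypothetical helper).

    1. A user-specified multiplicity always wins.
    2. Otherwise, if the values interpreted from the reactant and product
       SMILES agree, that value is used.
    3. If they disagree, the smaller or larger one is used depending on
       use_smaller_multiplicity (default False here is an assumption).
    """
    if multiplicity is not None:
        return multiplicity
    if reactant_mult == product_mult:
        return reactant_mult
    if use_smaller_multiplicity:
        return min(reactant_mult, product_mult)
    return max(reactant_mult, product_mult)


# [O][O] + C[CH2] = [O]O + C=C parses as 4 (reactants) vs 2 (products)
print(resolve_multiplicity(4, 2, use_smaller_multiplicity=True))
print(resolve_multiplicity(4, 2))
# [CH3] + [O] = [CH2] + [OH] parses as 4 on both sides; only an explicit
# value can select the doublet surface
print(resolve_multiplicity(4, 4, multiplicity=2))
```

Under this sketch the option only matters when the two sides disagree; a consistent but unintended value, as in the [CH3] + [O] case, can only be corrected by passing the multiplicity explicitly.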