Bug fixes and improvements

Main changes:

Rely on rxn-chemutils, which makes a few of the points below possible.
Support for actual fragment bonds. Before this PR, ~ was accepted only when the resulting compound could be parsed by RDKit; inputs such as [Na+]~[H-] raised an Invalid Valence error. This PR fixes that.
Improved support for the extended SMILES notation. There was some code to handle reaction SMILES with |f:0.1,2.3|, but such examples usually failed. This now works, and the mapped reaction SMILES will use the same format whenever the input did.
Specifying canonicalize_rxns=False will now make as few changes to the input SMILES as possible. In particular, it will keep the original ordering of the atoms more consistently (but not in 100% of cases).
Specifying canonicalize_rxns=False will no longer fail for SMILES with invalid valence, such as CFC.
Additional tests.
Other minor fixes.
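
The two notations for multi-fragment compounds mentioned in the list above can be related with a few lines of plain Python. This is only a sketch, not the rxn-chemutils implementation: the function name `tilde_to_extended` is hypothetical, and it assumes that `~` occurs only as a fragment bond, that the input carries no `|f:...|` suffix yet, and that the indices in `|f:...|` count the dot-separated components from left to right across the whole reaction, starting at 0.

```python
def tilde_to_extended(rxn_smiles: str) -> str:
    """Rewrite a tilde-notation reaction SMILES in the |f:...| notation.

    Hypothetical helper for illustration only (not part of rxnmapper or
    rxn-chemutils). Assumes '~' is used exclusively as a fragment bond.
    """
    groups = []     # one "i.j.k" entry per compound made of several fragments
    index = 0       # running index over all dot-separated components
    new_sides = []

    # A reaction SMILES has the form reactants>agents>products.
    for side in rxn_smiles.split(">"):
        new_compounds = []
        for compound in (side.split(".") if side else []):
            fragments = compound.split("~")
            indices = range(index, index + len(fragments))
            index += len(fragments)
            if len(fragments) > 1:
                groups.append(".".join(str(i) for i in indices))
            new_compounds.append(".".join(fragments))
        new_sides.append(".".join(new_compounds))

    plain = ">".join(new_sides)
    if not groups:
        return plain
    return f"{plain} |f:{','.join(groups)}|"


print(tilde_to_extended("CS(C)=O.[K+]~[OH-]>>CO"))
# CS(C)=O.[K+].[OH-]>>CO |f:1.2|
```

The sketch is purely textual and never parses the molecules, so it does not depend on whether RDKit accepts the compound.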

In general, the ordering of the compounds in the mapped reaction SMILES may change compared to before this PR (with identical confidence). Sometimes the order of the fragments within a compound also changes (see the example below); in that case, the input to the actual transformer model is different, and hence the output and confidences may differ as well.

Differences on a subset of 10 reactions from USPTO

As an example, taking the first 10 reactions from the test set at https://github.com/rxn4chemistry/OpenNMT-py/tree/carbohydrate_transformer/data/uspto_dataset:

rxn
CCCI.CN(C)C=O.O=C([O-])[O-]~[K+]~[K+].c1ccc2[nH]cnc2c1>>CCCn1cnc2ccccc21
CC(C)(C)P(Cl)C(C)(C)C.ClCCl.O>>CC(C)(C)[PH](=O)C(C)(C)C
C1CN2CCN1CC2.CC(C)=O.ClCBr>>ClC[N+]12CCN(CC1)CC2
CC(C)(C)[O-]~[Na+].CCn1c(Br)nc2ccccc21.CNc1ccccn1.Cc1ccccc1>>CCn1c(N(C)c2ccccn2)nc2ccccc21
CS(C)=O.Cc1cccc(CBr)n1.O.[K+]~[OH-].c1ccc(Nc2ccccn2)nc1>>Cc1cccc(CN(c2ccccn2)c2ccccn2)n1
Brc1ccccn1.Cc1cccc(Nc2cccc3ccc(C)nc23)n1.O.O=C([O-])[O-]~[Na+]~[Na+].[Br-]~[K+].[Cu]>>Cc1cccc(N(c2ccccn2)c2cccc3ccc(C)nc23)n1
CC(=O)Oc1ccc(-c2cnc(N(S(=O)(=O)c3ccc([N+](=O)[O-])cc3)S(=O)(=O)c3ccc([N+](=O)[O-])cc3)c(Cc3ccccc3)n2)cc1.CO.[Na+]~[OH-]>>O=[N+]([O-])c1ccc(S(=O)(=O)Nc2ncc(-c3ccc(O)cc3)nc2Cc2ccccc2)cc1
CCOC(=O)CCCCCBr.CN1CCc2c([nH]c3ccccc23)C1.[H-]~[Na+]>>CCOC(=O)CCCCCn1c2c(c3ccccc31)CCN(C)C2
CCOC(C)=O.CN(C)C=O.CO.COC(=O)CCn1c2ccccc2c2ccccc21.C[O-]~[Na+].Cl~NO.O.O=C([O-])O~[Na+]>>O=C(CCn1c2ccccc2c2ccccc21)NO
CC#N.Nc1ccc(C(F)(C(F)(F)F)C(F)(F)C(F)(F)F)cc1.O=C1CCC(=O)N1Cl>>Nc1ccc(C(F)(C(F)(F)F)C(F)(F)C(F)(F)F)cc1Cl

Before the PR, this failed because of the eighth reaction, which contains [H-]~[Na+]. After the PR, all ten reactions are mapped successfully.

On the nine remaining reactions, there are some differences in the predicted confidences, because some of the fragments are ordered differently in the final reaction SMILES after canonicalization. The PR canonicalizes each species in its dot-separated form, whereas before the PR the canonicalization was done on the tilde form:

>>> from rdkit.Chem import MolFromSmiles, MolToSmiles
>>> MolToSmiles(MolFromSmiles('[OH-]~[K+]'))
'[OH-]~[K+]'
>>> MolToSmiles(MolFromSmiles('[OH-].[K+]'))
'[K+].[OH-]'

The effect on the nine remaining reactions is the following:

Mapped RXN SMILES: all are equivalent according to process_reaction_with_product_maps_atoms.

Confidences (left: old, right: new):

0.9785982166 0.9785982166
0.2140628880 0.2140628880
0.3482984899 0.3482984899
0.9714398526 0.9714398526
0.7309175528 0.6944778058
0.9868994112 0.9866684536
0.2510597188 0.2471753442
0.8377841303 0.8302324685
0.0532848520 0.0532848520

(i.e. the 5th, 6th, 7th and 8th rows are different, due to the ordering of fragments mentioned above).
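
The comparison can be reproduced directly from the two columns listed above. The snippet below is a small sketch for checking the numbers; the helper name `changed_rows` and the tolerance `tol` are illustrative choices, not part of rxnmapper.

```python
# (old, new) confidences copied from the listing above, one pair per reaction.
confidences = [
    (0.9785982166, 0.9785982166),
    (0.2140628880, 0.2140628880),
    (0.3482984899, 0.3482984899),
    (0.9714398526, 0.9714398526),
    (0.7309175528, 0.6944778058),
    (0.9868994112, 0.9866684536),
    (0.2510597188, 0.2471753442),
    (0.8377841303, 0.8302324685),
    (0.0532848520, 0.0532848520),
]


def changed_rows(pairs, tol=1e-9):
    """Return the 1-based row numbers whose old and new values differ."""
    return [i for i, (old, new) in enumerate(pairs, start=1) if abs(old - new) > tol]


rows = changed_rows(confidences)
largest = max(abs(old - new) for old, new in confidences)

print(rows)               # [5, 6, 7, 8]
print(round(largest, 4))  # 0.0364
```

The largest shift, about 0.036, occurs in the fifth row; the other changes are below 0.01.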

If the ~ are replaced by . for those nine reactions, the results from before and after the PR have an equivalent mapping and identical confidences (but the ordering of the mapped rxn may be different).

rxn4chemistry / rxnmapper

Bug fixes and improvements #24