Initial implementation of ID assignment + molecule validation

Description

Initial implementation, closes #8

This PR adds validate_and_assign_ids.py, as well as a stub for cli.py (which should probably be overwritten by the one that @dotsdl is writing). It also adds a ton of tests in test_validate_and_assign_ids.py and some test utilities in tests/utils.py.

The major work here is in the validate_and_assign function.

This takes as input:

- input_graph_files: A list of SDF files to load containing 3D molecules. The coordinates from these molecules will be stripped, and only the molecule graph will be considered. New conformers for these molecules will be generated in subsequent steps. The same chemical species MUST NOT appear more than once across this list of files; an error is raised if it does.
- input_3d_files: A list of SDF files to load containing 3D molecules. The same chemical species MAY appear multiple times in this list of files, and each conformer provided this way will replace a conformer that would otherwise be generated in a subsequent step.
- output_directory: optional, default=1-validate_and_assign. The directory that will hold the output of this workflow step. If this directory already exists, an Exception will be raised, prompting the user to manually delete the existing directory if they really intend to run the step again.
- group_name: Nominally the three-character code for this dataset. The current implementation does not enforce the length and accepts any string.
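
The refuse-to-overwrite behaviour described for output_directory could look roughly like the sketch below. This is an illustration, not the code in this PR: the function name prepare_output_directory and the choice of FileExistsError are assumptions, since the description only says that an Exception is raised.

```python
from pathlib import Path


def prepare_output_directory(output_directory="1-validate_and_assign"):
    """Create the output directory, refusing to reuse an existing one.

    Hypothetical helper: the name and the exception type are assumptions
    made for this sketch, not taken from validate_and_assign_ids.py.
    """
    path = Path(output_directory)
    if path.exists():
        # Never overwrite earlier results; make the user delete them explicitly.
        raise FileExistsError(
            f"Output directory {path} already exists. Delete it manually "
            "if you really intend to run this step again."
        )
    path.mkdir(parents=True)
    return path
```

Failing early, before any molecule is loaded, means a rerun cannot leave the directory with a mix of old and new files.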

Note: For now, it's permissible for the same molecule to appear in input_graph_files and input_3d_files.
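
To make the two input rules concrete, here is a minimal sketch of the bookkeeping, assuming each loaded molecule has already been reduced to a hashable species key (for example a canonical isomeric SMILES computed elsewhere). The function names, the (source_file, species_key) pair format, and the choice to number molecules in order of first appearance are all assumptions for illustration; the actual implementation may differ.

```python
from collections import defaultdict


def check_unique_species(graph_entries):
    """Raise if a species key occurs more than once among the graph inputs.

    graph_entries: iterable of (source_file, species_key) pairs. The
    species_key is a hypothetical canonical identifier computed elsewhere.
    """
    first_seen = {}
    for source_file, species_key in graph_entries:
        if species_key in first_seen:
            raise ValueError(
                f"Species {species_key!r} appears more than once "
                f"(in {first_seen[species_key]} and {source_file})"
            )
        first_seen[species_key] = source_file


def assign_ids(conformer_entries):
    """Give every 3D input a (molecule_index, conformer_index) pair.

    Assumption: molecule indices follow order of first appearance, and
    repeats of a species get successive conformer indices. Both start at 0.
    """
    molecule_index = {}
    next_conformer = defaultdict(int)
    assigned = []
    for source_file, species_key in conformer_entries:
        if species_key not in molecule_index:
            molecule_index[species_key] = len(molecule_index)
        conformer_index = next_conformer[species_key]
        next_conformer[species_key] += 1
        assigned.append((source_file, molecule_index[species_key], conformer_index))
    return assigned
```

With inputs a.sdf and c.sdf holding the same species and b.sdf holding another, assign_ids would place a.sdf and c.sdf under one molecule index with conformer indices 0 and 1, while check_unique_species would reject the same list if it were passed as graph inputs.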

This produces as output:

- 3D SDF files following the pattern <group_name>-<5-digit molecule ID>-<2-digit conformer ID>.sdf, for example JRW-00004-00.sdf.
  - These files will have their atoms indexed identically.
  - The numerical components of the file name are indexed beginning at zero.
  - The files contain SD data pairs for the information in the file name: group_name, group_id, and conformer_index.
- Mapped, isomeric, explicit-hydrogen SMILES, following the pattern <group_name>-<5-digit molecule ID>.smi, for example JRW-00004.smi.
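
The naming patterns above can be sketched with plain string formatting. The helper names below are hypothetical, and raising ValueError when an index no longer fits its zero-padded field is an assumption of this sketch rather than behaviour described in this PR.

```python
def conformer_sdf_name(group_name, molecule_index, conformer_index):
    """Build a name such as JRW-00004-00.sdf from its three components."""
    if not 0 <= molecule_index <= 99999:
        raise ValueError(f"molecule index {molecule_index} does not fit in 5 digits")
    if not 0 <= conformer_index <= 99:
        raise ValueError(f"conformer index {conformer_index} does not fit in 2 digits")
    return f"{group_name}-{molecule_index:05d}-{conformer_index:02d}.sdf"


def molecule_smi_name(group_name, molecule_index):
    """Build a name such as JRW-00004.smi for the mapped SMILES file."""
    if not 0 <= molecule_index <= 99999:
        raise ValueError(f"molecule index {molecule_index} does not fit in 5 digits")
    return f"{group_name}-{molecule_index:05d}.smi"


def parse_conformer_sdf_name(file_name):
    """Split a conformer SDF name into (group_name, molecule_index, conformer_index)."""
    if not file_name.endswith(".sdf"):
        raise ValueError(f"{file_name!r} is not an .sdf file name")
    stem = file_name[: -len(".sdf")]
    # Split from the right so hyphens inside group_name survive,
    # since group_name is currently allowed to be any string.
    group_name, molecule_part, conformer_part = stem.rsplit("-", 2)
    return group_name, int(molecule_part), int(conformer_part)
```

For example, conformer_sdf_name("JRW", 4, 0) gives JRW-00004-00.sdf and molecule_smi_name("JRW", 4) gives JRW-00004.smi, matching the examples in the list above.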

Questions

What should happen if the user supplies more than 10 conformers for a molecule? There is currently no logic to handle this case.

Status

[x] Fill out test input molecules
[ ] ~Test enumerating stereoisomers(?)~ This doesn't enumerate stereoisomers
[x] Implement tests
- [x] Test loading all inputs in data/molecules
  - [x] good input with a single molecule
  - [x] good input with multiple molecules
  - [x] ~bad~ good input with repeated molecules, which get assigned different conformer IDs
  - [x] bad 2d molecule (just don't accept 2D sdf at all for now) ~with defined stereo~
  - [ ] ~bad 2d molecule with undefined stereo --> enumerate stereoisomers?~
- [x] Test NOT overwriting existing outputs
[x] Name file as <groupID>-<moleculeIndex>-<conformerID>.sdf
[x] File's SD data should include a field for each piece of information in the name.
[x] Ready for review

openforcefield / openff-benchmark