Unify backward reaction models from `search` and `reaction_prediction`

AustinT commented 7 months ago

Following discussion from #58, this PR unifies the base reaction model class from search and reaction_prediction, finally allowing all code to use a single reaction model base class in all places.

The new class lives in syntheseus/interface/models.py and is mainly an extended version of the earlier ReactionModel class to include:

caching (disabled by default)
de-duplication of reactions (enabled by default)
call signature where num_results is optional and defaults to 100
tracking number of calls
a reset function to clear the cache / call count

The following modifications were made to use the new class:

All pointers in the search code to search.reaction_models.BackwardReactionModel were redirected to interface.models.BackwardReactionModel
Tests of caching / de-duplication which previously lived under tests/search/* were moved to tests/interface/test_models.py
All models in reaction_prediction were modified to override _get_reactions instead of __call__ (otherwise it would avoid interacting with the cache)
Deleted the syntheseus wrapper class (no longer necessary). Search CLI now just directly uses the single-step model created from config.
Added a warning to search algorithms if reaction model caching is switched off. Also ensured caching is turned on for all models used during the tests in tests/search/*

After this, search/reaction_models still contained code for toy reaction models. I thought this was a bit out of place so I moved it to reaction_prediction/inference/toy_models.py and included the word "toy" in the class names to emphasize that these are toy models. I also moved the tests. This allowed search/reaction_prediction to be deleted entirely.

AustinT commented 6 months ago

Looks good so far, but we also need to adapt the "Quick Start" tutorial

My understanding was that if we change the docs it will be updated immediately. Therefore I thought it would make the most sense to update the tutorials when the new release (0.4.0) is ready. This could happen in a separate PR once this PR is merged and the issue with single-step eval is fixed. Does that sound like a good plan @kmaziarz ?

kmaziarz commented 6 months ago

My understanding was that if we change the docs it will be updated immediately. Therefore I thought it would make the most sense to update the tutorials when the new release (0.4.0) is ready.

Hmm, yeah, if one follows the tutorial using 0.3.0 (instead of installing from source) then we shouldn't change it before we release the new version... I guess we needed versioned docs after all 😄

AustinT commented 6 months ago

My understanding was that if we change the docs it will be updated immediately. Therefore I thought it would make the most sense to update the tutorials when the new release (0.4.0) is ready.

Hmm, yeah, if one follows the tutorial using 0.3.0 (instead of installing from source) then we shouldn't change it before we release the new version... I guess we needed versioned docs after all 😄

I think it's fine, especially since the docs are fairly minimal. We can set it up later once the docs are a bit better.

kmaziarz commented 6 months ago

On the topic of get_results in eval_single_step.py, it now repeats the logic for deduplicating the outputs; I think instead of having skip_repeats be an argument to get_results we should just pass the respective config option when creating the model via get_model.

Digging this up, I think it's actually more than duplicated logic: the skip_repeats argument in eval_single_step.py will turn the deduplication logic in the script on/off, but that will have no effect due to deduplication already being turned on in the model itself. I think we can again make use of get_model accepting arbitrary kwargs and do

model = get_model_fn(config, remove_duplicates=config.skip_repeats)

After that we can get rid of the deduplication logic from eval_single_step.py entirely.

AustinT commented 6 months ago

After that we can get rid of the deduplication logic from eval_single_step.py entirely.

Good point, I will make this change shortly (just going for lunch now).

AustinT commented 6 months ago

After that we can get rid of the deduplication logic from eval_single_step.py entirely.

I pushed another commit which does this. What do you think? @kmaziarz?

microsoft / syntheseus

Unify backward reaction models from `search` and `reaction_prediction` #73