Implement modular log-likelihoods coming from different data sources.
Key aspects:
- Loss function is the sum of the modular log-likelihoods, one per dataset source or measurement class.
- We will predict dG/kT, but we can discretize into inactives/actives later if needed.
- Unknowns. These could become nuisance parameters in the future, but for now:
  - `[substrate]`: we could estimate it by cross-validation, use relative pIC50s, or add it as a nuisance parameter in the future.
  - Uncertainties: estimate per dataset (as in Kramer's paper) or per measurement class.
- We will start by taking the intersection of ChEMBL/KinomeScan.
- Featurization does not need to be very advanced for now. The idea is to implement the framework only and see how it looks in the API.
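The loss described above can be sketched as follows. This is a minimal illustration, not the final API: it assumes Gaussian measurement noise on dG/kT and one uncertainty (`sigma`) per dataset source, and all names (`gaussian_nll`, `modular_loss`) are hypothetical.

```python
import numpy as np

def gaussian_nll(pred, obs, sigma):
    """Negative log-likelihood of observations under N(pred, sigma^2)."""
    resid = obs - pred
    return 0.5 * np.sum(resid**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

def modular_loss(predictions, observations, sigmas):
    """Total loss: sum of per-source negative log-likelihoods.

    predictions / observations: dict mapping source name -> array of dG/kT values
    sigmas: dict mapping source name -> per-source (or per-measurement-class) sigma
    """
    return sum(
        gaussian_nll(predictions[src], observations[src], sigmas[src])
        for src in observations
    )
```

Because each source contributes an independent term, per-source uncertainties (or, later, nuisance parameters such as `[substrate]`) can be estimated or marginalized without touching the other terms.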
Todos
[ ] Implement a draft notebook that covers the toy problem: "Let's assume there's a subset of ChEMBL also present in KinomeScan. We can train on one, predict on the other, on both, and/or all combinations. What's the error on our reported estimates? What's the accuracy / sensitivity?"
[ ] Add new attributes to relevant objects
[x] Evaluate the need for DatasetCollection or MetaDataset object
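The toy problem in the first todo could look something like the sketch below: synthetic stand-ins for the ChEMBL/KinomeScan intersection, a deliberately simple linear model, and RMSE reported for every train/test combination. The data, features, and helper names are all placeholders, not the real datasets or featurizer.

```python
import numpy as np

# Synthetic stand-in for compounds present in both sources, sharing one
# featurization X but measured with different noise levels per source.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
true_w = np.array([1.0, -0.5, 0.3, 0.0])
sources = {
    "chembl": X @ true_w + rng.normal(scale=0.2, size=50),
    "kinomescan": X @ true_w + rng.normal(scale=0.4, size=50),
}

def fit(X_train, y_train):
    """Least-squares linear fit (placeholder for a real model)."""
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return w

def rmse(y_pred, y_true):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Train on each source and on their union; evaluate on each source.
results = {}
for train in [("chembl",), ("kinomescan",), ("chembl", "kinomescan")]:
    y_train = np.concatenate([sources[s] for s in train])
    X_train = np.concatenate([X for _ in train])
    w = fit(X_train, y_train)
    for test in sources:
        results[(train, test)] = rmse(X @ w, sources[test])

for key, value in sorted(results.items()):
    print(key, round(value, 3))
```

Discretizing the predictions into inactives/actives on top of this would then give the accuracy/sensitivity numbers the notebook asks for.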