Open jdkent opened 3 years ago
This may be related to #248.
EDIT: To clarify, I'm not arguing against an Annotator class. I just want to take some time to think more about it before I weigh in, and there may be useful info in that old issue.
The Annotator class would take in a Dataset and output an Annotation object (or just a dataframe?).
I think a new class would be necessary, since some Annotators (especially GCLDA) would produce multiple outputs. In GCLDA's case, we would have arrays (or DataFrames) of (1) probability of term given topic, (2) probability of topic given study, and (3) probability of voxel given topic.
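To make the multiple-output problem concrete, here is a minimal sketch of what such an Annotation container might hold for GCLDA. The class name `GCLDAAnnotation`, its field names, and the toy numbers are all hypothetical, and plain lists stand in for the arrays or DataFrames mentioned above; this is not NiMARE's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GCLDAAnnotation:
    """Hypothetical container for the three GCLDA outputs (not a NiMARE class)."""

    # Rows are terms, columns are topics; each column sums to 1.
    p_term_g_topic: List[List[float]]
    # Study ID -> weights over topics; each list sums to 1.
    p_topic_g_study: Dict[str, List[float]]
    # Rows are voxels, columns are topics; each column sums to 1.
    p_voxel_g_topic: List[List[float]]

    @property
    def n_topics(self) -> int:
        # Number of topic columns in the term-by-topic table.
        return len(self.p_term_g_topic[0])


# Toy values for illustration only: 2 terms, 2 studies, 3 voxels, 2 topics.
annotation = GCLDAAnnotation(
    p_term_g_topic=[[0.7, 0.1], [0.3, 0.9]],
    p_topic_g_study={"study-01": [0.8, 0.2], "study-02": [0.25, 0.75]},
    p_voxel_g_topic=[[0.5, 0.5], [0.25, 0.25], [0.25, 0.25]],
)
print(annotation.n_topics)
```

Only `p_topic_g_study` is indexed by study, so it is the only one of the three that fits naturally into a per-study table; the other two are indexed by term or voxel, which is why a single DataFrame of annotations cannot carry them.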
The plan in #248 was to have Annotators act like Transformers, i.e., the Annotator would take in a Dataset and return a Dataset with an updated annotations attribute. Unfortunately, that suffers from the same limitation as just returning a DataFrame.
With #606 I have a strong motivator to write an Annotator class, but I'm still not sure how we should incorporate topic-word (LDA & GCLDA) and topic-voxel (GCLDA) arrays into the Dataset, or what the alternative Annotation class would look like. @jdkent, any thoughts? Has the neurostore team done anything with Annotations in the API?
Per discussion on Slack, we could just stick these distributions (whether as arrays or DataFrames) into an attribute that we don't work too hard to standardize. It will basically mean that we assume no tools except NiMARE will use these "extra" distributions.
Something like
pprint(Annotator.distributions_)
{
"p_topic_g_token": numpy.ndarray,
"p_topic_g_token_df": pandas.DataFrame
}
WDYT?
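A rough sketch of how that loosely standardized attribute could sit next to the per-study annotations. Everything here is an assumption for illustration: the `Annotator` base class, the `fit`/`_fit` split, the `annotations_` attribute, and the toy `TermFrequencyAnnotator` are hypothetical, and a plain dict of study ID to text stands in for a Dataset.

```python
from collections import Counter
from typing import Dict


class Annotator:
    """Hypothetical base class; not NiMARE's actual API."""

    def fit(self, texts: Dict[str, str]) -> "Annotator":
        # Per-study results go in annotations_; anything that is not
        # indexed by study goes in the free-form distributions_ dict.
        self.annotations_, self.distributions_ = self._fit(texts)
        return self

    def _fit(self, texts: Dict[str, str]):
        raise NotImplementedError


class TermFrequencyAnnotator(Annotator):
    """Toy annotator: relative term frequencies per study."""

    def _fit(self, texts: Dict[str, str]):
        counts = {sid: Counter(text.lower().split()) for sid, text in texts.items()}
        vocab = sorted(set().union(*counts.values()))
        # Study-level output: term weights for each study.
        annotations = {
            sid: {term: c[term] / sum(c.values()) for term in vocab}
            for sid, c in counts.items()
        }
        # "Extra" output: corpus-wide term probabilities, not tied to any study.
        total = Counter()
        for c in counts.values():
            total.update(c)
        n_tokens = sum(total.values())
        distributions = {"p_term": {term: total[term] / n_tokens for term in vocab}}
        return annotations, distributions


annotator = TermFrequencyAnnotator().fit(
    {"study-01": "pain pain memory", "study-02": "memory reward"}
)
print(sorted(annotator.distributions_))
```

Because `distributions_` is just a dict, each annotator can store whatever keys it needs (term-by-topic, voxel-by-topic, and so on) without any change to the Dataset or to a shared annotation schema.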
Summary
Some variables associated with a study are always the same, such as sample_size. However, other variables may depend on the dataset the study is part of, like results from topic modeling. Annotations should be separate from the Dataset object to help keep Datasets immutable.
Additional details
Next steps
annotate module would inherit from the Annotator class. (The definition of the Annotation objects will eventually live in the neurostore API client library, which could then be pushed/pulled from neurostore.)