What is the best way to deal with multiple data classes?

seqasim commented 5 years ago

I am writing up a Precession analysis class. This is a subclass of two different kinds of Data classes (MODAL, Spikes).

Currently, it seems like SubjectDataBase assumes you will only be dealing with one kind of data (i.e. iEEG). So, load_data() defaults to assigning the outputs of compute_data() to a single attribute: self.subject_data. However, I have multiple different Data Classes (MODAL data, Spike Data) that I want the subject to inherit.

So, is it possible to modify the load_data() function in SubjectDataBase to alter the name of the assigned attribute depending on the specific DataClass being used to compute_data()? This way we can store multiple kinds of data?

Or is this a modification that I would make in the Precession analysis class? I'm not sure how that's possible to do, though - if it inherits both the Modal and Spike classes, self.subject_data would presumably get overridden by one of the data sets, I assume.

jayfmil commented 5 years ago

So it's one analysis that inherits two different types of data? I hadn't really thought of that, but it does seem like it'd be useful.

I guess I would override load_data() by making a load_data() method in your analysis class. And then you could have the new load_data call each of the data class load_data methods individually. To avoid overwriting self.subject_data, this load_data could set each data to a different variable, or it could combine them in some way.

Probably would also have to override compute_data and save_data in a similar way.

seqasim commented 5 years ago

Yeah - this will come in handy with any spike-LFP analysis that we end up doing.

Ok, I think this makes sense, will try it and report back.

jayfmil commented 5 years ago

Sounds good. I haven't tested this or anything, but maybe something along these lines.

class SubjectNewAnalysis(SubjectAnalysisBase, SubjectDataA, SubjectDataB):

    def load_data(self):
        super(SubjectDataA, self).load_data()
        tmp_data = {'data_a': self.subject_data}
        super(SubjectDataB, self).load_data()
        tmp_data['data_b'] = self.subject_data
        self.subject_data = tmp_data

Also you probably have to be careful to make sure that your different data classes don't have conflicting parameter names.

seqasim commented 5 years ago

What do you mean by conflicting parameter names? Do you mean 'data_a' should not = 'data_b'

jayfmil commented 5 years ago

I just mean that it could get confusing if both data classes have a 'start_time' attribute, for example.

seqasim commented 5 years ago

Btw this works well. I think this might be the most parsimonious way to do this for all/most cases

seqasim commented 5 years ago

Sounds good. I haven't tested this or anything, but maybe something along these lines.
class SubjectNewAnalysis(SubjectAnalysisBase, SubjectDataA, SubjectDataB):

    def load_data(self):
        super(SubjectDataA, self).load_data()
        tmp_data = {'data_a': self.subject_data}
        super(SubjectDataB, self).load_data()
        tmp_data['data_b'] = self.subject_data
        self.subject_data = tmp_data
Also you probably have to be careful to make sure that your different data classes don't have conflicting parameter names.

I want to bring up an important issue here actually - because 'SubjectDataA' is the first DataClass inherited by the analysis, important attributes related to loading the data belong only to that class. So in the example provided, you load the same data twice because super() does not "know" the save_file attribute that should be associated with 'SubjectDataB'.

As an alternative, may have to do this:

subj = SubjectDataA(...)
subj.load_data() 
tmp_data = {'A': subj.subject_data}
subj = SubjectDataB(...)
subj.load_data() 
tmp_data['B'] = subj.subject_data
self.subject_data = tmp_data

jayfmil commented 5 years ago

Good catch. I like the idea to update the save path within load_data. Have you tried it / does it work correctly?

On Thu, Dec 6, 2018 at 12:48 PM seqasim notifications@github.com wrote:

Alternatively, in each DataClass' load_data method, I might add the following line:

python def load_data(self): """ Call super's load data, and then additionally cast data to float32 to take up less space. """ self._update_save_path() super(SubjectDataA, self).load_data() if self.subject_data is not None: self.subject_data = self.subject_data.astype('float32') self.elec_info = RAM_helpers.load_elec_info(self.subject, self.montage, self.bipolar) # this should not vary between sessions so keep it as is.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jayfmil/miller_ecog_tools/issues/5#issuecomment-445024294, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOirG9uHg8zhJlOWdC6Lh3v6SEhMzAIks5u2YKHgaJpZM4Y6SL3 .

jayfmil / miller_ecog_tools

What is the best way to deal with multiple data classes? #5