zalando / expan

Open-source Python library for statistical analysis of randomised control trials (A/B tests)
MIT License
331 stars 50 forks source link

custom deepcopy() method for 'StatisticalTest', to save some memory #230

Closed aaron-mcdaid-zalando closed 6 years ago

aaron-mcdaid-zalando commented 6 years ago

In another product, we call deepcopy on lists of StatisticalTest. As a result, we might waste some memory copying dataframes

My understanding is that, within StatisticalTest, the dataframe is not changed. Therefore, it is not necessary to copy it. Is this correct? Is .data "immutable" within StatisticalTest

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.09%) to 92.376% when pulling 4f8bc99f3d8f9b81bae2cffcbe18fd24e1e9a509 on aaron-mcdaid-zalando:do.not.deepcopy.dataframes into f3b839017db9e06eba12eaa67e4d3a7022e29fde on zalando:master.

daryadedik commented 6 years ago

why we need that? how much memory you expect to save by this? (I mean kpi, features, variants are usually very small objects). Or did I misunderstood smth?

aaron-mcdaid-zalando commented 6 years ago

The data object isn't small though, that is what I'm changing. I'm not making any change to kpi, features, variants.

daryadedik commented 6 years ago

"We should not 'deepcopy' the pandas Dataframe. The dataframe is essentially immutable for the purposes of this. "We'll 'deepcopy' everything else though." - this is from your comment, why we need to deepcopy that?

we didn't have deepcopy of data before, or, we had that somewhere in expan service?

gbordyugov commented 6 years ago

I think that the idea is quite good.

The only hypothetical objection would be that we're sticking to the idea of carrying around data frames, which I think breaks the separation of the pure logical description of a test and its actual execution.

aaron-mcdaid-zalando commented 6 years ago

@daryadedik, ExpanService.py does a deepcopy of the StatisticalTestSuite. Although, maybe we should simply remove that deepcopy and simplify things there, and therefore drop this PR. What do you think?

daryadedik commented 6 years ago

@aaron-mcdaid-zalando ah, we had a deepcopy in expan service, now I get your pull request :) I will think about it, but now we can keep it. Sorry for many comments.

igusher commented 6 years ago

I don't understand, why you even mention ExpanService here? You are developing self-sufficient, standalone library, focus on the API, which this lib should provide :)

aaron-mcdaid-zalando commented 6 years ago

@igusher , maybe this suggests that the boundary between expan and our internal Zalando tools is in the wrong place!

It is appropriate to define special methods, such as __repr__ and __len__, in this repo. Therefore, __deepcopy__ is also suitable in this repo.

aaron-mcdaid-zalando commented 6 years ago

... but I'm not saying this in order to advocate again for this PR. I'm just observing that I'm confused about what design decisions were made when designing this API

shansfolder commented 6 years ago

@aaron-mcdaid-zalando it looks good to me. What is still in progress though?