dracodoc opened this issue 7 years ago
I just noticed that the vignette already mentions that sometimes you need more than averaging. This confirms my feeling that the `ca` name is not the best. I also find that `scatter` carries an inherent sense of random shuffling, so it's a good word for this case.
Agree that the names could be improved.
I'd suggest using underscores and a common prefix, like `stringr`. So all functions would look like `file_xx`, `dis_`, `sa_xx`, or even just `f_xx`, `d_xx`.
To be clear, does this mean `ca`, `cabase`, `calm`, `caglm`, `caprcomp` would become `ca_`, `ca_base`, `ca_lm`, `ca_glm`, `ca_prcomp`, etc.?
Yes. I didn't include the `ca` example because I don't think `ca` is the best representation of Software Alchemy. "Software Alchemy" itself is not easy to understand or relate to, either.
Changing from 'ca' to 'sa' is a good idea. We can do that easily without breaking users' old partools code by simple assignments, e.g. salm <- calm.
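A minimal sketch of the aliasing idea, assuming the new `sa` names simply point at the existing functions. The bodies of `calm` and `caglm` below are hypothetical stand-ins so the snippet runs without partools, and `saglm` is an illustrative name not mentioned above.

```r
# Hypothetical stand-ins for the existing partools functions, so this
# snippet runs on its own; in the package these are the real definitions.
calm  <- function(...) "calm result"
caglm <- function(...) "caglm result"

# New 'sa' names as plain aliases (saglm is an illustrative name).
# Each new name is bound to the same function object as the old one,
# so existing user code that calls calm() or caglm() is unaffected.
salm  <- calm
saglm <- caglm

identical(salm, calm)  # TRUE: one function, two names
```

Inside a package, the new names would also have to be exported (for example with `export(salm, saglm)` in the NAMESPACE file) before users could call them. The same one-line assignment would work for underscore-style names such as `sa_lm`, which is one way to add separators without breaking existing code.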
I agree that the lack of separators like '_' may be difficult for a non-native speaker of English at first, but I would be reluctant to break users' existing code.
Software alchemy is really for means, including proportions, and is not appropriate for something like fetching the top 10 values of a variable. However, one can use partools in other ways. Actually, I was just the other day thinking about writing a convenience function for that.
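To make the contrast concrete, here is a small base-R sketch on an in-memory vector split into chunks, standing in for data spread across cluster nodes. The helper names `split_chunks`, `chunk_mean_avg`, and `chunk_topk` are hypothetical and are not partools functions. The first averages per-chunk means in the Software Alchemy style; the second shows why a top-10 query needs a different combine step: each chunk keeps its own 10 largest values, and the final answer is the 10 largest of those candidates.

```r
# Hypothetical helpers for illustration only; not part of partools.

# Deal the elements of x round-robin into nchunks chunks, as a stand-in
# for data distributed across cluster nodes.
split_chunks <- function(x, nchunks) {
  split(x, rep(seq_len(nchunks), length.out = length(x)))
}

# Software Alchemy style: apply the estimator (here, the mean) to each
# chunk, then take the unweighted average of the chunk estimates.
chunk_mean_avg <- function(chunks) {
  mean(vapply(chunks, mean, numeric(1)))
}

# Top-k is not an average: each chunk keeps its own k largest values,
# and the result is the k largest among those candidates.
chunk_topk <- function(chunks, k = 10) {
  local_top <- lapply(chunks, function(ch) head(sort(ch, decreasing = TRUE), k))
  head(sort(unlist(local_top, use.names = FALSE), decreasing = TRUE), k)
}

set.seed(1)
x <- rnorm(1000)
chunks <- split_chunks(x, 4)
chunk_mean_avg(chunks)  # equals mean(x) up to rounding: all 4 chunks hold 250 values
chunk_topk(chunks)      # same as head(sort(x, decreasing = TRUE), 10)
```

Averaging the chunks' top-10 vectors would give a meaningless result, which is the point made above; the merge step works because any value in the overall top 10 must also be in the top 10 of its own chunk. With unequal chunk sizes, the unweighted average of chunk means no longer reproduces the full-data mean exactly.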
As to Divide and Combine, see my 2016 JSS paper, which is referenced in both the man page and the vignette.
The points I raise here may just be personal taste, and changing names might be quite cumbersome, but I think it's better to discuss this sooner rather than later.
I found some names in the package a little confusing:
- `ca` as the core of Software Alchemy. I expected `sa` for this, since chunk averaging is seldom mentioned. As for averaging, isn't it possible that we sometimes need something else, like getting the top 10 values from all the data? That's a typical Hadoop example, but Software Alchemy can handle it as well. On the other hand, I find that the name Software Alchemy itself doesn't tell users what it is, compared to Divide and Combine. Maybe you could also call it "scatter compute".
- `filesplitrand`: the `r` inside it is especially easy to overlook.
- `calm` is difficult to read as `ca lm`.

I'd suggest using underscores and a common prefix, as `stringr` does with `str_`. So all functions would look like `file_xx`, `dis_`, `sa_xx`, or even just `f_xx`, `d_xx`.