Closed cvanderaa closed 3 years ago
The PR does not pass CI, but it seems the errors are not related to my changes...
Hi Chris, @cvanderaa
Thanks for looking into this. I will take a closer look later today. The first and last operations in the microbenchmark test both use .harmonize, and the middle one is faster because it does no checking. It could therefore lead to the wrong answer if you replace the list element with an arbitrary value, e.g., el[[1]] <- matrix()
Best, Marcel
The first and last operations in the microbenchmark test are both using .harmonize and the middle one is faster because it does no checking
Exactly, that's what I meant! You can see that when using experiments<-, over 90% of the time is spent in .harmonize, so I wanted to reduce that time as much as possible. I think I found a shortcut in the algorithm that saves some time without losing data validity, and this is what I suggest in this PR.
Thanks a lot for looking into this!
Hi Chris, @cvanderaa
I had a closer look at the PR and, unfortunately, it is giving the wrong answer. subset_assay will return only the "unchanged" experiments rather than the complete ExperimentList.
I did add a small performance increase on the master branch: 1309679
I will close it for now. Thanks!
Best,
Marcel
Oh sorry for the bug, but thanks for the commit!
Hello Marcel, @LiNk-NY
I'm working with a MultiAssayExperiment object that contains many assays, and I realized that any call to experiments<- always takes a lot of time, while working on the ExperimentList object is very fast. I saw that a lot of the computational time is eaten up by .harmonize. Here is a mock example that illustrates this, where an assay is replaced by the same assay on the data set I am working on:
This is probably not the best way to assess this, but the point is that .harmonize takes a dramatic proportion of the time. After some profiling of the code, I saw that the mendoapply chunk accounts for around 85% of the workload.
In this PR I suggest a small modification that speeds up .harmonize by a factor of 3 to 4. However, I can't run your tests because I can't figure out how to get the example data within the tests... I could successfully run the harmonization tests manually.
Would this be useful to you? Do you see another way to optimize the speed even more, without loss of consistency in the data? I think some steps of the harmonization could be skipped if .harmonize is not called from within a subsetting function, but I'm not sure about that. I would love to have your input on this.
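For readers who want to reproduce the kind of comparison described above, here is a minimal sketch. It is not the benchmark from this PR: it assumes the MultiAssayExperiment package is installed and uses its bundled miniACC example data as a stand-in for the real data set. The iteration counts are arbitrary, and the function names that show up in the profile depend on the package version.

```r
# Sketch only: miniACC stands in for the (much larger) data set in the PR.
library(MultiAssayExperiment)
data("miniACC")

mae <- miniACC
el <- experiments(mae)

# Replace an assay by itself directly in the ExperimentList.
# No cross-checking against colData or sampleMap happens here.
t_list <- system.time({
  for (i in 1:20) el[[1]] <- el[[1]]
})

# Assign the ExperimentList back into the MultiAssayExperiment.
# According to the discussion above, this path goes through the
# internal harmonization step.
t_mae <- system.time({
  for (i in 1:20) experiments(mae) <- el
})

timings <- c(ExperimentList = t_list[["elapsed"]],
             MultiAssayExperiment = t_mae[["elapsed"]])
print(timings)

# Profile the slower path to see which calls dominate.
# If the loop finishes too quickly to collect samples, raise the count.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)
for (i in 1:100) experiments(mae) <- el
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.total, 15))
```

The first timing skips validation entirely, which is why it is fast and also why it cannot be used as a drop-in replacement: as noted earlier in the thread, nothing stops an invalid element from being assigned on that path.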