Open jtheorell opened 4 years ago
Hi Jakob, nice to hear from you again. It is surprising that it is so slow. Could you describe your data matrix in more detail: is it a dense matrix or a sparse Matrix (and if so, what is its class: dgCMatrix, etc.)? What fraction of the entries are zero, and how many genes are there? Also, it shouldn't be doing parallel processing unless you explicitly set mc.cores to a value larger than 1 (the default). My apologies for removing your status bar; that was not intentional. I will try to figure out a way to put it back in, at least for the serial processing scenario.
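A quick way to answer those questions in R might look like the sketch below; the matrix `m` here is a randomly generated stand-in for the real CPM matrix, so the names and values are illustrative only.

```r
library(Matrix)

# Stand-in for the real CPM matrix: 100 genes x 50 cells, ~6% nonzero entries.
m <- Matrix::rsparsematrix(100, 50, density = 0.06)

class(m)               # e.g. "dgCMatrix" (compressed sparse column)
is(m, "sparseMatrix")  # TRUE for any sparse representation

# Fraction of zero entries.
zero_frac <- 1 - Matrix::nnzero(m) / prod(dim(m))
zero_frac
```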
Hi! Thank you for your replies! It should be a sparse matrix, as I use the CPM slot of a SingleCellExperiment, which converts dense matrices to sparse ones upon creation. There are 58,565 transcripts, and 94% of the entries are zero. I did not set mc.cores to anything other than 1, so that should not be it! Interestingly, I observed the same phenomenon when I first started working with this code: even converting from a for loop to lapply slowed things down considerably, and it got even worse when I tried to run things with bplapply. That might have been due to the use of dense matrices, though, which the for loop seems to have no problem with. What I had implemented before was to print "Column x of y processed" on each round, which becomes slightly tedious to look at, but it shows very clearly what is going on and is simple enough to implement.
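The per-column progress report described above can be sketched roughly as follows in a serial loop; `f` is a placeholder for the real per-column computation, not the actual quminorm internals.

```r
# Toy data: 20 genes x 5 cells of Poisson counts.
m <- matrix(rpois(20 * 5, lambda = 3), nrow = 20, ncol = 5)
f <- function(x) sort(x)  # placeholder for the real per-column work

out <- m
for (j in seq_len(ncol(m))) {
  out[, j] <- f(m[, j])
  message("Column ", j, " of ", ncol(m), " processed")
}
```

A lighter-weight alternative in the serial case would be base R's txtProgressBar, which condenses the same information into a single updating line instead of one message per column.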
Hi! Using the quminorm function again now in its latest form, it seems that the timer (or rather the continuous progress report) I added earlier has had to go, probably when everything was parallelized. With my data, the new version is very slow (I had to cancel it after 5 minutes for ~2000 cells), and regardless, I find it very hard to deal with processes that run under the hood without any progress reports for extended periods of time. I therefore have two suggestions (if you do not want to revert to the previous non-parallelized version, which takes 2 minutes for the same dataset):
Best J