Closed mdshw5 closed 8 years ago
hi Matt,
Thanks for filing this issue and your contributions to tximport.
I think I'm going to keep the codebase as is (not incorporate this change) because for me this bug is really an important feature. I think in 99% of the cases, users should not be running DE tools on quantifications produced using different indices. If you add transcripts to the index, you really need to re-quantify the old files using the same index, as a new index adds potential technical artifacts to the comparison (it changes the optimum reached by the EM). Luckily these quantifiers only take a few minutes per sample, so even for massive projects, this takes 1-2 days. I suppose there could be an edge case where the files are quantified using the exact same index, but an index which was produced by re-sorting the transcript order. But this sounds fishy and unlikely. I think the current check, stopifnot(all(txId == raw[[txIdCol]])), is basically a realistic check that you're not accidentally comparing files which were quantified with a different index.
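To make the behaviour of that check concrete, here is a minimal sketch in base R. It is not tximport's internal code: the data frames, the column name "Name" and the helper same_index() are hypothetical and only mimic the comparison quoted above, with an explicit length test added so that vectors of different length are reported as a mismatch instead of being recycled.

```r
# Hypothetical transcript IDs taken from the first quantification file
txId <- c("tx1", "tx2", "tx3")
txIdCol <- "Name"  # assumed name of the ID column, for illustration only

# Second file quantified against the same index (same IDs, same order)
raw_same <- data.frame(Name = c("tx1", "tx2", "tx3"),
                       NumReads = c(10, 0, 5),
                       stringsAsFactors = FALSE)

# Second file quantified against a different index (tx2 dropped, tx4 added)
raw_diff <- data.frame(Name = c("tx1", "tx3", "tx4"),
                       NumReads = c(8, 2, 1),
                       stringsAsFactors = FALSE)

# Hypothetical helper: TRUE only if the IDs match in number, content and order
same_index <- function(txId, raw, txIdCol) {
  ids <- raw[[txIdCol]]
  length(ids) == length(txId) && all(ids == txId)
}

same_index(txId, raw_same, txIdCol)  # files are comparable
same_index(txId, raw_diff, txIdCol)  # files come from different indices
```

Wrapping the second call in stopifnot() would stop the import with an error, which is the behaviour described above.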
All this being said, the solution if you really want to combine tximport-ed matrices across different indices is much simpler in my opinion if you do this after running tximport:
idx <- intersect(rownames(txi1$counts), rownames(txi2$counts))
counts.new <- cbind(txi1$counts[idx,], txi2$counts[idx,])
I'd prefer this post-hoc approach, so that users don't expect tximport will take care of files processed using a different index automatically.
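The post-hoc approach above can be extended to the other matrices in the list. The following is a sketch under stated assumptions, not part of tximport: make_txi() builds small mock objects with made-up values in place of real tximport output, and combine_txi() is a hypothetical helper that assumes both inputs carry abundance, counts and length matrices with transcript IDs as row names.

```r
# Build a mock object shaped like tximport output (values are made up)
make_txi <- function(ids, samples) {
  m <- matrix(seq_along(ids) * rep(seq_along(samples), each = length(ids)),
              nrow = length(ids), dimnames = list(ids, samples))
  list(abundance = m, counts = m * 10, length = m + 100,
       countsFromAbundance = "no")
}

txi1 <- make_txi(c("tx1", "tx2", "tx3"), c("s1", "s2"))
txi2 <- make_txi(c("tx2", "tx3", "tx4"), c("s3"))

# Hypothetical helper: keep only transcripts present in both objects
combine_txi <- function(a, b) {
  idx <- intersect(rownames(a$counts), rownames(b$counts))
  out <- lapply(c(abundance = "abundance", counts = "counts", length = "length"),
                function(nm) {
                  # drop = FALSE keeps a matrix even if one row or column remains
                  cbind(a[[nm]][idx, , drop = FALSE],
                        b[[nm]][idx, , drop = FALSE])
                })
  out$countsFromAbundance <- a$countsFromAbundance
  out
}

txi <- combine_txi(txi1, txi2)
dim(txi$counts)       # shared transcripts x all samples
rownames(txi$counts)  # only IDs found in both inputs
```

Note that after subsetting, the abundance columns no longer sum to one million, and the caveat about comparing quantifications from different indices still applies to the combined matrices.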
Yeah, I think this is reasonable. Thanks for your time, and the great tool!
I have use cases for building matrices of TPM/counts where length(txId) != length(raw[[txIdCol]]). I've created a patch that allows the user to combine input files with a subset of common features. Maybe this is of general interest? Sorry about the random edits in the diff - if you update the repository on GitHub I could do a proper PR.