Open agropyron opened 6 years ago
I'm running into the same issue with my own analysis right now! I'm hoping to compare just a subset of genes across several treatments, and have used subset_kallisto() to produce kallisto objects that include only my genes of interest. As @agropyron found, however, write_kallisto_hdf5() does NOT capture the "est_counts" information from the original kallisto object. (The hdf5 produced by write_kallisto_hdf5() lacks information in the "est_counts" dataset.)
I would love to avoid having to do all this subsetting manually. Any suggestions would be very welcome!
Hi @agropyron and @laurabogar,
I think I found the offending line: https://github.com/pachterlab/sleuth/blob/24ca2859f13d1eac588d6dd2ac90bf860b4fe07e/R/kallisto.R#L121 This should be the following:
new_obj$abundance <- new_obj$abundance[which(new_obj$abundance$target_id %in% target_ids), ]
I can make a quick patch, but it's not clear when we will be able to get a new stable release out. I would encourage you to make the change yourself, and work with a modified sleuth package. The best way to do it is to clone the repo, make the change, then pull up R and then do the following line:
git clone https://github.com/pachterlab/sleuth.git
## make the change to sleuth/R/kallisto.R using whatever text editor you use
## for example: `emacs sleuth/R/kallisto.R`
## then load up R, and install the package in a separate location from your main library
## my favorite testing library directory is "~/test_R_lib", but it can be any directory
## just make sure the directory exists before running this line of code
R-session:> install.packages("sleuth/", repos=NULL, type="source", lib = "/path/to/alt/library")
Then, you would install the modified sleuth like this:
library(sleuth, lib.loc = "/path/to/alt/library")
It should work then. If you continue to have problems (or if you have problems with doing the above instructions), please report back! Thanks!
Hi, @warrenmcg's advice in Issue #201 to subset files (below) does not work for me
For importing I used _readkallisto - I think _readbootstrap is a typo. Worked. _subsetkallisto(kal, target_ids=ids) results in a file with the correct number of rows (616) in kal$bootstrap but with zero rows in kal$abundance. I proceeded through_write_kallistohdf5 and, for a sanity check, reimported the subsetted files using _readkallisto, which fails (understandably) with the error below.
When I use a custom function to subset the files, providing additional attributes (original_num_targets, num_targets, excluded_ids) in the same way _subsetkallisto does, I get the correct number of rows in kal$abundance and kal$bootstrap. However, I can't cheat sleuth into accepting the new attributes: _is_kallistosubset does not report the file as subsetted.
Responsible is either _write_kallistohdf5 or _read_kallistoh5. I think part of the problem lies with the below lines in _read_kallistoh5, but I can't figure out why the excluded_ids are not shown when calling summary(kal)