thelovelab / tximport

Transcript quantification import for modular pipelines

Add kallisto HDF5 import #18

Closed: andrewparkermorgan closed this 6 years ago

andrewparkermorgan commented 7 years ago

The HDF5 file produced by kallisto is much smaller than the plain-text file, even in the absence of bootstraps. For projects with many samples, it would be advantageous to keep just the HDF5 file around. To that end, I added a function that uses the Bioconductor package rhdf5 to read kallisto results from the HDF5 file. This gives a modest speedup in tximport(), and it could smooth the way to supporting bootstraps in the future, if that is of interest.

To minimize the number of dependencies for tximport, rhdf5 is listed only under Suggests, and HDF5 import stops with an error (no fallback) if the package is not installed.
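For illustration, a reader of this kind might look roughly like the sketch below. It is not the code from this PR. It assumes kallisto's `abundance.h5` layout as we understand it (datasets `aux/ids`, `aux/lengths`, `aux/eff_lengths` and `est_counts`, with no TPM column stored), and the function name `read_kallisto_h5_sketch` is hypothetical. Because rhdf5 is only in Suggests, the function checks `requireNamespace()` first and stops with a message rather than falling back.

```r
# Hypothetical helper (not the PR's code): read one kallisto abundance.h5
# into a data.frame shaped like kallisto's plain-text abundance.tsv.
# Assumes the dataset paths aux/ids, aux/lengths, aux/eff_lengths, est_counts.
read_kallisto_h5_sketch <- function(fpath) {
  # rhdf5 is only in Suggests: stop with a clear message, no fallback.
  if (!requireNamespace("rhdf5", quietly = TRUE)) {
    stop("reading kallisto HDF5 output requires the Bioconductor package 'rhdf5'")
  }
  ids     <- as.character(rhdf5::h5read(fpath, "aux/ids"))
  lengths <- as.numeric(rhdf5::h5read(fpath, "aux/lengths"))
  efflens <- as.numeric(rhdf5::h5read(fpath, "aux/eff_lengths"))
  counts  <- as.numeric(rhdf5::h5read(fpath, "est_counts"))
  # Assumed: TPM is not stored in the file, so derive it from counts and
  # effective lengths: TPM_i = 1e6 * (c_i / l_i) / sum_j (c_j / l_j).
  rate <- counts / efflens
  data.frame(
    target_id  = ids,
    length     = lengths,
    eff_length = efflens,
    est_counts = counts,
    tpm        = 1e6 * rate / sum(rate),
    stringsAsFactors = FALSE
  )
}
```

Usage would be something like `res <- read_kallisto_h5_sketch("sample1/abundance.h5")` (path hypothetical). Returning the same columns as the text output means downstream summarization code would not need to know which file format was read.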

mikelove commented 7 years ago

hi Andrew,

That's so interesting. Just today I got an email from another tximport user with a separate implementation of HDF5 import.

This is not the active branch of tximport (which is the Bioconductor devel branch), but I'll work on porting this code over to the Bioconductor branch.

(Unfortunately, Bioconductor is not yet fully integrated with git and uses svn as its primary version control system.)

mikelove commented 7 years ago

@andrewparkermorgan @DarwinAwardWinner you both sent me some HDF5 support code this week, and I'm working on incorporating it into the Bioconductor devel branch. Right now I'm adding h5 files to the tximportData package so that any new code will be covered by unit tests.