sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Bugfix: corrects for errors in counting with special characters #266

Open TomKellyGenetics opened 3 years ago

TomKellyGenetics commented 3 years ago

Tested with published SmartSeq3 data from ArrayExpress. Compatible with the latest versions of STAR (2.7.9a) and samtools (1.7). Samtools idxstats reports unmapped reads on a row whose chromosome name is the special character "*". This fails silently because "*" is not permitted as a factor level, and it leads to problems counting reads/UMIs later on. With this fix, these rows are removed and unmapped reads are not counted, which restores counting of UMI and internal reads for SmartSeq3.
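
To illustrate the idea (this is not the code in the PR, which changes the zUMIs R scripts), here is a minimal Python sketch of dropping the "*" row from samtools idxstats output before counting. The function name parse_idxstats and the example numbers are hypothetical; the only thing assumed about samtools is its documented tab-separated layout of reference name, sequence length, mapped reads, and unmapped reads.

```python
def parse_idxstats(text):
    """Return {reference name: mapped read count} from `samtools idxstats` text.

    The trailing "*" row, which holds reads with no reference assigned,
    is skipped so it never becomes a chromosome name downstream.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        name, _length, mapped, _unmapped = fields[:4]
        if name == "*":
            continue  # unmapped reads: not counted
        counts[name] = int(mapped)
    return counts


# Made-up example input in the idxstats column layout.
example = "chr1\t248956422\t1200\t3\nchr2\t242193529\t800\t1\n*\t0\t0\t57\n"
print(parse_idxstats(example))
```

Filtering by name at parse time keeps the unmapped row out of any later step that treats the reference names as categories.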

FYI: the docker container for v2.9.2 is also out of date. It requires installing R >= 4.0, samtools, and STAR, as well as many missing dependencies.

cziegenhain commented 3 years ago

Hi, sorry to say I haven't had time to look into this. I'm not aware of any issue with idxstats, so I'll need to dive into that before merging the changes. If you have a more detailed description of what you found there, it'd be appreciated.

As for Docker, it shouldn't require installing any dependencies; the conda environment that zUMIs brings from GitHub should work great within Docker. But thanks for the code snippet to build the Docker image programmatically, that's really useful!

Best, Christoph

TomKellyGenetics commented 3 years ago

Sorry, our server was shut down for maintenance (while I was on leave), so I can't access the Docker container where I tested this version. I'm trying to get it back up to test it again, but it may take a while.

I didn't mean to add the Dockerfile to the PR, but I'm happy to share it. Basically, these are the steps from my command history that I had to run to get it to work the first time.

As a correction: I've checked, and the Docker build installs samtools 1.7.