sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Understanding zUMI Output #213

Closed: kerimsecener closed this issue 4 years ago

kerimsecener commented 4 years ago

Hey Chris,

I have been running zUMIs with default parameters on InDrop datasets and have noticed differences in the counts for several genes between mapping and counting with zUMIs and with STARsolo. Briefly, some genes have higher counts in zUMIs than in STARsolo, and vice versa. When we inspect the reads in a genome browser, we can see a large number of reads for the genes in question, yet the resulting counts in the output are extremely low (in spite of a sufficient number of reads).

Although I have read several related issues to understand how the mapping and counting work, I still can't wrap my head around this method.

As an example, I have attached the YAML config file for one of the libraries, along with the IGV visualization for one of the genes and the final inex counts (nCounts = 28) for the same gene.

Screenshot 2020-09-15 at 15 25 11

yaml file: 080_S2.txt

Any help would be appreciated.

Thanks!

Kerim

cziegenhain commented 4 years ago

Hi Kerim,

So there are a couple of hints I can give here for you to check:

In general, I would recommend checking the BAM file around this example locus. There should be a status tag (e.g. "ES" for exons) from which you can see the reason a read was not counted.

Best, Christoph

kerimsecener commented 4 years ago

Hi Christoph,

Thanks for your suggestions.

Concerning the second point, do you mean setting strand = 1 or 2 in the config file?

Thanks, Kerim

cziegenhain commented 4 years ago

Exactly! The values are as follows: 0 = unstranded, 1 = positively stranded, 2 = negatively stranded
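
As a rough illustration of what these three values mean for counting, here is a minimal sketch. It assumes the usual featureCounts-style convention (1 = read on the same strand as the gene, 2 = read on the opposite strand); the function name `strand_compatible` is hypothetical and this is not taken from the zUMIs code.

```python
def strand_compatible(read_strand: str, gene_strand: str, strand_setting: int) -> bool:
    """Return True if a read on read_strand could be assigned to a gene on gene_strand.

    Assumed convention (not verified against zUMIs itself):
      0 = unstranded: strand is ignored
      1 = positively stranded: read must lie on the gene's strand
      2 = negatively stranded: read must lie on the opposite strand
    """
    if strand_setting == 0:
        return True
    if strand_setting == 1:
        return read_strand == gene_strand
    if strand_setting == 2:
        return read_strand != gene_strand
    raise ValueError("strand_setting must be 0, 1 or 2")


# A '+' read over a '-' gene is only kept when unstranded or negatively stranded.
for setting in (0, 1, 2):
    print(setting, strand_compatible("+", "-", setting))
```

If the setting does not match the library chemistry, reads that visibly pile up on a gene in the browser can still be rejected at the counting step.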

kerimsecener commented 4 years ago

I have tried running the same library with both strand = 1 and strand = 2, but it looks like this was not the issue here. Also, I could not find the ES tag in the BAM file.

Do you have another suggestion?

cziegenhain commented 4 years ago

The ES tag should definitely be there and is the most informative thing to look at in your case. Make sure you are opening the correct BAM file; it should say "GeneTagged" in the filename. (Since you haven't mentioned a specific zUMIs version, I'm just going to assume that this is newer than 2.6.0, where tag names have changed a bit.)

kerimsecener commented 4 years ago

So, I am actually using version 2.0.6 because I started with it back in the day with my first libraries. Maybe I could install the newest version and try running with that to compare?

cziegenhain commented 4 years ago

absolutely!

cziegenhain commented 4 years ago

Feel free to reopen if you need additional help.

kerimsecener commented 4 years ago

Hi Chris,

So I've finally managed to run the same file with the latest zUMIs version. The results are quite similar, although now I have the ES tags in the BAM files. Indeed, the reads that were mapped and counted in STARsolo but not in zUMIs contain ES:Z:Unassigned_NoFeatures, so I guess this is the reason why they are not counted. Is there a parameter in the YAML file I can adjust to ensure these reads are also taken into account?
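
For anyone wanting to repeat this check, a minimal sketch of tallying the ES status over SAM-format text lines (for example, the text output of `samtools view` on a region of the GeneTagged BAM file). The function name `tally_tag` and the toy records are made up for illustration; only the `ES:Z:Unassigned_NoFeatures` value comes from the actual data.

```python
from collections import Counter


def tally_tag(sam_lines, tag="ES"):
    """Count the values of an optional SAM tag across alignment records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        value = "missing"
        # Optional tags start after the 11 mandatory SAM columns (TAG:TYPE:VALUE).
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == tag:
                value = parts[2]
                break
        counts[value] += 1
    return counts


# Toy records (hypothetical): read name, 10 further mandatory fields, optional tags.
core = ["0", "chr1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII"]
example = [
    "@HD\tVN:1.6",
    "\t".join(["read1"] + core + ["ES:Z:Unassigned_NoFeatures"]),
    "\t".join(["read2"] + core + ["ES:Z:Unassigned_NoFeatures"]),
    "\t".join(["read3"] + core),  # record without an ES tag
]
print(tally_tag(example))
```

A locus dominated by one unassigned status points at the annotation or strand settings, not at the mapping.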

Thanks,

Kerim

cziegenhain commented 4 years ago

I could imagine that you either gave a different annotation or that there is an issue with the strandedness setting.

kerimsecener commented 4 years ago

By "different annotation", do you mean the GTF file?

cziegenhain commented 4 years ago

yes!
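
One way to test the annotation explanation is to look up which exons in the GTF file overlap the read positions and on which strand they lie. Below is a minimal sketch under stated assumptions: the function name `overlapping_features` and the toy GTF lines are hypothetical, and the coordinates follow the standard GTF convention (1-based, inclusive).

```python
def overlapping_features(gtf_lines, chrom, start, end, feature="exon"):
    """Return (start, end, strand, attributes) for GTF features overlapping chrom:start-end."""
    hits = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        seqname, _source, ftype, fstart, fend, _score, strand, _frame, attrs = cols[:9]
        if seqname != chrom or ftype != feature:
            continue
        # Closed intervals overlap when each one starts before the other ends.
        if int(fstart) <= end and int(fend) >= start:
            hits.append((int(fstart), int(fend), strand, attrs))
    return hits


# Toy annotation (made up): two exons on opposite strands plus a gene record.
toy_gtf = [
    "#!toy annotation",
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA";',
    'chr1\ttoy\texon\t500\t600\t.\t-\t.\tgene_id "geneB";',
    'chr1\ttoy\tgene\t100\t600\t.\t+\t.\tgene_id "geneA";',
]
print(overlapping_features(toy_gtf, "chr1", 150, 180))
print(overlapping_features(toy_gtf, "chr1", 300, 400))
```

Running the same lookup against the GTF given to zUMIs and the one given to STARsolo would show whether the two annotations disagree at the locus in question.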