wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

Loaded 0 counts and error line in doublets.err file #166

Open mpunta opened 1 year ago

mpunta commented 1 year ago

Hi,

Great tool, thank you for making it available to the community.

I have run souporcell on a number of scRNA-seq samples. Everything appears to go fine and the program runs to completion providing results that make complete sense. However, in the doublets.err files of all samples I find that the first line is:

xxxx loaded 0 counts, is this a problem?

where xxxx is a number that ranges from a few thousand to several thousand

and, additionally, in the same files there are a few lines (many in a couple of samples with high predicted ambient RNA) that read like:

error AAACGGGTCAGGTAAA-1 1 0
error AAAGCAACATGTAGTC-1 0 1

I tried to look up the code, but it is still not clear to me what these messages refer to. In particular, I would like to know whether this is something I should worry about or whether these are just internal messages that, all else being fine, can be safely ignored.

Could you clarify this for me please? Thank you! Marco

wheaton5 commented 1 year ago

Thanks for the kind words.

For "xxxx loaded 0 counts, is this a problem?": this is the number of loci where both clusters in a potential doublet combination have no alleles, so I don't have an allele fraction for that cluster combination at that variant. In this case I default to 0.5. I think this is normal; I just wasn't sure what the ramifications were.
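
To illustrate the fallback described above, here is a minimal Python sketch. It is not souporcell's actual code: the function name, the pooling of counts across the two clusters, and the example counts are all assumptions made for illustration. The only behaviour taken from the explanation is that a locus with no alleles in either cluster falls back to an allele fraction of 0.5 and is counted.

```python
def doublet_allele_fraction(alt_a, total_a, alt_b, total_b, default=0.5):
    """Pooled alt-allele fraction for a hypothetical doublet of clusters A and B.

    Hypothetical helper: pooling the two clusters' counts is an assumption,
    not necessarily how souporcell combines them.
    """
    total = total_a + total_b
    if total == 0:
        # Neither cluster has any alleles at this locus: use the default.
        return default
    return (alt_a + alt_b) / total


# Made-up counts per locus: (alt_a, total_a, alt_b, total_b).
loci = [(3, 10, 1, 10), (0, 0, 0, 0), (0, 0, 0, 0)]

fractions = [doublet_allele_fraction(*locus) for locus in loci]

# Number of loci that had to fall back to the default fraction.
n_defaulted = sum(1 for _, total_a, _, total_b in loci if total_a + total_b == 0)

print(fractions)
print(n_defaulted, "loci had no counts in either cluster")
```

With sparse scRNA-seq coverage, many loci having no alleles in a given pair of clusters would be unsurprising, which is consistent with the counts in the thousands reported in this issue.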

The second occurs when the doublet analysis comes up with a cluster assignment that differs from the original cluster assignment. I guess I never hit this in my tests, or I forgot it was possible and wasn't always looking at the error files. In this case, since the doublet analysis essentially changes the clusters by removing doublets and recomputing which cluster is most likely, we go with the assignment from the doublet analysis. But it is pretty strange that this can happen, and you might want to simply remove those cells from downstream analysis.
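
If you do want to drop those cells, one possible way to collect their barcodes from doublets.err is sketched below in Python. The line layout ("error", a barcode, then two integers) is inferred only from the examples quoted in this issue, and the function name is hypothetical. The sketch deliberately does not interpret which of the two numbers is which assignment.

```python
import re

# Assumed layout, based on the lines quoted in this issue:
#   error <barcode> <integer> <integer>
ERROR_LINE = re.compile(r"error\s+(\S+)\s+(\d+)\s+(\d+)")


def flagged_barcodes(text):
    """Map each barcode on an 'error' line to the two integers that follow it."""
    return {
        match.group(1): (int(match.group(2)), int(match.group(3)))
        for match in ERROR_LINE.finditer(text)
    }


# Sample text copied from the messages quoted in this issue.
sample = (
    "xxxx loaded 0 counts, is this a problem?\n"
    "error AAACGGGTCAGGTAAA-1 1 0\n"
    "error AAAGCAACATGTAGTC-1 0 1\n"
)

flagged = flagged_barcodes(sample)
print(sorted(flagged))
```

For a real run you would read the file contents first, for example with `open("doublets.err").read()`, and pass the text to `flagged_barcodes`.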

I hope this helps.

Best, Haynes

Haynes Heaton, M.D., Ph.D. Assistant Professor Auburn University, CS

mpunta commented 1 year ago

Thanks for the really quick answer, Haynes. I should add that the vast majority of the cells that have an error line in the doublets.err file, such as:

error AAACGGGTCAGGTAAA-1

are actually labeled as "unassigned" in the clusters.tsv file and I am not considering those in any case.
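
A rough Python sketch of how this cross-check could be done is below. It assumes clusters.tsv is tab-separated with a header row containing "barcode" and "status" columns. The function name, the third barcode, and the non-"unassigned" rows in the example table are made up for illustration.

```python
import csv
import io


def status_of_flagged(clusters_tsv, flagged):
    """Return {barcode: status} for the flagged barcodes found in clusters.tsv.

    Assumes a tab-separated file with 'barcode' and 'status' header columns.
    """
    reader = csv.DictReader(clusters_tsv, delimiter="\t")
    return {
        row["barcode"]: row["status"]
        for row in reader
        if row["barcode"] in flagged
    }


# Made-up stand-in for clusters.tsv; real files have more columns and rows.
example_clusters = io.StringIO(
    "barcode\tstatus\n"
    "AAACGGGTCAGGTAAA-1\tunassigned\n"
    "AAAGCAACATGTAGTC-1\tsinglet\n"
    "CCCCCCCCCCCCCCCC-1\tsinglet\n"
)

# Barcodes taken from the error lines quoted in this issue.
flagged = {"AAACGGGTCAGGTAAA-1", "AAAGCAACATGTAGTC-1"}

statuses = status_of_flagged(example_clusters, flagged)

# Flagged cells that are not already "unassigned" are the ones left to review.
to_review = sorted(bc for bc, status in statuses.items() if status != "unassigned")
print(to_review)
```

Any barcode left in `to_review` has an error line but is still assigned, so those are the cells one might consider removing from downstream analysis.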

Best regards, Marco