wenweixiong / MARVEL


Clarification and questions #1

Closed cnk113 closed 1 year ago

cnk113 commented 2 years ago

Hello,

I read your preprint and looked through the walkthrough, which is very thorough. How is the SJ/splicing data integrated across different single-cell experiments and samples? Is there some form of harmonization of the SJs? Also, is the initial normalization step necessary for the gene counts? Most single-cell normalization is done after QC filtering, so it would be nice if that step were optional.

Thanks, Chang

wenweixiong commented 2 years ago

Hi Chang,

Thank you for reaching out.

There is no need to perform any harmonisation (i.e., batch correction) on the splice junction (SJ) counts. In plate-based methods, the percent spliced-in (PSI) value of a given alternative exon in a given cell is the number of SJ reads supporting inclusion of the alternative exon, expressed as a percentage of the total number of SJ reads either supporting or skipping that exon. In droplet-based methods, the PSI value of a given SJ in a given cell population is the total number of SJ reads, expressed as a percentage of the total number of reads mapping to the corresponding gene. Because PSI values are percentages, they should be quite robust against experimental/batch effects.
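To make the two definitions concrete, here is a minimal Python sketch (not MARVEL code; the function names `psi_plate` and `psi_droplet` and the read counts are made up for illustration). It only shows that a uniform change in sequencing depth cancels out of a ratio, which is the intuition behind the robustness argument above.

```python
def psi_plate(included_reads, excluded_reads):
    """PSI (%) of one alternative exon in one cell (plate-based definition)."""
    total = included_reads + excluded_reads
    if total == 0:
        return None  # no junction coverage, so PSI is undefined
    return 100.0 * included_reads / total


def psi_droplet(sj_reads, gene_reads):
    """PSI (%) of one splice junction in one cell population (droplet-based definition)."""
    if gene_reads == 0:
        return None  # gene not detected, so PSI is undefined
    return 100.0 * sj_reads / gene_reads


# Hypothetical counts: the same event in two batches, where batch B
# was sequenced three times deeper than batch A.
batch_a = psi_plate(included_reads=30, excluded_reads=10)
batch_b = psi_plate(included_reads=90, excluded_reads=30)
print(batch_a, batch_b)  # both are 75.0

# Hypothetical droplet example: 5 junction reads out of 200 gene reads.
print(psi_droplet(sj_reads=5, gene_reads=200))  # 2.5
```

Batch effects that change the inclusion-to-exclusion ratio itself would not cancel in this way.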

Gene normalisation needs to be performed by the user prior to running MARVEL. For plate-based methods, MARVEL only assists with log2-transformation of the user-provided normalised expression values (please see the "Gene expression matrix" and "Transform expression values" sections of the plate-based tutorial). You may skip this step if you have already provided log2-transformed values. Similarly, for droplet-based methods, users need to provide the normalised expression values to MARVEL (please see the "Normalised gene expression" section of the droplet-based tutorial). MARVEL then performs log2-transformation prior to differential gene expression analysis (CompareValues.Genes.10x function) or expression visualisation (PlotValues.PCA.Gene.10x function). To switch off log2-transformation, simply specify log2.transform=FALSE.
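As a rough illustration of what an optional transformation switch does, here is a small Python sketch. It is not MARVEL's implementation: the function name `prepare_expression` is hypothetical, and the pseudocount of 1 (i.e., log2(x + 1)) is an assumption made so that zero values stay defined.

```python
import math


def prepare_expression(values, log2_transform=True):
    """Return normalised expression values, optionally log2(x + 1)-transformed.

    The pseudocount of 1 is an assumption for this sketch.
    """
    if not log2_transform:
        # Values are assumed to be on the desired scale already.
        return list(values)
    return [math.log2(v + 1) for v in values]


normalised = [0.0, 1.0, 3.0, 7.0]  # hypothetical normalised expression values
print(prepare_expression(normalised))                        # [0.0, 1.0, 2.0, 3.0]
print(prepare_expression(normalised, log2_transform=False))  # unchanged
```

Switching the flag off matters when the input is already log2-transformed, since transforming twice would compress the values a second time.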

Hope this helps!

Sean

ghost commented 1 year ago

Hello,

Thank you for developing the MARVEL package. I have some questions on preparing the SJ count matrices too.

I am following the droplet-based tutorial (https://wenweixiong.github.io/MARVEL_Droplet.html) and preprocessing some 10x Genomics reads.

After running cellranger and STARsolo, I am not sure how to collate and combine the STARsolo count files for downstream analysis. It would be nice if you could share some example code for collating the count matrices correctly.

Thanks a lot!

wenweixiong commented 1 year ago

Hi,

Example scripts and data for tabulating STARsolo splice junctions are available at https://github.com/wimm-hscb-lab-published/Wen_NucleicAcidsRes_2023

Note that if there are too many splice junctions (i.e., file sizes are huge), you may want to keep only the splice junctions (rows) expressed in, for example, at least 1% of cells (columns), prior to tabulating the junctions across all samples/libraries.
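The filter-then-tabulate idea can be sketched in plain Python as follows. This is not the script from the repository above: the function names, the nested-dictionary sparse layout, the junction IDs, and the barcodes are all hypothetical, and "expressed" is assumed to mean a non-zero count.

```python
def filter_junctions(sj_counts, n_cells, min_fraction=0.01):
    """Keep junctions (rows) with a non-zero count in at least min_fraction of cells."""
    kept = {}
    for junction, cells in sj_counts.items():
        n_expressing = sum(1 for count in cells.values() if count > 0)
        if n_expressing >= min_fraction * n_cells:
            kept[junction] = cells
    return kept


def combine_samples(samples):
    """Merge per-sample sparse matrices over the union of their junctions.

    Cell barcodes are prefixed with the sample name so they stay unique.
    Entries that are absent are implicitly zero.
    """
    combined = {}
    for sample_name, sj_counts in samples.items():
        for junction, cells in sj_counts.items():
            row = combined.setdefault(junction, {})
            for barcode, count in cells.items():
                row[f"{sample_name}_{barcode}"] = count
    return combined


# Hypothetical sparse input: {junction: {cell barcode: count}} per sample.
sample1 = {
    "chr1:100:200": {"AAAC": 2, "AAAG": 1, "AAAT": 4},  # 3 of 200 cells
    "chr1:300:400": {"AAAC": 1},                        # 1 of 200 cells
}
sample2 = {
    "chr1:100:200": {"CCCA": 3, "CCCG": 1},             # 2 of 100 cells
    "chr2:50:90": {"CCCA": 1, "CCCT": 2},               # 2 of 100 cells
}

filtered = {
    "s1": filter_junctions(sample1, n_cells=200),
    "s2": filter_junctions(sample2, n_cells=100),
}
combined = combine_samples(filtered)
print(sorted(combined))  # ['chr1:100:200', 'chr2:50:90']
```

Filtering each library before merging keeps the combined table small, at the cost of dropping counts for a junction in any library where it fell below the threshold.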

Sean



