Question on combining splice junction counts

wenweixiong / MARVEL

Question on combining splice junction counts #25

Closed randystyle21 closed 1 year ago

randystyle21 commented 1 year ago

Dr. Wen, thank you so much for developing this great tool, we really do appreciate it!

I was following along with your tutorial, and I have a question about how to create the merged data object for the splice junction counts. For normalised gene expression and gene counts, all the data needed for MARVEL can be extracted. For the splice junctions (SJ), however, I tried to find a way to merge the splice junction count files from STARsolo, as you mentioned, but I couldn't find any reference. Would you kindly let us know how (or where to look in the vignette)?

Thanks a ton in advance!!

Randy

wenweixiong commented 1 year ago

Hi Randy,

The example script below (script_01_tabulate_starsolo_sj_counts.R) should be helpful for merging two or more STARsolo outputs (STARsolo_example.zip) to generate the merged files (STARsolo_example_merged.zip) required by MARVEL.

The script simply (1) appends the library ID to each cell barcode to distinguish cells between different libraries, (2) creates a master list of splice junctions identified across all libraries, and finally (3) merges the read counts across all libraries.
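The three steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the contents of the R script itself: the function name `merge_sj_counts`, the `barcode_library` ID format, the junction ID strings, and the in-memory triplet layout are all assumptions made for the example. Counts are kept as sparse (junction, cell, count) triplets throughout, so no dense matrix is ever built.

```python
from collections import defaultdict


def merge_sj_counts(libraries):
    """Merge per-library sparse splice-junction counts (illustrative sketch).

    `libraries` maps a library ID to (junctions, barcodes, triplets), where
    each triplet is (junction_index, barcode_index, count) with 0-based
    indices into that library's own junction and barcode lists.
    """
    # Step 1: append the library ID to each barcode so cell IDs stay unique.
    # The "barcode_library" format is hypothetical; use whatever matches
    # your gene expression data and metadata.
    merged_barcodes = []
    barcode_offset = {}
    for lib_id, (_, barcodes, _) in libraries.items():
        barcode_offset[lib_id] = len(merged_barcodes)
        merged_barcodes.extend(f"{bc}_{lib_id}" for bc in barcodes)

    # Step 2: master list of junctions seen in any library (sorted so the
    # row order is reproducible).
    master = sorted({sj for junctions, _, _ in libraries.values() for sj in junctions})
    row_of = {sj: i for i, sj in enumerate(master)}

    # Step 3: re-index every triplet into the merged rows/columns and sum.
    merged = defaultdict(int)
    for lib_id, (junctions, _, triplets) in libraries.items():
        offset = barcode_offset[lib_id]
        for row, col, count in triplets:
            merged[(row_of[junctions[row]], offset + col)] += count

    return master, merged_barcodes, dict(merged)


# Toy input with made-up junction IDs and barcodes.
libs = {
    "lib1": (["chr1:100:200", "chr1:300:400"], ["AAAC", "AAAG"], [(0, 0, 5), (1, 1, 2)]),
    "lib2": (["chr1:300:400", "chr2:50:90"], ["AAAC"], [(0, 0, 3), (1, 0, 7)]),
}
junctions, cells, counts = merge_sj_counts(libs)
```

Here `junctions` is the master junction list, `cells` holds the library-tagged barcodes, and `counts` maps (junction row, cell column) pairs to read counts. Junctions absent from a library simply have no entry for that library's cells.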

It is important to make sure that the cell IDs match those in your gene expression data, metadata, etc. prior to analysis in MARVEL.

script_01_tabulate_starsolo_sj_counts.R.zip

STARsolo_example.zip

STARsolo_example_merged.zip

Sean

RENXI-NUS commented 10 months ago

Hi Sean,

I am trying to use your code to merge two STARsolo outputs. However, I ran into a memory issue. My server has a total of 1.5 TB of memory.

`[1] "Matrix 1 done"`

`Error: cannot allocate vector of size 50603.2 Gb`

I have a total of 16 STARsolo outputs to be merged. Any suggestions would be much appreciated.

Thank you.

Ren

wenweixiong commented 10 months ago

Admittedly, combining 16 STARsolo files would require some tweaking to make the process more computationally efficient. Please reach out to me (sean.wen@astrazeneca.com) if you would like to work collaboratively on this.
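As a rough illustration of why efficiency matters at this scale, the arithmetic below compares the footprint of a dense junction-by-cell matrix with that of sparse triplet storage. It is a back-of-the-envelope sketch only: the helper names and all of the numbers are hypothetical, and whether the allocation error reported above actually comes from a dense intermediate is an assumption, not something confirmed here.

```python
def dense_gib(n_junctions, n_cells, bytes_per_value=8):
    """Memory for a dense matrix of 8-byte values, in GiB."""
    return n_junctions * n_cells * bytes_per_value / 2**30


def sparse_gib(n_nonzero, bytes_per_value=8, bytes_per_index=4):
    """Memory for triplet storage: one value plus a row and a column index per nonzero."""
    return n_nonzero * (bytes_per_value + 2 * bytes_per_index) / 2**30


# Hypothetical sizes, chosen only to show the scaling.
n_junctions = 500_000
n_cells = 100_000
n_nonzero = 50_000_000  # i.e. 0.1% of entries are nonzero

dense = dense_gib(n_junctions, n_cells)
sparse = sparse_gib(n_nonzero)
print(f"dense: {dense:.1f} GiB, sparse: {sparse:.2f} GiB")
```

The dense cost grows with junctions times cells, so each added library inflates both dimensions at once, while the sparse cost grows only with the number of observed junction-cell pairs.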