wenweixiong / MARVEL


About the generation of splice counts file #15

Closed hkarakurt8742 closed 1 year ago

hkarakurt8742 commented 1 year ago

Hello, and thank you for this amazing tool. I am trying to use it on my own data, a Smart-Seq2-based scRNA-seq dataset with 1130 FASTQ files for 565 samples. I aligned them with the STAR code you provided in the plate-based tutorial. I now have an SJ.out.tab file for each sample in the SJ folder, but they are not in the format you provided. They all look like this and have no header. I am using STAR version 2.7.1a.

1  14830   14969   2  2  0  0  1  31
1  135018  138007  2  4  0  4  0  45
1  146510  155766  2  2  1  0  1  39
1  155832  164262  2  2  1  0  2  35
1  168166  169048  2  2  1  0  1  16
1  169265  172556  2  2  1  0  2  48
1  172689  173752  2  2  1  0  2  18
1  185351  185490  2  2  1  0  7  39

The tutorial also gives an R code block for reading the splice junction counts into R:

```r
path <- "Data/SJ/"
file <- "SJ.txt"
sj <- as.data.frame(fread(paste(path, file, sep=""), sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA"))

sj[!is.na(sj[,2]), ][1:5,1:5]
```

I did not fully understand how to import the splice counts into R. Should I merge all the SJ.out.tab files into a single TXT file, or am I missing something from the tutorial?

Thank you in advance.

wenweixiong commented 1 year ago

You're right that you'll need to merge all SJ.out.tab files into a single .txt file. Please only retrieve the 7th column as it represents the number of uniquely mapping reads crossing the junction.
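The merge described above could be sketched in base R roughly as follows. This is not code from the MARVEL tutorial, only a minimal illustration under stated assumptions: the column labels in `sj_cols` are names chosen here to match the STAR manual's description of SJ.out.tab, the helper names `read_sj` and `merge_sj` are hypothetical, and the `Data/SJ` layout with files ending in `SJ.out.tab` is assumed. Chromosome names are kept exactly as STAR wrote them (for example `1` rather than `chr1`).

```r
# Labels for the nine header-less SJ.out.tab columns (names chosen for this sketch).
sj_cols <- c("chr", "start", "end", "strand", "motif", "annotated",
             "n_unique", "n_multi", "max_overhang")

# Read one SJ.out.tab file and keep the junction ID plus column 7 (unique reads).
read_sj <- function(file, sample_id) {
  if (file.size(file) == 0) {
    # A sample with no detected junctions contributes an empty table.
    out <- data.frame(coord.intron = character(0), count = integer(0),
                      stringsAsFactors = FALSE)
  } else {
    # Coordinates are read as text so they are pasted verbatim into the ID.
    sj <- read.table(file, sep = "\t", header = FALSE, col.names = sj_cols,
                     colClasses = c(rep("character", 3), rep("integer", 6)),
                     stringsAsFactors = FALSE)
    out <- data.frame(coord.intron = paste(sj$chr, sj$start, sj$end, sep = ":"),
                      count = sj$n_unique, stringsAsFactors = FALSE)
  }
  names(out)[2] <- sample_id
  out
}

# Full outer join across samples; junctions absent from a sample become NA.
merge_sj <- function(files, sample_ids) {
  per_sample <- Map(read_sj, files, sample_ids)
  Reduce(function(x, y) merge(x, y, by = "coord.intron", all = TRUE), per_sample)
}

# Hypothetical layout: Data/SJ/<sample>.SJ.out.tab, one file per sample.
files <- list.files("Data/SJ", pattern = "SJ\\.out\\.tab$", full.names = TRUE)
if (length(files) > 0) {
  sample_ids <- sub("[._]?SJ\\.out\\.tab$", "", basename(files))
  sj <- merge_sj(files, sample_ids)
  write.table(sj, "Data/SJ/SJ.txt", sep = "\t", quote = FALSE,
              row.names = FALSE, na = "NA")
}
```

The resulting `SJ.txt` has a header row and tab-separated columns, so it can be read with the `fread(..., header=TRUE, na.strings="NA")` call quoted from the tutorial. With several hundred samples the repeated `merge()` may be slow; a `data.table` join would likely be faster, but base R keeps the sketch dependency-free.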

hkarakurt8742 commented 1 year ago

> You're right that you'll need to merge all SJ.out.tab files into a single .txt file. Please only retrieve the 7th column as it represents the number of uniquely mapping reads crossing the junction.

So the user has to merge the per-sample SJ count files into the required format, like this:

coord.intron              ERR1562083  ERR1562084  ERR1562085  ERR1562086
chr1:100007082:100022621  0           NA          NA          NA
chr1:100007091:100024554  0           NA          NA          NA
chr1:100007157:100011364  12          3           5           9
chr1:100007601:100023781  1           NA          NA          NA
chr1:100011534:100015301  15          NA          8           12

Thank you for your help.