splicebox / PsiCLASS

Simultaneous multi-sample transcript assembler for RNA-seq data

Is XS tag indispensable for strand-specific RNA-seq library? #25

Open Pentayouth opened 2 years ago

Pentayouth commented 2 years ago

Dear author,

I would like to compare the assembly of both psiclass and stringtie.

In StringTie, the user can specify --rf or --fr for a strand-specific RNA-seq library instead of writing the XS tag during the STAR alignment step. So I didn't use --outSAMstrandField intronMotif, and my BAM files therefore do not have the XS tag.

I wonder whether such BAM files would affect the PsiCLASS assembly, or whether adding the XS tag to the BAM output is indispensable regardless of the strandedness of the experiment. Is there any workaround that avoids rerunning the time-consuming STAR alignment step (I have hundreds of samples)?

Best regards, Wang

mourisl commented 2 years ago

We have provided the program "addXS" in the package. It adds the "XS" field by checking the donor/acceptor motifs. The command is "samtools view -h in.bam | ./addXS reference_genome.fa | samtools view -bS - > out.bam"

But this still takes a while to generate, because it needs to decompress and compress the BAM file. I can add the strand-specific feature, and it should not take long.
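To illustrate the general idea of deriving the strand from donor/acceptor motifs, here is a minimal Python sketch. It is not the addXS source code: the function name `infer_xs`, the motif table, and the 0-based half-open coordinates are all assumptions made for illustration, and only the common GT-AG, GC-AG and AT-AC motifs (plus their reverse complements) are covered.

```python
# Hypothetical sketch, NOT the addXS implementation.
# Motifs are (first two intron bases, last two intron bases) as seen on the
# forward reference strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# A minus-strand intron shows the reverse complement when read left to right.
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}


def infer_xs(genome, intron_start, intron_end):
    """Return '+', '-' or None for the intron [intron_start, intron_end).

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    if intron_end - intron_start < 4:
        return None  # too short to hold both dinucleotides
    left = genome[intron_start:intron_start + 2].upper()
    right = genome[intron_end - 2:intron_end].upper()
    if (left, right) in PLUS_MOTIFS:
        return "+"
    if (left, right) in MINUS_MOTIFS:
        return "-"
    return None  # non-canonical motif: strand stays undetermined


if __name__ == "__main__":
    print(infer_xs("AAAGTCCCCAGTTT", 3, 11))
    print(infer_xs("AAACTCCCCACTTT", 3, 11))
```

A spliced read whose introns all return the same value could then be tagged with that strand; reads with non-canonical or conflicting motifs would be left untagged.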

mourisl commented 2 years ago

Just to confirm: do all the samples use the same strand library type, or is it a mixture? Thank you.

Pentayouth commented 2 years ago

All the samples use the same strand library type.

mourisl commented 2 years ago

Thanks for the information. I've added the option --stranded to psiclass in the git branch "stranded". Could you please check out the branch and test whether PsiCLASS generates reasonable results? If so, I will merge this update into the master branch. You can specify the strand library type through the option, like "--stranded rf" or "--stranded fr". Thank you!
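For readers unfamiliar with what a library type such as rf or fr implies, the sketch below shows how a transcript strand can be derived from a read's SAM flag alone, without an XS tag. It is a hedged illustration, not PsiCLASS's code: the function name `transcript_strand` is hypothetical, and the sketch assumes the usual convention that under "fr" read 1 aligns in the transcript's orientation while under "rf" read 2 does.

```python
# Hypothetical sketch, NOT the PsiCLASS implementation.
FLAG_REVERSE = 0x10  # read aligned to the reverse strand
FLAG_READ2 = 0x80    # read is the second mate of the pair


def transcript_strand(flag, library):
    """Return '+' or '-' for the transcript a read originates from.

    library is 'fr' or 'rf'. Reads without the READ2 bit (including
    single-end reads) are treated as read 1, an assumption of this sketch.
    """
    if library not in ("fr", "rf"):
        raise ValueError("library must be 'fr' or 'rf'")
    is_read2 = bool(flag & FLAG_READ2)
    # Under 'fr', read 1 shares the transcript's orientation and read 2 is
    # opposite; under 'rf' the roles are swapped.
    same_as_transcript = not is_read2
    if library == "rf":
        same_as_transcript = not same_as_transcript
    read_strand = "-" if flag & FLAG_REVERSE else "+"
    if same_as_transcript:
        return read_strand
    return "+" if read_strand == "-" else "-"


if __name__ == "__main__":
    # Flag 99: paired, proper pair, mate reverse, first in pair (forward read 1).
    print(transcript_strand(99, "rf"))
    print(transcript_strand(99, "fr"))
```

With this convention, choosing the wrong library type flips every inferred strand, which would put assembled transcripts on the opposite strand from the annotation.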

Pentayouth commented 2 years ago

The branch didn't work properly. I ran /public/home/lijing/wangzw/resource/bins/psiclass/psiclass --lb bam.list -p 1 --stranded rf, which threw an error:

$ /public/home/wang/resource/bins/psiclass/psiclass --lb bam.list -p 1 --stranded rf
sh: /public/home/wang/resource/bins/psiclass/samtools-0.1.19/samtools: No such file or directory
Found mate read id index suffix(.1 or /1). Calling "--mateIdx 1" option. If this is a false calling, please use "--mateIdx 0".
/public/home/wang/resource/bins/psiclass/junc /public/home/wang/subject/star_new/N1/N1.2pass.Aligned.sortedByCoord.out.bam -a  --stranded rf --hasMateIdSuffix > ./splice/psiclass_bam_0.raw_splice
sh: /public/home/wang/resource/bins/psiclass/junc: No such file or directory
Terminated

mourisl commented 2 years ago

It seems the programs junc and samtools have not been compiled. Could you please run "make" to generate those executables? Thank you.

Pentayouth commented 2 years ago

I'm sorry for forgetting the make step. Now psiclass is working properly. In my experience the whole process takes 3-4 days using 23 threads, and I will give you feedback then. I really appreciate your continuous support of the program.

Pentayouth commented 2 years ago

I ran

/public/home/lijing/wangzw/resource/bins/psiclass/psiclass \
--lb bam.list \
-p 24 \
--stranded rf

The gffcompare result for psiclass_vote.gtf looks weird: specificity is low even at the intron level, which is abnormal. [image] Below is the result for the StringTie merge: [image]

I checked in IGV and found that the software placed the assemblies on the opposite strand. [image] [image] [image]

Pentayouth commented 2 years ago

By the way, I checked the library strandedness using RSeQC:

samtools view -Sbh N1_WTS.bam chr22 > chr22.old.bam
infer_experiment.py \
-r gencode.v24.chr_patch_hapl_scaff.annotation.12.bed \
-i chr22.old.bam

The result is:

# This is PairEnd Data
# Fraction of reads failed to determine: 0.0515
# Fraction of reads explained by "1++,1--,2+-,2-+": 0.0318
# Fraction of reads explained by "1+-,1-+,2++,2--": 0.9167

According to this figure (from RSeQC): [image]

My library is 1+-,1-+,2++,2--, which means it is fr-firststrand (aka RF). [image]

So I used --rf in StringTie and --stranded rf in PsiCLASS.
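The reasoning above, going from the infer_experiment.py fractions to a library type, can be sketched in Python as follows. This is only an illustration: the function name `classify_strandedness` and the 0.8 cutoff are assumptions, not part of RSeQC, StringTie or PsiCLASS, and the parser only handles the paired-end report format shown above.

```python
import re

# Patterns as printed by infer_experiment.py for paired-end data.
FR_PATTERN = "1++,1--,2+-,2-+"  # fr-secondstrand, i.e. "fr"
RF_PATTERN = "1+-,1-+,2++,2--"  # fr-firststrand, i.e. "rf"


def classify_strandedness(report, threshold=0.8):
    """Return 'fr', 'rf' or 'unstranded' from an infer_experiment.py report.

    threshold is a hypothetical cutoff chosen for this sketch.
    """
    fractions = {}
    for line in report.splitlines():
        match = re.search(r'explained by "([^"]+)":\s*([0-9.]+)', line)
        if match:
            fractions[match.group(1)] = float(match.group(2))
    if FR_PATTERN not in fractions or RF_PATTERN not in fractions:
        raise ValueError("report does not look like paired-end output")
    if fractions[RF_PATTERN] >= threshold:
        return "rf"
    if fractions[FR_PATTERN] >= threshold:
        return "fr"
    return "unstranded"


if __name__ == "__main__":
    report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0515
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0318
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9167
'''
    print(classify_strandedness(report))
```

For the report in this thread the sketch selects "rf", matching the --rf and --stranded rf choices described above.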

mourisl commented 2 years ago

Thank you for showing the details. It seems some of the introns are on the right strand while most of them are not. In my test data (even though it is not from a stranded library), PsiCLASS and StringTie agree on the strand. I'll look into this issue by creating a better debugging example. If the chr22.bam file is small, I would appreciate it if you could share it with me. Thank you!

Pentayouth commented 2 years ago

OK, I would like to share 3 normal BAM files covering chr22 with you, all from stranded libraries. Would you please provide an email address?

mourisl commented 2 years ago

Yes, you can use the email lsong@ds.dfci.harvard.edu. One BAM file would be sufficient. Thank you!

Pentayouth commented 2 years ago

Please check the email for the download link.

mourisl commented 2 years ago

Thank you for providing the test examples! I think I've fixed this issue. Could you pull the new branch, recompile PsiCLASS and give it a try? Thank you for your patience and help.

Pentayouth commented 2 years ago

Thank you. I will give you feedback.

Pentayouth commented 2 years ago

The gffcompare result looks plausible this time. Thank you very much. [image]

mourisl commented 2 years ago

Thank you! I will merge this branch into master and release a new version.