vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

PE-150 RNA-Seq data: Trimming RNAseq reads to 50 nt sequences #99

Status: Closed (huangwb8 closed this issue 3 years ago)

huangwb8 commented 3 years ago

Hi~

vast-tools runs without any error messages, although it is a little slow.

The alignment percentage problem was discussed in issue #88. Here, I noticed this message in the log: [vast align]: Trimming RNAseq reads to 50 nt sequences.

My RNA-Seq data is paired-end 150 (PE-150), which means that the read length is 150 nt.

I am wondering why vast-tools trims the reads to 50 nt. Is that OK for the downstream analysis? I haven't reached the later steps (merge, combine, or compare) yet.

My code of alignment is like:

${vast_tools} align ${fq1} ${fq2} \
        --name ${case} \
        --sp hg38 --expr --cores 5 \
        --dbDir ${vastDB} \
        --output ${path_vast_res} \
        > ${path_vast_log}/vast_align_${case}.log

The log is as follows:

[vast align]: VAST-TOOLS v2.5.1
[vast align]: Species assembly: hg38, VASTDB Species key: Hs2
[vast align]: VASTDB Version: vastdb.hs2.23.06.20
[vast align]: Input RNA-seq file(s): ~/Project/KO/output/fastq/cleaned/D1.R1.fq.gz and ~/Project/KO/output/fastq/cleaned/D1.R2.fq.gz
[vast align]: Most common read lengths detected for fq1 & fq2: 150 (82.58%) and 150 (83.16%)
[vast align]: Sample name: D1 
[vast align]: Using VASTDB -> /data/VASTDB/Hs2
[vast align]: Setting output directory to ~/Project/KO/output/vast-tools
[vast align]: Setting tmp directory..
[vast align]: Set tmp directory to ~/Project/KO/output/vast-tools/tmp!
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 500000
# reads with at least one alignment: 344083 (68.82%)
# reads that failed to align: 155917 (31.18%)
# reads with alignments suppressed due to -m: 8099 (1.62%)
Reported 335984 alignments
[vast align]:    fraction of first reads mapping to fwd / rev strand : 0.9879 / 0.0121
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 500000
# reads with at least one alignment: 323466 (64.69%)
# reads that failed to align: 176534 (35.31%)
# reads with alignments suppressed due to -m: 7081 (1.42%)
Reported 316385 alignments
[vast align]:    fraction of second reads mapping to fwd / rev strand : 0.0127 / 0.9873
[vast align]:    reverse-complementing reads from ~/Project/KO/output/fastq/cleaned/D1.R2.fq.gz; writing into ~/Project/KO/output/vast-tools/tmp/tmpfqs_71Zd7ViC/D1.R2.fq.gz
[vast align]: Mapping RNAseq reads against mRNA sequences
[vast align]: Calculating cRPKMs
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
[vast trim]: Total processed reads: 40639448
[vast trim]: Total valid fwd reads: 40359005
# reads processed: 40359005
# reads with at least one alignment: 31362446 (77.71%)
# reads that failed to align: 8996559 (22.29%)
# reads with alignments suppressed due to -m: 1273276 (3.15%)
Reported 30089170 alignments
[vast align]: Trimming RNAseq reads to 50 nt sequences
[vast trim]: Total processed reads: 40639448
[vast trim]: Total valid fwd reads: 40359005
[vast trim]: Total valid rev reads: 40355960
[vast align]: Doing genome subtraction
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 379272142
# reads with at least one alignment: 312676102 (82.44%)
# reads that failed to align: 66596040 (17.56%)
# reads with alignments suppressed due to -m: 92539231 (24.40%)
Reported 220136871 alignments
[vast align]: Mapping reads to the "splice site-based" (aka "a posteriori") EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 66596040
# reads with at least one alignment: 43306464 (65.03%)
# reads that failed to align: 23289576 (34.97%)
# reads with alignments suppressed due to -m: 2936532 (4.41%)
Reported 40369932 alignments
[vast align]: Mapping reads to the "transcript-based" (aka "a priori") SIMPLE EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 66596040
# reads with at least one alignment: 6426757 (9.65%)
# reads that failed to align: 60169283 (90.35%)
# reads with alignments suppressed due to -m: 155004 (0.23%)
Reported 6271753 alignments
[vast align]: Mapping reads to the "transcript-based" (aka "a priori") MULTI EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 66596040
# reads with at least one alignment: 16728253 (25.12%)
# reads that failed to align: 49867787 (74.88%)
# reads with alignments suppressed due to -m: 638221 (0.96%)
Reported 16090032 alignments
[vast align]: Mapping reads to microexon EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 66596040
# reads with at least one alignment: 1232476 (1.85%)
# reads that failed to align: 65363564 (98.15%)
# reads with alignments suppressed due to -m: 157427 (0.24%)
Reported 1075049 alignments
[vast align]: Mapping reads to intron retention library (version 2)...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 379272142
# reads with at least one alignment: 58313802 (15.38%)
# reads that failed to align: 320958340 (84.62%)
# reads with alignments suppressed due to -m: 1674552 (0.44%)
Reported 56639250 alignments
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 379272142
# reads with at least one alignment: 4602014 (1.21%)
# reads that failed to align: 374670128 (98.79%)
# reads with alignments suppressed due to -m: 2075807 (0.55%)
Reported 2526207 alignments
[vast align]: Cleaning D1-50.fa.gz files!
[vast align]: Cleaning up D1-50-e.fa.gz!
[vast align]: Deleting temporary files with reverse-complemented reads.
[vast align]: Completed Tue Aug 17 18:21:22 2021
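
A log like the one above is easier to read when each aligner summary is paired with the stage that produced it. The following is a minimal sketch, not part of vast-tools: the function name summarize_alignment_log is hypothetical, and labelling each summary with the most recent "[vast align]:" line is only a heuristic (for the early strand-detection runs that line is a setup message rather than a stage name). The sample text is copied from the log above.

```python
import re

def summarize_alignment_log(log_text):
    """Pair each '# reads processed' block with the latest '[vast align]:' line."""
    stage = None      # most recent [vast align] message (heuristic stage label)
    current = None    # summary block currently being filled in
    rows = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("[vast align]:"):
            stage = line[len("[vast align]:"):].strip()
            continue
        m = re.match(r"# reads processed: (\d+)", line)
        if m:
            current = {"stage": stage, "processed": int(m.group(1))}
            rows.append(current)
            continue
        m = re.match(r"# reads with at least one alignment: (\d+)", line)
        if m and current is not None:
            current["aligned"] = int(m.group(1))
            # Guard against an empty input so the rate is always defined.
            total = current["processed"]
            current["rate"] = current["aligned"] / total if total else 0.0
    return rows

# Two blocks copied from the log above.
sample = """\
[vast align]: Doing genome subtraction
# reads processed: 379272142
# reads with at least one alignment: 312676102 (82.44%)
# reads that failed to align: 66596040 (17.56%)
[vast align]: Mapping reads to microexon EEJ library and Analyzing...
# reads processed: 66596040
# reads with at least one alignment: 1232476 (1.85%)
"""

rows = summarize_alignment_log(sample)
for row in rows:
    print(f"{row['stage']}: {row['aligned']}/{row['processed']} ({row['rate']:.2%})")
```

Note that the command above redirects only standard output to the log file, so whether every one of these lines ends up in that file depends on which stream each tool writes to.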

huangwb8 commented 3 years ago

This was already explained in issue #45. I think it is the same problem as mine. Thanks!
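
The numbers in the log are consistent with long reads being split into several 50-nt words rather than cut down to a single 50-nt fragment. The sketch below rests on an assumption that is not stated in this thread: that vast-tools takes 50-nt words every 25 nt by default, as its README describes. The content of issue #45 is not reproduced here, and count_words is a hypothetical helper, not a vast-tools function.

```python
def count_words(read_len, word_len=50, step=25):
    """Number of word_len-nt windows that fit in a read when sliding by step.

    word_len=50 and step=25 are assumed defaults, not values taken from the log.
    """
    if read_len < word_len:
        return 0
    return (read_len - word_len) // step + 1

# A 150-nt read yields windows starting at offsets 0, 25, 50, 75 and 100.
print(count_words(150))  # 5

# Counts copied from the log above.
valid_reads = 40359005 + 40355960  # "Total valid fwd reads" + "Total valid rev reads"
words_mapped = 379272142           # "# reads processed" in the genome subtraction step
print(round(words_mapped / valid_reads, 2))
```

The ratio comes out at about 4.7 words per read, slightly below 5. That would fit the log's report that only about 83% of reads are the full 150 nt, since shorter reads yield fewer words under this scheme.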