vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

High reads that failed to align percentage #120

Open sav0016 opened 1 week ago

sav0016 commented 1 week ago

Hello,

Thank you for your awesome tool. I am trying to run it on my RNA-Seq samples (150 bp PE). My reads were already adapter-trimmed with fastp. Is the "reads that failed to align" percentage normally as high as in my example? I would also like to ask whether you have any additional recommendations for 150 bp PE reads (I have 10 samples, 5 vs 5 groups, and I believe the coverage is sufficient). Thank you very much! Jakub

This is my command:

vast-tools align P1_pre_1.fq.gz P1_pre_2.fq.gz \
    --sp hg38 \
    --name P1_pre \
    --cores 32 \
    --expr \
    --useFastq

This is my log:

[vast align]: VAST-TOOLS v2.5.1
[vast align]: Species assembly: hg38, VASTDB Species key: Hs2
[vast align]: VASTDB Version: vastdb.hs2.23.06.20
[vast align]: Input RNA-seq file(s): /data3/sav0016/Anicka-splicing/ALL/P1_pre_1.fq.gz and /data3/sav0016/Anicka-splicing/ALL/P1_pre_2.fq.gz
[vast align]: Most common read lengths detected for fq1 & fq2: 150 (93.58%) and 150 (93.74%)
[vast align]: Sample name: P1_pre_
[vast align]: Using VASTDB -> /data3/sav0016/vast-tools/VASTDB/Hs2
[vast align]: Setting output directory to /data3/sav0016/Anicka-splicing/ALL/vast_out
[vast align]: Setting tmp directory..
[vast align]: Set tmp directory to /data3/sav0016/Anicka-splicing/ALL/vast_out/tmp!
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 500000
# reads with at least one alignment: 257275 (51.45%)
# reads that failed to align: 242725 (48.55%)
# reads with alignments suppressed due to -m: 9232 (1.85%)
Reported 248043 alignments
[vast align]:    fraction of first reads mapping to fwd / rev strand : 0.0181 / 0.9819
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 500000
# reads with at least one alignment: 259254 (51.85%)
# reads that failed to align: 240746 (48.15%)
# reads with alignments suppressed due to -m: 8771 (1.75%)
Reported 250483 alignments
[vast align]:    fraction of second reads mapping to fwd / rev strand : 0.9836 / 0.0164
[vast align]:    reverse-complementing reads from /data3/sav0016/Anicka-splicing/ALL/P1_pre_1.fq.gz; writing into /data3/sav0016/Anicka-splicing/ALL/vast_out/tmp/tmpfqs_2IeYDpXy/P1_pre_1.fq.gz
[vast align]: Mapping RNAseq reads against mRNA sequences
[vast align]: Calculating cRPKMs
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
[vast trim]: Total processed reads: 62533102
[vast trim]: Total valid fwd reads: 62376087
# reads processed: 62376087
# reads with at least one alignment: 35369133 (56.70%)
# reads that failed to align: 27006954 (43.30%)
# reads with alignments suppressed due to -m: 4315317 (6.92%)
Reported 31053816 alignments
[vast align]: Trimming RNAseq reads to 50 nt sequences
[vast trim]: Total processed reads: 62533102
[vast trim]: Total valid fwd reads: 62376087
[vast trim]: Total valid rev reads: 62365474
[vast align]: Doing genome subtraction
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 611280096
# reads with at least one alignment: 563553098 (92.19%)
# reads that failed to align: 47726998 (7.81%)
# reads with alignments suppressed due to -m: 153276138 (25.07%)
Reported 410276960 alignments
[vast align]: Mapping reads to the "splice site-based" (aka "a posteriori") EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 47726998
# reads with at least one alignment: 26322125 (55.15%)
# reads that failed to align: 21404873 (44.85%)
# reads with alignments suppressed due to -m: 1747773 (3.66%)
Reported 24574352 alignments
[vast align]: Mapping reads to the "transcript-based" (aka "a priori") SIMPLE EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 47726998
# reads with at least one alignment: 4066466 (8.52%)
# reads that failed to align: 43660532 (91.48%)
# reads with alignments suppressed due to -m: 88041 (0.18%)
Reported 3978425 alignments
[vast align]: Mapping reads to the "transcript-based" (aka "a priori") MULTI EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 47726998
# reads with at least one alignment: 11352168 (23.79%)
# reads that failed to align: 36374830 (76.21%)
# reads with alignments suppressed due to -m: 567964 (1.19%)
Reported 10784204 alignments
[vast align]: Mapping reads to microexon EEJ library and Analyzing...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 47726998
# reads with at least one alignment: 490885 (1.03%)
# reads that failed to align: 47236113 (98.97%)
# reads with alignments suppressed due to -m: 63755 (0.13%)
Reported 427130 alignments
[vast align]: Mapping reads to intron retention library (version 2)...
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 611280096
# reads with at least one alignment: 37477067 (6.13%)
# reads that failed to align: 573803029 (93.87%)
# reads with alignments suppressed due to -m: 1752098 (0.29%)
Reported 35724969 alignments
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 611280096
# reads with at least one alignment: 19000811 (3.11%)
# reads that failed to align: 592279285 (96.89%)
# reads with alignments suppressed due to -m: 8756876 (1.43%)
Reported 10243935 alignments
[vast align]: Cleaning P1_pre-50.fq.gz files!
[vast align]: Cleaning up P1_pre-50-e.fa.gz!
[vast align]: Deleting temporary files with reverse-complemented reads.
[vast align]: Completed Wed Jun 26 15:27:13 2024
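Each bowtie-style summary in the log reports its percentages relative to "# reads processed" for that mapping step only. The counts are internally consistent: in every block above, aligned + failed = processed, and aligned minus the reads suppressed by -m (multimappers) equals the number of reported alignments. Below is a minimal Python sketch that parses one block (the "Calculating cRPKMs" step, copied from the log) and recomputes its percentages. The regular expressions are an assumption and match only the exact line wording printed here.

```python
import re

# One summary block copied verbatim from the log (the "Calculating cRPKMs" step).
SUMMARY = """\
# reads processed: 62376087
# reads with at least one alignment: 35369133 (56.70%)
# reads that failed to align: 27006954 (43.30%)
# reads with alignments suppressed due to -m: 4315317 (6.92%)
Reported 31053816 alignments
"""

# Assumed patterns: they match only the exact line wording shown in this log.
PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
    "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
    "reported": r"Reported (\d+) alignments",
}


def parse_summary(text):
    """Return the integer counts found in one summary block."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts


counts = parse_summary(SUMMARY)

# Every percentage is a count divided by "# reads processed" for this step.
# Matches the log: aligned 56.70%, failed 43.30%, suppressed 6.92%.
for key in ("aligned", "failed", "suppressed"):
    print(f"{key}: {100 * counts[key] / counts['processed']:.2f}%")

# Consistency checks that hold for every summary block in this log.
assert counts["aligned"] + counts["failed"] == counts["processed"]
assert counts["aligned"] - counts["suppressed"] == counts["reported"]
```

The same arithmetic shows why the later steps look worse: the 47,726,998 reads that failed genome subtraction are exactly the "reads processed" input to each EEJ library step. A high "failed to align" fraction there means that most of those reads do not span the junctions in that particular library.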
mirimia commented 1 week ago

Hello,

Thanks!

Yes, it looks fine. There may be a relatively high fraction of intronic reads, which is OK if the library is ribo-depleted (ribo-depleted libraries retain more unspliced pre-mRNA than poly(A)-selected ones).

For 5 vs 5, you could perhaps use a Wilcoxon test plus a minimum difference between the group-average PSIs (delta PSI), after first requiring sufficient coverage in at least 3 of the 5 samples in each group (using vast-tools tidy, for instance).
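The suggestion above can be sketched in Python. This is an illustration only, not VAST-TOOLS code. It assumes that the PSI values for one event have already been extracted per sample from the VAST-TOOLS inclusion table, with low-coverage samples set to None. It also assumes the two groups are independent, so it uses an exact Wilcoxon rank-sum (Mann-Whitney) test; if the samples are paired (for example pre/post from the same individual), the signed-rank test is the paired counterpart. The function names and the thresholds min_valid=3, min_dpsi=15.0 and alpha=0.05 are hypothetical choices.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 are ties; their 1-based ranks are i+1..j.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def rank_sum_pvalue(a, b):
    """Exact two-sided Wilcoxon rank-sum p-value by full enumeration.

    Tries every way to assign len(a) of the pooled ranks to group a,
    which is cheap for small designs such as 5 vs 5 (252 splits).
    """
    ranks = average_ranks(a + b)
    n, k = len(ranks), len(a)
    expected = k * (n + 1) / 2  # mean rank sum of group a under H0
    observed = abs(sum(ranks[:k]) - expected)
    splits = list(combinations(range(n), k))
    extreme = sum(
        1 for idx in splits
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9
    )
    return extreme / len(splits)


def score_event(psi_a, psi_b, min_valid=3, min_dpsi=15.0, alpha=0.05):
    """Coverage filter, then rank-sum test and delta PSI for one event.

    None marks a sample without enough reads. Returns None if either group
    has fewer than min_valid usable samples; otherwise returns
    (delta_psi, p_value, passes) with delta_psi = mean(b) - mean(a).
    """
    a = [x for x in psi_a if x is not None]
    b = [x for x in psi_b if x is not None]
    if len(a) < min_valid or len(b) < min_valid:
        return None
    dpsi = mean(b) - mean(a)
    p = rank_sum_pvalue(a, b)
    return dpsi, p, abs(dpsi) >= min_dpsi and p < alpha


# Hypothetical PSI values for one event in the two groups of 5.
pre = [10.0, 12.0, None, 8.0, 15.0]
post = [40.0, 38.0, 45.0, None, 42.0]
# delta PSI = 30.0, exact p = 2/70 (about 0.029), passes both filters
print(score_event(pre, post))
```

With few samples, the exact test has a hard floor. With 3 usable samples per group, the smallest achievable two-sided p-value is 2/20 = 0.1, so such an event can never pass at alpha = 0.05 in this sketch. With all 5 vs 5 samples the floor is 2/252 (about 0.008). This is why the delta PSI cutoff and the coverage filter carry much of the weight in such small designs.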