Long read alignment analysis. Generates reports on sequence alignments covering mappability vs. read size, error patterns, annotations, and rarefaction curve analysis. The most basic analysis requires only a BAM file and outputs a web-browser-compatible XHTML file to visualize, share, store, and extract analysis results.
Apache License 2.0
Expected lines to be ordered but they appear not to be ordered #25
Thank you for this awesome tool. I want to try it on our PacBio, Illumina, and ONT data. However, I keep getting the error in the title, regardless of what I try. Can you please help me figure it out?
I used the following script:
```bash
# load the environment (cluster-specific module names)
module load anaconda2
cd /stornext/General/data/user_managed/grpu_mritchie_1/Shani/long_read_benchmark/alignqc/
source activate alignqc
module load samtools/1.7

REFERENCE="/stornext/General/data/user_managed/grpu_mritchie_1/Shani/atac-seq/20190529_MiRCL_ATAC/references/genome.fa"
OUT_DIR="/stornext/General/data/user_managed/grpu_mritchie_1/Shani/long_read_benchmark/alignqc_output/ont"
mkdir -p "${OUT_DIR}"

# full BAM files took forever - so trying on the subsample
IN_LOC_ONT="/stornext/General/data/user_managed/grpu_mritchie_1/XueyiDong/long_read_benchmark/ONT/bam_subsample"

find "${IN_LOC_ONT}" -name '*.bam' -print0 | while IFS= read -r -d '' BAM
do
    # sample name: file name without its directory and without everything from the first ".sorted" onward
    OUT_P=${BAM##*/}; OUT_P=${OUT_P%%.sorted*}
    echo " ######## --------- processing $BAM in $OUT_P -------- #########################"
    # sort with seq-tools (output name appends .sorted.bam to the input name), then index
    seq-tools sort --bam "${BAM}" -o "${BAM}.sorted.bam"
    samtools index "${BAM}.sorted.bam"
    mkdir -p "${OUT_DIR}/${OUT_P}"
    echo " ######## --------- results saved in $OUT_DIR/$OUT_P -------- #########################"
    alignqc analyze "${BAM}.sorted.bam" -g "${REFERENCE}" --no_transcriptome --threads 8 \
        --specific_tempdir "${OUT_DIR}/${OUT_P}" \
        -o "${OUT_DIR}/${OUT_P}/${OUT_P}.ont.alignqc.xhtml"
done
```
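To confirm that a sort step actually produced a coordinate-sorted file, the `SO` tag on the `@HD` header line can be inspected. The sketch below is a hypothetical helper (`bam_sort_order` is not part of AlignQC or seqtools); it reads SAM header text on stdin, for example from `samtools view -H`, and prints the sort order recorded there.

```bash
# Hypothetical helper: print the SO (sort order) tag from the @HD header line.
# Expects SAM header text on stdin, e.g. the output of `samtools view -H file.bam`.
# Prints nothing if there is no @HD line or it carries no SO tag.
bam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SO:/) {
                    print substr($i, 4)   # strip the leading "SO:"
                    exit
                }
            }
            exit   # only one @HD line is allowed, so stop here
        }'
}
```

For example, `samtools view -H "${BAM}.sorted.bam" | bam_sort_order` should print `coordinate` for a coordinate-sorted BAM. This only reports what the header claims; it does not re-check the records themselves.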
The error I'm getting is as follows:
```
######## --------- processing /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode05.sorted.bam in barcode05 -------- #########################
######## --------- results saved in /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05 -------- #########################
Using Rscript version:
R scripting front-end version 3.6.1 (2019-07-05)
WARNING: No annotation specified. Will be unable to report feature specific outputs
Creating initial alignment mapping data
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_preprocess.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode05.sorted.bam --minimum_intron_size 68 -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/alndata.txt.gz --threads 8 --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/
read basics
6257000
check for best set
6250000/6257982
combining results
6257982
Traverse bam for alignment analysis
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/traverse_preprocessed.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/alndata.txt.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/ --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/temp/ --threads 8 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2
6257982 alignments 3844424 reads
Writing chromosome lengths from header
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_to_chr_lengths.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode05.sorted.bam -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/chrlens.txt
Can we find any known read types
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/get_platform_report.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/lengths.txt.gz /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/special_report
Go through genepred best alignments and make a bed depth file
Generate the depth bed for the mapped reads
gpd_to_bed_depth.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/best.sorted.gpd.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode05/data/depth.sorted.bed.gz --threads 8
Traceback (most recent call last):
  File "/home/amarasinghe.s/.conda/envs/alignqc/bin/alignqc", line 11, in <module>
    sys.exit(entry_point())
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 47, in entry_point
    main(args,operable_argv)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 17, in main
    analyze.external_cmd(operable_argv,version=version)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 88, in external_cmd
    main(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 54, in main
    prepare_all_data.external(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 844, in external
    main(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 60, in main
    make_data_bam(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 184, in make_data_bam
    gpd_to_bed_depth(cmd)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 60, in external_cmd
    main(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 27, in main
    for covs in results:
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 271, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 673, in next
    raise value
ValueError: Expected lines to be ordered but they appear not to be ordered on line 3362988
```
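The traceback comes from `gpd_to_bed_depth.py` reading `best.sorted.gpd.gz`, an intermediate file written into the `data/` output directory, so that file can be checked directly. The sketch below is a hypothetical helper (`first_unordered_line`); it assumes the GPD layout has the chromosome in column 3 and the start coordinate in column 5, and it treats "ordered" as: each chromosome forms one contiguous block and starts never decrease within it. Both the layout and that ordering rule are assumptions; seqtools may apply a different or stricter check.

```bash
# Hypothetical helper: report the first line of a GPD stream that breaks ordering.
# Assumed layout: column 3 = chromosome, column 5 = start coordinate.
# Assumed rule: each chromosome appears as one contiguous block, and starts
# never decrease within a block.
# Prints the offending line number and returns 1; prints nothing and returns 0
# if the whole stream is ordered.
first_unordered_line() {
    awk -F'\t' '
        {
            chrom = $3
            start = $5 + 0
            if (chrom != prev_chrom) {
                if (chrom in seen) { bad = NR; exit }   # chromosome reappears later
                seen[chrom] = 1
                prev_chrom = chrom
            } else if (start < prev_start) {
                bad = NR; exit                           # start goes backwards
            }
            prev_start = start
        }
        END {
            if (bad) { print bad; exit 1 }
            exit 0
        }'
}
```

Usage, assuming `OUT_DIR` as in the script above: `gzip -cd < "${OUT_DIR}/barcode05/data/best.sorted.gpd.gz" | first_unordered_line`. The printed number can then be compared with the line reported in the error (3362988), although it is an assumption that both count lines from 1 in the same way.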
Then, as you suggested in this issue, I ran `seq-tools sort` to sort the files first. However, that still doesn't solve the problem, as the output below shows:
```
######## --------- processing /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode01.sorted.bam in barcode01 -------- #########################
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
######## --------- results saved in /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01 -------- #########################
Using Rscript version:
R scripting front-end version 3.6.1 (2019-07-05)
WARNING: No annotation specified. Will be unable to report feature specific outputs
Creating initial alignment mapping data
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_preprocess.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode01.sorted.bam.sorted.bam --minimum_intron_size 68 -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/alndata.txt.gz --threads 8 --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/
read basics
5916000
check for best set
5910000/5916804
combining results
5916804
Traverse bam for alignment analysis
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/traverse_preprocessed.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/alndata.txt.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/ --specific_tempdir /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/temp/ --threads 8 --min_aligned_bases 50 --max_query_overlap 10 --max_target_overlap 10 --max_target_gap 500000 --required_fractional_improvement 0.2
5916804 alignments 3720827 reads
Writing chromosome lengths from header
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/bam_to_chr_lengths.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample/bam_subsample/barcode01.sorted.bam.sorted.bam -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/chrlens.txt
Can we find any known read types
/stornext/HPCScratch/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/get_platform_report.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/lengths.txt.gz /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/special_report
Go through genepred best alignments and make a bed depth file
Generate the depth bed for the mapped reads
gpd_to_bed_depth.py /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/best.sorted.gpd.gz -o /stornext/Projects/promethion/promethion_access/lab_ritchie/transcr_bench_PacBio/short_term/alignqc/ont_bam_subsample//barcode01/data/depth.sorted.bed.gz --threads 8
Traceback (most recent call last):
  File "/home/amarasinghe.s/.conda/envs/alignqc/bin/alignqc", line 11, in <module>
    sys.exit(entry_point())
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 47, in entry_point
    main(args,operable_argv)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/alignqc.py", line 17, in main
    analyze.external_cmd(operable_argv,version=version)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 88, in external_cmd
    main(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/analyze.py", line 54, in main
    prepare_all_data.external(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 844, in external
    main(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 60, in main
    make_data_bam(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/alignqc/prepare_all_data.py", line 184, in make_data_bam
    gpd_to_bed_depth(cmd)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 60, in external_cmd
    main(args)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/site-packages/seqtools/cli/utilities/gpd_to_bed_depth.py", line 27, in main
    for covs in results:
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 271, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/amarasinghe.s/.conda/envs/alignqc/lib/python2.7/multiprocessing/pool.py", line 673, in next
    raise value
ValueError: Expected lines to be ordered but they appear not to be ordered on line 3364652
```
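Once a suspect line number is known (3364652 in this second run), printing that line together with the one before it shows which two neighbouring records the check rejected. The sketch below is a hypothetical helper (`show_line_context`) and assumes the file is gzip-compressed, as `best.sorted.gpd.gz` is.

```bash
# Hypothetical helper: print line N of a gzip-compressed text file together with
# the line before it, so two neighbouring records can be compared.
# Usage: show_line_context FILE.gz N
show_line_context() {
    local gz_file=$1
    local line=$2
    local from=$(( line > 1 ? line - 1 : 1 ))
    # Read via stdin so gzip does not care about the file suffix;
    # sed stops after line N instead of decompressing the whole file.
    gzip -cd < "$gz_file" | sed -n "${from},${line}p;${line}q"
}
```

For example, assuming `OUT_DIR` as in the script above: `show_line_context "${OUT_DIR}/barcode01/data/best.sorted.gpd.gz" 3364652`. Stopping `sed` early keeps this fast even on multi-million-line files.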
I'm attaching the genome file and a small sample of the BAM file here: barcode05.sorted.bam.first_10_lines.bam.gz
The header of this `.bam` file is as follows:

Also, I'm attaching the full BAM file and a zip file of all the output I got from running the script here: https://drive.google.com/drive/folders/1HtuIZWOSCh-7PpmxLJyZNy37b8Uo9N6z?usp=sharing
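When cutting a small sample such as `barcode05.sorted.bam.first_10_lines.bam.gz` for sharing, it helps to keep the full header, because BAM records refer to reference sequences by their position in the `@SQ` header lines. Note that `head -n 10` on `samtools view -h` output counts header lines too, so it can return header lines only. The sketch below is a hypothetical helper (`sam_head`); it assumes SAM text on stdin and keeps every header line plus the first N alignment records.

```bash
# Hypothetical helper: keep all SAM header lines ("@...") plus the first N records.
# Header lines always precede alignment records in SAM, so we can stop after N.
# Usage: samtools view -h in.bam | sam_head 10 | samtools view -b -o sample.bam -
sam_head() {
    local n=$1
    awk -v n="$n" '
        /^@/ { print; next }
        {
            if (count >= n) exit
            print
            count++
        }'
}
```

Because the header is passed through unchanged, the resulting sample keeps the same `@HD` sort-order tag and `@SQ` reference list as the full file.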
I'd really appreciate your help in figuring out what is going on.
Many thanks, Shani