Open twaddlac opened 1 year ago
Hi! could you share the bamfile / part of the bamfile that leads to this? Or paste in the bam header output from samtools when used on that bamfile as well.
I'm experiencing the same issue with a few different references I'm trying to get perbase to work with:
REF POS DEPTH A C G T N INS DEL REF_SKIP FAIL NEAR_MAX_DEPTH
^@^@^@^@^@^@^@^P<9B>^C<F0><FF>^?^@^@! 9 26 0 0 22 0 4 0 0 0 0 false
^@^@^@^@^@^@^@<E0>}^A<F0><FF>^?^@^@! 10 98 0 0 0 83 15 0 0 0 0 false
^@^@^@^@^@^@^@^P~^A<F0><FF>^?^@^@! 11 154 0 123 0 0 31 0 0 0 0 false
^@^@^@^@^@^@^@p<E1>^@<F0><FF>^?^@^@! 12 161 0 0 0 123 38 0 0 0 0 false
^@^@^@^@^@^@^@<80><CC>^@<F0><FF>^?^@^@! 13 167 134 0 0 0 33 0 0 0 0 false
^@^@^@^@^@^@^@`%^E<F0><FF>^?^@^@! 14 675 0 546 0 1 128 0 0 0 0 false
^@^@^@^@^@^@^@^@<B9>^C<F0><FF>^?^@^@! 15 716 0 0 581 0 135 0 0 0 0 false
^@^@^@^@^@^@^@<E0>:^E<F0><FF>^?^@^@! 16 782 0 0 1 588 193 0 0 0 0 false
^@^@^@^@^@^@^@^P;^E<F0><FF>^?^@^@! 17 870 0 1 556 0 313 0 0 0 0 false
^@^@^@^@^@^@^@^P^U^L<F0><FF>^?^@^@! 18 940 0 0 710 0 230 0 0 0 0 false
^@^@^@^@^@^@^@^P^F^L<F0><FF>^?^@^@! 19 973 832 0 0 0 141 0 0 0 0 false
Here is my bam file header:
@HD VN:1.6 SO:coordinate
@SQ SN:DENV3-E-W_Min_preMVS_P3_CDX430_v2 LN:10708
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 96 /references/E3093-3_S3_ref.fasta /bbmerge_out/E3093-3_S3_merged_perfect.fastq
@PG ID:samtools PN:samtools PP:bwa VN:1.18 CL:samtools view -bS -
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.18 CL:samtools sort -@ 96 --output-fmt BAM -o ../bwa-mem_out/bbmerged/E3093-3_S3.sorted.bam ../bwa-mem_out/bbmerged/E3093-3_S3.bam
@PG ID:MarkDuplicates VN:3.1.0 CL:MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=10000 INPUT=[../bwa-mem_out/bbmerged/E3093-3_S3.sorted.bam] OUTPUT=../bwa-mem_out/bbmerged/E3093-3_S3.sorted.markdup.bam METRICS_FILE=../bwa-mem_out/bbmerged/E3093-3_S3.sorted.markdup.metrics REMOVE_DUPLICATES=false ASSUME_SORTED=true TMP_DIR=[/gpfs/projects/GenomicsCore/codagenix/scripts/tmp] VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false FLOW_MODE=false FLOW_QUALITY_SUM_STRATEGY=false USE_END_IN_UNPAIRED_READS=false USE_UNPAIRED_CLIPPED_END=false UNPAIRED_END_UNCERTAINTY=0 FLOW_SKIP_FIRST_N_FLOWS=0 FLOW_Q_IS_KNOWN_END=false FLOW_EFFECTIVE_QUALITY_THRESHOLD=15 ADD_PG_TAG_TO_READS=true DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
@PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.18 CL:samtools view -H E3093-3_S3.sorted.markdup.bam
I've also noticed that if I supply the reference fasta file in my perbase
command, I get the following error:
[2023-11-13T19:38:56Z INFO perbase_lib::par_granges] Using 94 worker threads.
[2023-11-13T19:38:56Z INFO perbase_lib::par_granges] Creating channel of length 189246910 (* 120 bytes to get mem)
[2023-11-13T19:39:04Z INFO perbase_lib::par_granges] Reading from "bwa-mem_out/bbmerged/E3093-3_S3.sorted.markdup.bam"
[2023-11-13T19:39:04Z INFO perbase_lib::par_granges] Processing TID 0:0-10708
thread '<unnamed>' panicked at 'Fetched reference sequence: Unknown sequence name: ���!.', src/commands/base_depth.rs:275:30
stack backtrace:
0: 0x55555575ea10 - std::backtrace_rs::backtrace::libunwind::trace::ha9053a9a07ca49cb
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x55555575ea10 - std::backtrace_rs::backtrace::trace_unsynchronized::h9c2852a457ad564e
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x55555575ea10 - std::sys_common::backtrace::_print_fmt::h457936fbfaa0070f
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:65:5
3: 0x55555575ea10 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h5779d7bf7f70cb0c
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:44:22
4: 0x5555556b9bfe - core::fmt::write::h5a4baaff1bcd3eb5
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/fmt/mod.rs:1232:17
5: 0x555555738ea4 - std::io::Write::write_fmt::h4bc1f301cb9e9cce
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/io/mod.rs:1684:15
6: 0x55555576023f - std::sys_common::backtrace::_print::h5fcdc36060f177e8
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:47:5
7: 0x55555576023f - std::sys_common::backtrace::print::h54ca9458b876c8bf
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:34:9
8: 0x55555575fe3f - std::panicking::default_hook::{{closure}}::hbe471161c7664ed6
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:271:22
9: 0x555555760e58 - std::panicking::default_hook::ha3500da57aa4ac4f
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:290:9
10: 0x555555760e58 - std::panicking::rust_panic_with_hook::h50c09d000dc561d2
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:692:13
11: 0x555555760914 - std::panicking::begin_panic_handler::{{closure}}::h9e2b2176e00e0d9c
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:583:13
12: 0x555555760876 - std::sys_common::backtrace::__rust_end_short_backtrace::h5739b8e512c09d02
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys_common/backtrace.rs:150:18
13: 0x555555760861 - rust_begin_unwind
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
14: 0x5555555d60b2 - core::panicking::panic_fmt::hf33a1475b4dc5c3e
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
15: 0x5555555d63e2 - core::result::unwrap_failed::hdff5465d74574b44
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/result.rs:1750:5
16: 0x5555555f669a - <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::next::h4ec7ffc6086dfa1a
17: 0x555555624d34 - rayon::iter::plumbing::bridge_producer_consumer::helper::h40d19706cb30b716
18: 0x5555555ef2ca - rayon_core::join::join_context::{{closure}}::he92c561eced12757
19: 0x5555555e6b56 - rayon_core::thread_pool::ThreadPool::install::{{closure}}::h185c0d76b416e80d
20: 0x555555632daf - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb3e400f8a6fd84fd
21: 0x5555555da5ed - rayon_core::registry::WorkerThread::wait_until_cold::h441bdd8e1d8196e3
22: 0x5555556dc9ba - std::sys_common::backtrace::__rust_begin_short_backtrace::h933f37f9b851e73f
23: 0x5555556dc74d - core::ops::function::FnOnce::call_once{{vtable.shim}}::h1b5aaf29acd6a35f
24: 0x555555761b45 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h39990b24eedef2ab
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
25: 0x555555761b45 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h01a027258444143b
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/alloc/src/boxed.rs:1987:9
26: 0x555555761b45 - std::sys::unix::thread::Thread::new::thread_start::ha4f1cdd9c25884ba
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys/unix/thread.rs:108:17
27: 0x7ffff79af1cf - start_thread
28: 0x7ffff7094e73 - clone
29: 0x0 - <unknown>
As far as I can tell, there is nothing wrong with either my reference fasta file or the input bam file. Other programs (samtools, bwa, bbmap, etc.) don't report any issues with them.
Any idea what the problem could be? I'm happy to share the full fasta and/or bam files if it would be useful.
Best, Dave
If you had a minimal example with a bamfile and reference that would be awesome!
I'm really not sure what the issue is. This function is what would generate the ref_seq name, which in these outputs looks very mangled (breadcrumb for myself looking at this again later)
Sure thing. The reference is very small (~10 Kb) and I've extracted a subset of 10k BAM entries that produce the weird results.
Thanks for taking a look!
Best, Dave example.zip
Quick update - I reinstalled the latest version of perbase, and I'm no longer seeing the same errors on the exact same reference fastas and bam files.
I don't know what caused the issue in the first place, but it's working now, so I'm happy! :)
Thanks! Dave
@sstadick Sorry for the delay here! @davidecarlson thanks for chiming in as well. I installed 0.9.0
with no avail. I'll share the data here once I have clearance to do so (hopefully I will soon).
@davidecarlson I'm glad that worked! @twaddlac 👍
I'm trying to run
perbase depth-only in.bam
and I am getting the following output:The bam file isn't corrupt (according to
samtools
) and works with other parts of my pipeline.This reproduced when running
base-depth
as well, and replicated by other users.Running
perbase v0.8.5
Please let me know if I can provide any more information!