mehdiborji / nanoranger

simplified cellranger for long-read data
MIT License

TCR matching error #3

Open swluo1 opened 4 months ago

swluo1 commented 4 months ago

When I run 5p10XTCR with the example data TCR3.fastq.gz on macOS, I get the following error:

    Traceback (most recent call last):
      File "/Users/Home/nanoranger/pipeline.py", line 236, in <module>
        utils.process_matching_5p10XTCR(sample,outdir)
      File "/Users/Home/nanoranger/utils.py", line 733, in process_matching_5p10XTCR
        scores=sort_cnt(all_AS[all_AS[:,1]==0][:,0])
    IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

EdGreen21 commented 4 months ago

I got the same error on Linux when I mis-specified the path to the input fastq.gz file. Are you sure the path is correct?

mehdiborji commented 4 months ago

@swluo1 Thank you for trying out nanoranger!

Could you check whether the file {sample}_matching.sam exists and contains alignments? Alternatively, you can look at the output left behind by the STAR aligner and see whether it reported any errors.

I have observed this error when the {sample}_matching.sam file is empty (no alignments) and the array of alignment scores is, as a consequence, also empty.
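To illustrate why an empty score array produces this particular IndexError, here is a minimal sketch. It assumes all_AS is built with np.array from a list of two-element rows, which is a guess about utils.py rather than a reading of it; primary_scores is a hypothetical helper name and the meaning of the second column is not assumed.

```python
import numpy as np

# With no rows at all, NumPy cannot infer a second axis: the result is 1-D.
empty = np.array([])
print(empty.ndim, empty.shape)  # 1 (0,)

try:
    # Same indexing pattern as in the traceback; fails on a 1-D array.
    empty[empty[:, 1] == 0][:, 0]
except IndexError as err:
    print("IndexError:", err)


def primary_scores(rows):
    """Return column 0 of the rows whose column 1 equals 0.

    Hypothetical guard: reshape(-1, 2) keeps the array two-dimensional
    even when `rows` is empty, so the indexing below cannot fail.
    """
    arr = np.array(rows).reshape(-1, 2)
    return arr[arr[:, 1] == 0][:, 0]


print(primary_scores([]).shape)                         # (0,)
print(primary_scores([(10, 0), (5, 16), (7, 0)]))       # rows with 0 in column 1
```

A guard like this only turns the crash into an empty result; the underlying problem (no alignments in the SAM file) still needs to be fixed upstream.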

This can happen on macOS because STAR is told to read {sample}_BCUMI.fasta.gz with zcat (see the --readFilesCommand zcat line in scripts/barcode_align.sh), and the zcat that ships with macOS typically does not handle .gz files. You may change this to --readFilesCommand gunzip -c.
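One way to find out which decompression command actually works on a given machine is to try each candidate on a tiny gzipped probe file and compare its stdout with the known contents. The sketch below does that; decompressor_works and pick_read_files_command are hypothetical helper names, not part of nanoranger, and the candidate list is only a suggestion.

```python
import gzip
import os
import shlex
import subprocess
import tempfile


def decompressor_works(command):
    """Return True if `command <file.gz>` prints the decompressed bytes to stdout."""
    payload = b">probe\nACGT\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.fasta.gz")
        with gzip.open(path, "wb") as handle:
            handle.write(payload)
        try:
            result = subprocess.run(
                shlex.split(command) + [path],
                stdin=subprocess.DEVNULL,
                capture_output=True,
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Command missing, not executable, or hung.
            return False
        return result.returncode == 0 and result.stdout == payload


def pick_read_files_command(candidates=("zcat", "gunzip -c", "gzip -cd")):
    """Return the first candidate that decompresses correctly, or None."""
    for command in candidates:
        if decompressor_works(command):
            return command
    return None


if __name__ == "__main__":
    print("usable --readFilesCommand:", pick_read_files_command())
```

Whatever this prints is a candidate value for --readFilesCommand; a result of None would suggest that no gzip tooling is on the PATH at all.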

swluo1 commented 4 months ago

@mehdiborji Thanks. I tried replacing zcat with gunzip -c or gzip -d, but I still get this error. The {sample}_matching.sam file is not empty; I have attached it here: TCR_matching.sam.zip.

The log follows:

nanoranger packages loaded

alignment to transcriptome reference and defusing/deconcatenation

cores = 8
ref = /Users/Home/nanoranger/data/TR_V_human.fa
infile= /Users/Home/nanoranger/sample_fastq/TCR3.fastq.gz
outdir = TCR
sample = TCR
[M::mm_idx_gen::0.002*1.98] collected minimizers
[M::mm_idx_gen::0.005*5.07] sorted minimizers
[M::main::0.005*5.01] loaded/built the index for 106 target sequence(s)
[M::mm_mapopt_update::0.005*4.78] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 106
[M::mm_idx_stat::0.005*4.59] distinct minimizers: 6920 (90.45% are singletons); average occurrences: 1.150; average spacing: 5.626; total length: 44774
[M::worker_pipeline::0.091*4.64] mapped 4000 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -aY --eqx -x map-ont -t 8 --secondary=no --sam-hit-only /Users/Home/nanoranger/data/TR_V_human.fa /Users/Home/nanoranger/sample_fastq/TCR3.fastq.gz
[M::main] Real time: 0.093 sec; CPU: 0.426 sec; Peak RSS: 0.012 GB
filename = TCR/TCR_deconcat.fastq.gz
save_prefix = TCR/TCR
species = hsa
nthreads = 8
Alignment: 78.2%
Alignment: 100% ETA: 00:00:00
============= Report ==============
Analysis time: 1.51s
Total sequencing reads: 2800
Successfully aligned reads: 2153 (76.89%)
Alignment failed, no hits (not TCR/IG?): 32 (1.14%)
Alignment failed because of absence of J hits: 609 (21.75%)
Alignment failed because of low total score: 6 (0.21%)
Overlapped: 0 (0%)
Overlapped and aligned: 0 (0%)
Alignment-aided overlaps: 0 (NaN%)
Overlapped and not aligned: 0 (0%)
TRA chains: 1051 (48.82%)
TRB chains: 1102 (51.18%)
Realigned with forced non-floating bound: 0 (0%)
Realigned with forced non-floating right bound in left read: 0 (0%)
Realigned with forced non-floating left bound in right read: 0 (0%)
Initialization: progress unknown
Writing clones: 0%
============= Report ==============
Analysis time: 855.00ms
Final clonotype count: 283
Average number of reads per clonotype: 3.17
Reads used in clonotypes, percent of total: 897 (32.04%)
Reads used in clonotypes before clustering, percent of total: 897 (32.04%)
Number of reads used as a core, percent of used: 856 (95.43%)
Mapped low quality reads, percent of used: 41 (4.57%)
Reads clustered in PCR error correction, percent of used: 37 (4.12%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Reads dropped due to the lack of a clone sequence, percent of total: 54 (1.93%)
Reads dropped due to low quality, percent of total: 67 (2.39%)
Reads dropped due to failed mapping, percent of total: 1135 (40.54%)
Reads dropped with low quality clones, percent of total: 0 (0%)
Clonotypes eliminated by PCR error correction: 19
Clonotypes dropped as low quality: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
TRA chains: 156 (55.12%)
TRB chains: 127 (44.88%)
Exporting clones: 0%
Initialization: progress unknown
Preparing for sorting: progress unknown
============= Report ==============
Analysis time: 1.35s
Final clonotype count: 283
Average number of reads per clonotype: 3.17
Reads used in clonotypes, percent of total: 897 (32.04%)
Reads used in clonotypes before clustering, percent of total: 897 (32.04%)
Number of reads used as a core, percent of used: 856 (95.43%)
Mapped low quality reads, percent of used: 41 (4.57%)
Reads clustered in PCR error correction, percent of used: 37 (4.12%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Reads dropped due to the lack of a clone sequence, percent of total: 54 (1.93%)
Reads dropped due to low quality, percent of total: 67 (2.39%)
Reads dropped due to failed mapping, percent of total: 1135 (40.54%)
Reads dropped with low quality clones, percent of total: 0 (0%)
Clonotypes eliminated by PCR error correction: 19
Clonotypes dropped as low quality: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
TRA chains: 156 (55.12%)
TRB chains: 127 (44.88%)
TCR/TCR_bcreads.fasta TCR/TCR_ref/
Feb 20 17:07:15 ..... started STAR run
Feb 20 17:07:15 ... starting to generate Genome files
Feb 20 17:07:19 ... starting to sort Suffix Array. This may take a long time...
Feb 20 17:07:19 ... sorting Suffix Array chunks and saving them to disk...
Feb 20 17:07:24 ... loading chunks from disk, packing SA...
Feb 20 17:07:24 ... finished generating suffix array
Feb 20 17:07:24 ... generating Suffix Array index
Feb 20 17:07:24 ... completed Suffix Array index
Feb 20 17:07:24 ... writing Genome to disk ...
Feb 20 17:07:24 ... writing Suffix Array to disk ...
Feb 20 17:07:24 ... writing SAindex to disk
Feb 20 17:07:24 ..... finished successfully
TCR/TCR_BCUMI.fasta.gz TCR/TCR_ref/ TCR/TCR_matching
Feb 20 17:07:24 ..... started STAR run
Feb 20 17:07:24 ..... loading genome
Feb 20 17:07:27 ..... started mapping
Feb 20 17:07:27 ..... finished successfully

generate clone-barcode-UMI table

clone filtering finished
number of short UMI reads = 0
Traceback (most recent call last):
  File "/Users/Home/nanoranger/pipeline.py", line 236, in <module>
    utils.process_matching_5p10XTCR(sample,outdir)
  File "/Users/Home/nanoranger/utils.py", line 733, in process_matching_5p10XTCR
    scores=sort_cnt(all_AS[all_AS[:,1]==0][:,0])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

mehdiborji commented 4 months ago

@swluo1 Thank you for sharing this information. The SAM file you shared is indeed empty in the sense that it contains no alignments: samtools view -c TCR_matching.sam returns 0.
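If samtools is not at hand, roughly the same check can be done in plain Python by counting the lines that are not header lines (in SAM, header lines start with "@" and record lines never do). This is a minimal sketch; count_sam_records is a hypothetical helper name, and it does not apply any of the filters samtools supports.

```python
def count_sam_records(lines):
    """Count alignment records in an iterable of SAM lines.

    Header lines (starting with '@') and blank lines are skipped, so a
    header-only file yields 0.
    """
    count = 0
    for line in lines:
        if line.strip() and not line.startswith("@"):
            count += 1
    return count


if __name__ == "__main__":
    import sys

    # Usage: python count_sam.py TCR_matching.sam
    with open(sys.argv[1], "r") as handle:
        print(count_sam_records(handle))
```

A file can therefore be many kilobytes in size and still contain zero alignments if every line is a header line.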

The file contains only the header lines for the reference sequences (here the 737k 10x 5' barcodes).

Can you also share the {sample}_BCUMI.fasta.gz file you get from the previous step?

I still believe the STAR aligner is not performing the alignment for some reason. What version of STAR do you have installed?

mehdiborji commented 4 months ago

@swluo1 I now realize that the version of STAR you are running is most likely very old and potentially problematic (judging by the SAM header, it appears to be 2.5.2b, which dates back to 2016). I highly recommend updating to the latest version of STAR and trying again.
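For anyone who wants to check this on their own output, the aligner version can usually be read from the @PG lines of the SAM header, where the VN tag holds the program version. The sketch below parses those lines and compares STAR-style version strings; program_versions and version_key are hypothetical helper names, and the header line in the example is illustrative rather than copied from the attached file.

```python
import re


def program_versions(header_lines):
    """Map each @PG program ID to its VN (version) tag, or None if absent."""
    versions = {}
    for line in header_lines:
        if not line.startswith("@PG"):
            continue
        fields = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "ID" in fields:
            versions[fields["ID"]] = fields.get("VN")
    return versions


def version_key(version):
    """Turn a STAR-style version such as '2.7.9a' into a sortable tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, suffix = match.groups()
    return (int(major), int(minor), int(patch), suffix)


# Illustrative header line, not taken from the attached SAM file.
example_header = ["@PG\tID:STAR\tPN:STAR\tVN:2.5.2b\n"]
found = program_versions(example_header)["STAR"]
print(found, "older than 2.7.9a:", version_key(found) < version_key("2.7.9a"))
```

Running STAR --version on the command line is the more direct check; the header approach is mainly useful when only the SAM file is available.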

EdGreen21 commented 4 months ago

The most recent version of STAR that I have found to work with nanoranger is 2.7.9a.

mehdiborji commented 4 months ago

@EdGreen21 That is indeed the version I initially used to develop this tool, so I cannot say for certain how well versions older than that will work. I recently updated my STAR to 2.7.11b, and there seems to be forward compatibility :)