pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

No reads pseudoaligned in 10XV1 chemistry #247

Closed XiaoMa55555 closed 4 weeks ago

XiaoMa55555 commented 2 months ago

Describe the issue Our lab is performing single-nucleus analysis, and I am trying to use kallisto to pre-process our data for RNA velocity analysis. My pipeline is stuck at the kb count step.

kb version: 0.28.2 Kallisto version: 1.0.10

Question

  1. Our chemistry is "Single Cell Multiome ATAC + Gene Expression v1", so I set "-x 10XV1". My earlier attempt failed because this technology requires three FASTQ files, so I specified R1_001.fastq.gz, R2_001.fastq.gz, and I1_001.fastq.gz. Is this correct?
  2. I tried to use the pre-built index for mouse, so I set "-d mouse" in the "kb ref" step. Some tutorials do not specify -f1 and -f2, but I specified them as shown below. Is this correct?
  3. I have four outputs from the "kb ref" step: index.idx, t2g.txt, spliced_t2c.txt, and unspliced_t2c.txt. Is this correct?
  4. Which strand option should I set? The software sets it to --fr-stranded for the specified technology. Is this correct?
  5. No reads are pseudoaligned. Any suggestions?

What is the exact command that was run?

kb ref -d mouse --overwrite -t 8 -i index.idx --workflow=nac -g t2g.txt -c1 spliced_t2c.txt -c2 unspliced_t2c.txt -f1 cdna.fasta -f2 nascent.fasta genome.fasta genome.gtf

kb count --h5ad -i index.idx -g t2g.txt -x 10XV1 -o CT --overwrite -w=NONE --verbose \
-c1 spliced_t2c.txt -c2 unspliced_t2c.txt --workflow=nac --filter=bustools \
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz \
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz \
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz 

Command output (with --verbose flag)

[2024-04-18 19:30:31,098]   DEBUG [count_nac] [progress] 1M reads processed (0.0% mapped)
...
[2024-04-18 19:30:35,507]   DEBUG [count_nac] [progress] 764M reads processed (0.0% mapped)              done
[2024-04-18 19:30:35,507]   DEBUG [count_nac] [quant] processed 764,709,881 reads, 0 reads pseudoaligned
[2024-04-18 19:30:35,507]   DEBUG [count_nac] [~warn] no reads pseudoaligned.
[2024-04-18 19:30:43,527]   DEBUG [count_nac] 
[2024-04-18 19:30:45,030]   ERROR [count_nac] 
[bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[index] k-mer length: 31
[index] number of targets: 160,309
[index] number of k-mers: 1,101,603,450
[index] number of D-list k-mers: 8,705,140

[quant] will process sample 1: /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz
[quant] finding pseudoalignments for the reads ...
[progress] 1M reads processed (0.0% mapped)
...
[progress] 764M reads processed (0.0% mapped)              done
[quant] processed 764,709,881 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.

[2024-04-18 19:30:45,033]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/main.py", line 592, in parse_count
    count_nac(
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/count.py", line 1791, in count_nac
    bus_result = kallisto_bus(
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o CT -x 10XV1 -t 8 /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz' returned non-zero exit status 1.
[2024-04-18 19:30:46,543]   DEBUG [main] Removing `CT/tmp` directory
...
Yenaled commented 2 months ago

It should go barcode file, then R1, then R2 (supplied in that order).
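
When it is unclear which FASTQ file holds the barcodes and which holds the cDNA, comparing read lengths across the files can help. The following is a minimal sketch, not part of kb_python: the function name `read_length_counts` is hypothetical, and it assumes standard gzipped FASTQ files with four lines per record.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path, n_reads=1000):
    """Count sequence lengths among the first n_reads records of a gzipped FASTQ.

    Assumes the standard four-line FASTQ record layout (hypothetical helper).
    """
    lengths = Counter()
    with gzip.open(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 4 * n_reads:
                break
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.rstrip("\n"))] += 1
    return lengths
```

Running this on each of the I1, R1, and R2 files and comparing the dominant lengths shows which files contain short reads and which contain long ones, which can then be checked against the layout the chosen `-x` technology expects.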

XiaoMa55555 commented 2 months ago

Thanks for your reply!

I changed the order to:

kb count --h5ad -i index.idx -g t2g.txt -x 10XV1 -o CT --overwrite -w=NONE --verbose  \
-c1 spliced_t2c.txt -c2 unspliced_t2c.txt --workflow=nac --filter=bustools \
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz \
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz \
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz 

The same error still occurs:

[2024-04-19 23:45:16,662]   DEBUG [count_nac] [quant] processed 764,709,881 reads, 0 reads pseudoaligned
[2024-04-19 23:45:16,662]   DEBUG [count_nac] [~warn] no reads pseudoaligned.
[2024-04-19 23:45:16,763]   DEBUG [count_nac] 
[2024-04-19 23:45:18,365]   ERROR [count_nac] 
[bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[index] k-mer length: 31
[index] number of targets: 160,309
[index] number of k-mers: 1,101,603,450
[index] number of D-list k-mers: 8,705,140
[quant] will process sample 1: /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz
/home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ...
[progress] 1M reads processed (0.0% mapped)
..........
[progress] 764M reads processed (0.0% mapped)              done
[quant] processed 764,709,881 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.

[2024-04-19 23:45:18,368]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/main.py", line 592, in parse_count
    count_nac(
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/count.py", line 1791, in count_nac
    bus_result = kallisto_bus(
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o CT -x 10XV1 -t 8 /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz' returned non-zero exit status 1.

My run_info.json file shows:

n_targets:160309
n_bootstraps:0
n_processed:764709881
n_pseudoaligned:0
n_unique:0
p_pseudoaligned:0
p_unique:0
kallisto_version:"0.50.1"
index_version:13
start_time:"Fri Apr 19 23:24:57 2024"
call:"/home/garrydj/ma000332/.conda/envs/bowtie/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o CT -x 10XV1 -t 8 /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_I1_001.fastq.gz /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R1_001.fastq.gz /home/garrydj/data_delivery/umgc/2024-q1/240130_LH00315_0078_A22H7VLLT3/Garry_Project_118/AB-UMGC-10X-2s-GAR-118-P4-Ctrl_S1_L003_R2_001.fastq.gz"
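
The fields above can also be read programmatically, which is handy when comparing several runs with different file orders. This is a minimal sketch, assuming run_info.json is a regular JSON file with the `n_processed` and `n_pseudoaligned` keys shown above; the function name `pseudoalignment_summary` is hypothetical.

```python
import json


def pseudoalignment_summary(run_info_path):
    """Return processed/pseudoaligned read counts and their ratio from run_info.json."""
    with open(run_info_path) as handle:
        info = json.load(handle)
    processed = info["n_processed"]
    aligned = info["n_pseudoaligned"]
    # Guard against division by zero when no reads were processed at all.
    fraction = aligned / processed if processed else 0.0
    return {
        "n_processed": processed,
        "n_pseudoaligned": aligned,
        "fraction": fraction,
    }
```

For the run above, the output directory is `CT`, so the call would be `pseudoalignment_summary("CT/run_info.json")`, assuming the file was kept after the failed run.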

Yenaled commented 2 months ago

Hmm, maybe R2 then R1? How about looking at a few reads manually and running them through NCBI BLAST?
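
One way to pull a few reads out for BLAST is to convert the first records of a FASTQ file to FASTA text that can be pasted into the NCBI BLAST web form. This is a minimal sketch, assuming a gzipped FASTQ with four lines per record; the function name `first_reads_as_fasta` is hypothetical and not part of kb_python.

```python
import gzip


def first_reads_as_fasta(fastq_path, n_reads=5):
    """Return the first n_reads records of a gzipped FASTQ file as FASTA text."""
    records = []
    with gzip.open(fastq_path, "rt") as handle:
        while len(records) < n_reads:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file reached before n_reads records
                break
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            handle.readline()  # skip the quality line
            # Drop the leading '@' and emit a FASTA header instead.
            records.append(f">{header[1:]}\n{sequence}")
    return "\n".join(records)
```

Calling `print(first_reads_as_fasta(path))` on each of the three files in turn shows which file contains sequence that matches mouse transcripts and which contains only barcode or index sequence.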

XiaoMa55555 commented 2 months ago

Hi, I just discovered that our fastq files have not had adapter and UMI sequences trimmed.

  1. Could the untrimmed adapter and UMI sequences be the reason for the result in my run_info.json file?
  2. What should we do with the I1 and I2 fastq files?
  3. Our lab was using the Chromium Single Cell Multiome ATAC + Gene Expression v1 kit. Is "-x 10XV1" the correct argument?

Thanks!

Yenaled commented 2 months ago

  1. No.
  2. Nothing.
  3. I think so? But it is very easy to check once you get the pseudoalignments working. It will be very obvious if something's off (i.e., no barcodes end up getting corrected to the barcode list).

Again, follow the BLAST suggestion I already gave.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days