pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

0 reads pseudoaligned[~warn] no reads pseudoaligned #95

Closed by kaizen89 3 years ago

kaizen89 commented 3 years ago

Hi, I am trying to follow the RNA velocity tutorial using kb-python. After generating the index file, the following command throws an error. During index generation I also got a warning that more than 2M non-ACGUT characters were found and replaced.

kb count -i /home/salmon/Documents/Github/BUS_notebooks_R-master/analysis/output/hs_cDNA_introns_97.idx -g tr2g.tsv -x 10xv3 -o kb -c1 cDNA_tx_to_capture.txt -c2 introns_tx_to_capture.txt --lamanno /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz -t 16 -m 70G
[2021-01-29 13:12:08,965] WARNING The `--lamanno` and `--nucleus` flags are deprecated. These options will be removed in a future release. Please use `--workflow lamanno` or `--workflow nucleus` instead.
[2021-01-29 13:12:08,965]    INFO Using index /home/salmon/Documents/Github/BUS_notebooks_R-master/analysis/output/hs_cDNA_introns_97.idx to generate BUS file to kb from
[2021-01-29 13:12:08,965]    INFO         /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz
[2021-01-29 13:12:08,965]    INFO         /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz
[2021-01-29 13:36:32,560]   ERROR 
[index] k-mer length: 31
[index] number of targets: 1,378,373
[index] number of k-mers: 1,560,141,285
[index] number of equivalence classes: 12,887,284
[quant] will process sample 1:  /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz
/mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 819,658,242 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.

[2021-01-29 13:36:32,561]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 837, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 218, in parse_count
    temp_dir=temp_dir
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 1510, in count_velocity
    fastqs, index_paths[0], technology, out_dir, threads=threads
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/validate.py", line 112, in inner
    results = func(*args, **kwargs)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 149, in kallisto_bus
    run_executable(command)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/dry/__init__.py", line 24, in inner
    return func(*args, **kwargs)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/utils.py", line 233, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/salmon/.local/lib/python3.6/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /home/salmon/Documents/Github/BUS_notebooks_R-master/analysis/output/hs_cDNA_introns_97.idx -o kb -x 10xv3 -t 16 /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz' returned non-zero exit status 1.

Does anyone know how to solve this issue and what is causing it?
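A note on reading the traceback above: the final `CalledProcessError` only reports that `kallisto bus` exited with status 1. The actual diagnosis is in the tool's own output (`0 reads pseudoaligned`). The sketch below illustrates that pattern in general terms. `run_and_report` is a hypothetical helper written for this example, not kb_python's implementation, and the failing "tool" is a stand-in Python one-liner rather than kallisto.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return its stdout.

    On a non-zero exit status, raise CalledProcessError with the tool's
    stdout and stderr attached, so the underlying message is not lost.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6+)
    )
    if result.returncode != 0:
        raise subprocess.CalledProcessError(
            result.returncode, command, output=result.stdout, stderr=result.stderr
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a failing tool: writes a warning to stderr, exits with status 1.
    fake_tool = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('[~warn] no reads pseudoaligned.\\n'); sys.exit(1)",
    ]
    try:
        run_and_report(fake_tool)
    except subprocess.CalledProcessError as err:
        print("exit status:", err.returncode)
        print("tool said:", err.stderr.strip())
```

In other words, when a wrapper fails like this, the lines printed by the wrapped tool are usually the place to look first.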

Lioscro commented 3 years ago

Hi @kaizen89, could you post the command you used to generate the index, as well as where you downloaded the FASTA and GTF from?

kaizen89 commented 3 years ago

Hi @Lioscro, I followed this tutorial, replacing only the mouse genome and annotation with human ones.

Lioscro commented 3 years ago

I see you've been referring to the R tutorial! Unfortunately, I was not involved in writing this up, so I am not sure what may be happening.

One thing you could try is building the index with kb ref instead of in R directly. I haven't encountered any issues with this approach. Since you are building an index for RNA velocity, here is an example command (for mouse):

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz \
Mus_musculus.GRCm38.98.gtf.gz

You can also refer to the tutorial here: https://colab.research.google.com/github/pachterlab/kallistobustools/blob/master/notebooks/kb_velocity_index.ipynb

kaizen89 commented 3 years ago

Hi @Lioscro, unfortunately I still get a similar error. I tried using the index found here, but there is an error at the end.

time kb count --h5ad -i index.idx -g transcripts_to_genes.txt -x 10xv3 -o SRR12603789 -c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz
[2021-02-19 20:56:06,115] WARNING The `--lamanno` and `--nucleus` flags are deprecated. These options will be removed in a future release. Please use `--workflow lamanno` or `--workflow nucleus` instead.
[2021-02-19 20:56:06,115]    INFO Using index index.idx to generate BUS file to SRR12603789 from
[2021-02-19 20:56:06,115]    INFO         /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz
[2021-02-19 20:56:06,115]    INFO         /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz
[2021-02-19 21:14:32,501]   ERROR 
[index] k-mer length: 31
[index] number of targets: 845,338
[index] number of k-mers: 271,648,279
[index] number of equivalence classes: 4,776,424
[quant] will process sample 1: /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz
/mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 819,658,242 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.

[2021-02-19 21:14:32,501]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 837, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 218, in parse_count
    temp_dir=temp_dir
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 1510, in count_velocity
    fastqs, index_paths[0], technology, out_dir, threads=threads
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/validate.py", line 112, in inner
    results = func(*args, **kwargs)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 149, in kallisto_bus
    run_executable(command)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/dry/__init__.py", line 24, in inner
    return func(*args, **kwargs)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/utils.py", line 233, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/salmon/.local/lib/python3.6/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o SRR12603789 -x 10xv3 -t 8 /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz' returned non-zero exit status 1.

real    18m27,371s
user    19m42,208s
sys     0m12,165s

kaizen89 commented 3 years ago

@Lioscro After trying other FASTQ files, it appears the problem comes not from kb but from the FASTQs. I downloaded these files as SRR12603789_1.fastq.gz and SRR12603789_2.fastq.gz and renamed them to SRR12603789_S1_L001_R1_001.fastq.gz and SRR12603789_S1_L001_R2_001.fastq.gz. Other FASTQs do not give any error, for example:

zcat HESA03_HSB42I_0_G_S17_L001_R2_001.fastq.gz | head
@A00213:234:HT3JKDMXX:1:1101:30617:1000 2:N:0:NTGTTTCC
TATTATCGAAACCATCAGCCTGCTCATTCAACCAATAGCCCTAGCCGTACGCCTAACCGCT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF,FFFFFFFFFFF:FFFFFF
@A00213:234:HT3JKDMXX:1:1101:31340:1000 2:N:0:NTGTTTCC
AAGCAGTGGTATCAACGCAGAGTACATGGGGAGAGTAAAAAAAAAAAAACACAGAAGAGAG
+
FFFFFF,FFF,FFFFFFF,,FFFFFFFFFFF,FFFFF:FFFF:F,FFFFFFF::FFFFFFF
@A00213:234:HT3JKDMXX:1:1101:32353:1000 2:N:0:NTGTTTCC
GGCATCTCTTGTGTACTTATTGTTTAAGGTTTCCTCAAACTGTGATTTTTCTGAACACAAT

However, this one gives the error:

zcat /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz | head
@SRR12603789.1 1/2
CACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAGCATCAAGACCCTGTGACCTCTCAAAGGCCCGGTGGAAAGGACACGGGAAGTCTGGGCTAAGAGACAGCAAATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFF:
@SRR12603789.2 2/2
ATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAGCATCAAGACCCTGTGACCTCTCAAAGGCCCGGTG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFF
@SRR12603789.3 3/2
CAATGGCCCATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAGCATCAAGACCCTGTGACCTCTCAAA

Both samples have been processed with cellranger and velocyto.py without issues. Could you please have a look at the last file and tell me if you see anything wrong with it? Thank you.
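The `zcat ... | head` comparison above only shows R2. For barcoded single-cell data it can also help to look at R1, since that is where the cell barcode and UMI are read from. The following is a minimal sketch that reports the first header and the distribution of read lengths in a FASTQ file. `peek_fastq` is a hypothetical helper, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def peek_fastq(path, n_records=1000):
    """Return (first_header, Counter of read lengths) for the first n_records.

    Assumes four-line FASTQ records; plain or gzip-compressed input.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    first_header = None
    lengths = Counter()
    with opener(path, "rt") as handle:
        for _ in range(n_records):
            record = list(islice(handle, 4))
            if len(record) < 4:
                break  # end of file, or a truncated final record
            header, sequence = record[0].rstrip(), record[1].rstrip()
            if first_header is None:
                first_header = header
            lengths[len(sequence)] += 1
    return first_header, lengths


if __name__ == "__main__":
    # Usage: python peek_fastq.py sample_R1.fastq.gz sample_R2.fastq.gz
    for fastq_path in sys.argv[1:]:
        header, lengths = peek_fastq(fastq_path)
        print(fastq_path)
        print("  first header:", header)
        print("  read lengths:", dict(lengths.most_common(5)))
```

Running this on both the R1 and R2 files of a working and a failing sample makes differences in read length visible without scrolling through raw records.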

kaizen89 commented 3 years ago

It turns out that, contrary to what was mentioned in the paper, the technology used is 10x v2 and not v3. There is no error after correcting this.
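A plausible reason the wrong chemistry produced zero pseudoaligned reads: 10x v2 places a 16 bp barcode and a 10 bp UMI on R1 (26 bp in total), while v3 uses a 16 bp barcode and a 12 bp UMI (28 bp). If R1 is only 26 bp long, a v3 layout cannot be extracted from it. The thread does not show R1, so this is an assumption about this particular dataset. The sketch below checks R1 read lengths against both layouts; the function names are hypothetical and the table covers only these two chemistries.

```python
# Barcode and UMI lengths on R1 for the two 10x chemistries discussed here.
R1_LAYOUT = {
    "10xv2": {"barcode": 16, "umi": 10},
    "10xv3": {"barcode": 16, "umi": 12},
}


def required_r1_length(technology):
    """Minimum R1 length needed to hold the barcode plus the UMI."""
    layout = R1_LAYOUT[technology]
    return layout["barcode"] + layout["umi"]


def fraction_long_enough(r1_lengths, technology):
    """Fraction of R1 reads long enough for the given technology's layout."""
    r1_lengths = list(r1_lengths)
    if not r1_lengths:
        raise ValueError("no read lengths given")
    needed = required_r1_length(technology)
    return sum(length >= needed for length in r1_lengths) / len(r1_lengths)


def compatible_technologies(r1_lengths, min_fraction=0.95):
    """Technologies whose barcode + UMI fit in at least min_fraction of reads."""
    r1_lengths = list(r1_lengths)
    return [
        technology
        for technology in R1_LAYOUT
        if fraction_long_enough(r1_lengths, technology) >= min_fraction
    ]


if __name__ == "__main__":
    # Illustrative lengths only, not taken from SRR12603789.
    print(compatible_technologies([26] * 100))  # ['10xv2']
    print(compatible_technologies([28] * 100))  # ['10xv2', '10xv3']
```

Note that read length can rule out v3 but cannot confirm v2: an R1 of 28 bp or longer fits both layouts, so the library preparation records remain the authoritative source.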