Closed by kaizen89 3 years ago
Hi @kaizen89, could you post the command you used to generate the index, as well as where you downloaded the FASTA and GTF from?
Hi @Lioscro, I followed this tutorial, replacing only the mouse genome and annotation with the human ones.
I see you've been referring to the R tutorial! Unfortunately, I was not involved in writing this up, so I am not sure what may be happening.
One thing you could try is building the index using kb ref instead of R directly.
I haven't encountered any issues building the index this way.
Since you are building an index for RNA velocity, here is an example of a command (for mouse).
kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz \
Mus_musculus.GRCm38.98.gtf.gz
You can also refer to the tutorial here: https://colab.research.google.com/github/pachterlab/kallistobustools/blob/master/notebooks/kb_velocity_index.ipynb
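For the human case asked about above, the same command can be assembled programmatically. This is only a sketch: the flags and output names are copied from the mouse example, while build_kb_ref_command and the two human file names are hypothetical placeholders for whatever FASTA and GTF were actually downloaded.

```python
import shlex


def build_kb_ref_command(fasta, gtf, n="8"):
    """Return the `kb ref` argument list used in the mouse example above.

    Only the FASTA and GTF paths change between species; `n` is the value
    passed to -n, copied unchanged from the example.
    """
    return [
        "kb", "ref",
        "-i", "index.idx",
        "-g", "t2g.txt",
        "-f1", "cdna.fa",
        "-f2", "intron.fa",
        "-c1", "cdna_t2c.txt",
        "-c2", "intron_t2c.txt",
        "--workflow", "lamanno",
        "-n", n,
        fasta,
        gtf,
    ]


# Placeholder file names (assumed): substitute the human files you downloaded.
command = build_kb_ref_command("human_genome.fa.gz", "human_annotation.gtf.gz")
print(shlex.join(command))
# To actually build the index: subprocess.run(command, check=True)
```

Printing the command first makes it easy to confirm that the FASTA and GTF come from the same genome build before starting a long index run.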
Hi @Lioscro, unfortunately I still get a similar error. I tried using the index found here, but there is an error at the end.
time kb count --h5ad -i index.idx -g transcripts_to_genes.txt -x 10xv3 -o SRR12603789 -c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz
[2021-02-19 20:56:06,115] WARNING The `--lamanno` and `--nucleus` flags are deprecated. These options will be removed in a future release. Please use `--workflow lamanno` or `--workflow nucleus` instead.
[2021-02-19 20:56:06,115] INFO Using index index.idx to generate BUS file to SRR12603789 from
[2021-02-19 20:56:06,115] INFO /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz
[2021-02-19 20:56:06,115] INFO /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz
[2021-02-19 21:14:32,501] ERROR
[index] k-mer length: 31
[index] number of targets: 845,338
[index] number of k-mers: 271,648,279
[index] number of equivalence classes: 4,776,424
[quant] will process sample 1: /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz
/mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 819,658,242 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[2021-02-19 21:14:32,501] ERROR An exception occurred
Traceback (most recent call last):
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 837, in main
COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 218, in parse_count
temp_dir=temp_dir
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 1510, in count_velocity
fastqs, index_paths[0], technology, out_dir, threads=threads
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/validate.py", line 112, in inner
results = func(*args, **kwargs)
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 149, in kallisto_bus
run_executable(command)
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/dry/__init__.py", line 24, in inner
return func(*args, **kwargs)
File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/utils.py", line 233, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/salmon/.local/lib/python3.6/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o SRR12603789 -x 10xv3 -t 8 /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz' returned non-zero exit status 1.
real 18m27,371s
user 19m42,208s
sys 0m12,165s
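The key line in the log above is the [quant] summary: every read was processed but none pseudoaligned, which is why kallisto bus exits with status 1. A minimal sketch for pulling those two numbers out of captured kallisto output follows; the function name and the regular expression are illustrative assumptions based only on the line format shown in this log.

```python
import re

# Matches e.g. "processed 819,658,242 reads, 0 reads pseudoaligned"
PSEUDOALIGN_RE = re.compile(
    r"processed ([\d,]+) reads, ([\d,]+) reads pseudoaligned"
)


def pseudoalignment_counts(log_text):
    """Return (processed, pseudoaligned) read counts, or None if not found."""
    match = PSEUDOALIGN_RE.search(log_text)
    if match is None:
        return None
    processed, aligned = (int(g.replace(",", "")) for g in match.groups())
    return processed, aligned


log = "[quant] processed 819,658,242 reads, 0 reads pseudoaligned"
processed, aligned = pseudoalignment_counts(log)
if aligned == 0:
    print(
        f"None of the {processed:,} reads pseudoaligned; "
        "check the -x technology and the FASTQ files."
    )
```

A count of exactly zero usually points to a mismatch between the input and what was requested (technology, read order, or index) rather than to low-quality data.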
@Lioscro After trying other fastq files, it appears the problem is not coming from kb but from the fastqs. I downloaded these files as SRR12603789_1.fastq.gz and SRR12603789_2.fastq.gz, which I renamed as SRR12603789_S1_L001_R1_001.fastq.gz and SRR12603789_S1_L001_R2_001.fastq.gz.
Other fastqs are not giving any error.
zcat HESA03_HSB42I_0_G_S17_L001_R2_001.fastq.gz | head
@A00213:234:HT3JKDMXX:1:1101:30617:1000 2:N:0:NTGTTTCC
TATTATCGAAACCATCAGCCTGCTCATTCAACCAATAGCCCTAGCCGTACGCCTAACCGCT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF,FFFFFFFFFFF:FFFFFF
@A00213:234:HT3JKDMXX:1:1101:31340:1000 2:N:0:NTGTTTCC
AAGCAGTGGTATCAACGCAGAGTACATGGGGAGAGTAAAAAAAAAAAAACACAGAAGAGAG
+
FFFFFF,FFF,FFFFFFF,,FFFFFFFFFFF,FFFFF:FFFF:F,FFFFFFF::FFFFFFF
@A00213:234:HT3JKDMXX:1:1101:32353:1000 2:N:0:NTGTTTCC
GGCATCTCTTGTGTACTTATTGTTTAAGGTTTCCTCAAACTGTGATTTTTCTGAACACAAT
However, this one gives the error:
zcat /mnt/sda3/data/public_data/fastq/SRR12603789/SRR12603789_S1_L001_R2_001.fastq.gz | head
@SRR12603789.1 1/2
CACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAGCATCAAGACCCTGTGACCTCTCAAAGGCCCGGTGGAAAGGACACGGGAAGTCTGGGCTAAGAGACAGCAAATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFF:
@SRR12603789.2 2/2
ATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAGCATCAAGACCCTGTGACCTCTCAAAGGCCCGGTG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFF
@SRR12603789.3 3/2
CAATGGCCCATCCCACTCTAGGCATGGCTCCTCTCCACAGGAAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAGCATCAAGACCCTGTGACCTCTCAAA
Both samples have been processed with cellranger and velocyto.py without issues. Could you please have a look at the last file and tell me if you see something wrong with it? Thank you.
It turns out that, contrary to what was stated in the paper, the technology used is 10x v2, not v3. The error goes away after correcting this.
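One way to catch this kind of mismatch before running kb count is to look at the length of the barcode/UMI read (R1). The sketch below assumes the standard 10x layouts, 16 bp barcode + 10 bp UMI for v2 and 16 bp barcode + 12 bp UMI for v3; the function names are hypothetical, and reads downloaded from SRA may have been trimmed or padded, so treat the result as a hint rather than proof.

```python
import gzip
from collections import Counter
from itertools import islice

# Minimum R1 length needed per technology (barcode + UMI), assuming the
# standard layouts: 10x v2 = 16 + 10 bp, 10x v3 = 16 + 12 bp.
MIN_R1_LENGTH = {"10xv2": 26, "10xv3": 28}


def r1_length_counts(fastq_gz, n_reads=1000):
    """Count sequence lengths over the first n_reads records of a gzipped FASTQ."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        # A FASTQ record is 4 lines; the sequence is the second line.
        for i, line in enumerate(islice(handle, 4 * n_reads)):
            if i % 4 == 1:
                counts[len(line.rstrip("\n"))] += 1
    return counts


def compatible_technologies(read_length):
    """List the -x values whose barcode + UMI fit within read_length."""
    return [tech for tech, needed in MIN_R1_LENGTH.items() if read_length >= needed]


# Example usage (path is a placeholder):
# lengths = r1_length_counts("SRR12603789_S1_L001_R1_001.fastq.gz")
# most_common_length = lengths.most_common(1)[0][0]
# print(most_common_length, compatible_technologies(most_common_length))
```

If R1 is 26 bp long, only 10xv2 fits, so passing -x 10xv3 would ask for more bases than the read contains. A longer R1 is compatible with both, and in that case the length alone cannot settle the question.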
Hi, I am trying to follow the RNA velocity tutorial using kb-python. After generating the index file, the following command throws an error. I noticed that during index generation I got a warning that more than 2M non-ACGUT characters were found and replaced.
Does anyone know how to solve this issue and what is causing it?