pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

Wrong number of fastqs for 10xv1 #241

Open dburkhardt opened 4 years ago

dburkhardt commented 4 years ago

I have 10X 3' v1 chemistry data with four fastq files:

EBT0_1A_S1_L001_I1_001.fastq.gz # 8 nt per read
EBT0_1A_S1_L001_R1_001.fastq.gz # 98 nt per read
EBT0_1A_S1_L001_R2_001.fastq.gz # 16 nt per read
EBT0_1A_S1_L001_R3_001.fastq.gz # 10 nt per read
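The per-file read lengths noted above can be double-checked with a short script. This is a minimal sketch, not part of kallisto or bamtofastq: the helper name `read_length_counts` and the `max_reads` sampling limit are hypothetical, and it assumes standard four-line FASTQ records in gzip-compressed files.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def read_length_counts(path, max_reads=1000):
    """Count sequence lengths in the first `max_reads` records of a gzipped FASTQ.

    Hypothetical helper: assumes each record is exactly four lines
    (header, sequence, '+', qualities).
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        # Only look at the first 4 * max_reads lines to keep this fast.
        for i, line in enumerate(islice(handle, 4 * max_reads)):
            # The sequence is the second line of every four-line record.
            if i % 4 == 1:
                counts[len(line.strip())] += 1
    return counts


if __name__ == "__main__":
    # Usage: python read_lengths.py file1.fastq.gz file2.fastq.gz ...
    for fastq in sys.argv[1:]:
        print(fastq, dict(read_length_counts(fastq)))
```

Running it on the four files should report one dominant length per file if the annotations above are right; a mix of lengths would suggest trimmed or variable-length reads.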

The reason I believe this to be v1 is the output of the web summary from 10X:

[Image: 10X web summary]

When I run the following command:

/home/dan/.local/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto bus \
  -i /data/sai/kallisto_indices/human/index.idx \
  -o /data/lab/datasets/Krishnaswamy_2017_Embryoid_Body_Timecourse/kallisto/EBT0_1A \
  -x 10xv1 -t 1 \
  EBT0_1A_S1_L001_I1_001.fastq.gz EBT0_1A_S1_L001_R1_001.fastq.gz EBT0_1A_S1_L001_R2_001.fastq.gz EBT0_1A_S1_L001_R3_001.fastq.gz

I get the following error:

Error: Number of files (4) does not match number of input files required by technology 10XV1 (3)
kallisto 0.46.1
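The mismatch reported in the error can be reproduced as a pre-flight check before calling kallisto. This is only a sketch under stated assumptions: the function names `read_tag` and `check_inputs` are hypothetical, the only file count used (3 for 10XV1) is taken from the error message itself, and the filename pattern assumes the Illumina-style names shown above. It does not decide which of the four files should be dropped.

```python
import re

# Required number of input files per technology. Only the 10XV1 value is
# known here (from the error message); other entries would have to be
# confirmed against kallisto itself before being added.
EXPECTED_FILES = {"10XV1": 3}

# Assumed naming scheme: <sample>_S<n>_L<lane>_<tag>_<chunk>.fastq.gz,
# where <tag> is I1, R1, R2, R3, ...
TAG_PATTERN = re.compile(r"_([IR]\d)_\d{3}\.fastq(?:\.gz)?$")


def read_tag(filename):
    """Return the read tag (e.g. 'I1', 'R2') or None if the name does not match."""
    match = TAG_PATTERN.search(filename)
    return match.group(1) if match else None


def check_inputs(technology, filenames):
    """Return (count_matches, expected_count, tags) for the given inputs."""
    expected = EXPECTED_FILES[technology.upper()]
    tags = [read_tag(name) for name in filenames]
    return len(filenames) == expected, expected, tags


if __name__ == "__main__":
    files = [
        "EBT0_1A_S1_L001_I1_001.fastq.gz",
        "EBT0_1A_S1_L001_R1_001.fastq.gz",
        "EBT0_1A_S1_L001_R2_001.fastq.gz",
        "EBT0_1A_S1_L001_R3_001.fastq.gz",
    ]
    ok, expected, tags = check_inputs("10xv1", files)
    print(f"tags={tags} given={len(files)} expected={expected} ok={ok}")
```

Listing the tags next to the count makes it easier to see which reads are present (here one index read and three sequence reads) when deciding what to pass to `kallisto bus`.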

I had no issues running velocyto on this data, and when I run bamtofastq on the possorted BAM file, I get these four files as output. Any idea what's going on? I have no idea why I would have an R3 file. Should I ignore it?

mliiv commented 4 years ago

If I understand the output correctly, there should be 2 files per sample. Usually these are _L001_R1_001.fastq.gz, which is read 1 with the cell barcodes, and _L001_R2_001.fastq.gz, which is read 2 with the RNA-seq reads. So I would just try using these two files as input.