Closed Celine-Serry closed 3 months ago
This is not a kb-python issue; the number of reads processed (reported by kb-python) is just the number of reads in your FASTQ file. You should probably check your R1 and R2 FASTQ files by running:
zcat < name_of_fastq_file_R1.fastq.gz | wc -l
zcat < name_of_fastq_file_R2.fastq.gz | wc -l
to get the number of lines in your FASTQ files.
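Since FASTQ stores each read as exactly four lines (header, sequence, `+`, qualities), the read count is the line count divided by 4. A minimal self-contained sketch, using a made-up two-read file rather than the real data:

```shell
# Build a tiny two-read FASTQ file for demonstration (hypothetical data).
printf '@read1\nACGT\n+\nIIII\n@read2\nTGCA\n+\nIIII\n' | gzip > demo_R1.fastq.gz

# Each FASTQ record spans 4 lines, so reads = lines / 4.
lines=$(zcat < demo_R1.fastq.gz | wc -l)
echo "reads: $((lines / 4))"   # prints "reads: 2"

rm demo_R1.fastq.gz
```

The same arithmetic applies to the real files: matching R1/R2 line counts (and therefore read counts) are what you want to see.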
zcat < pbmc_1k_v3_S1_mergedLanes_R1.fastq.gz | wc -l
returns: 266407548
zcat < pbmc_1k_v3_S1_mergedLanes_R2.fastq.gz | wc -l
returns:
gzip: stdin: invalid compressed data--crc error
gzip: stdin: invalid compressed data--length error
14101600
Yup, as you can see, your R2 file is incomplete (either you downloaded a partial file, e.g. your internet connection broke as it was downloading, or it somehow got corrupted). Just download that file again.
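As a quicker integrity check after (re-)downloading, `gzip -t` tests an archive without writing any decompressed output. A minimal sketch with made-up file names, truncating a copy on purpose to mimic a broken download:

```shell
# Create a small valid gzip file, then a deliberately truncated copy.
printf 'hello\n' | gzip > good.gz
head -c 10 good.gz > truncated.gz

# gzip -t exits 0 if the archive is intact, non-zero otherwise.
gzip -t good.gz && echo "good.gz: OK"
gzip -t truncated.gz 2>/dev/null || echo "truncated.gz: corrupt"

rm good.gz truncated.gz
```

Running `gzip -t` on both real FASTQ files would surface the same crc/length errors shown above without waiting for a full `wc -l` pass.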
Thanks a lot! I will try that right now.
Ah, after re-downloading the data it's 266407548, the same as for R1. Thanks a lot for solving this for me!
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
Describe the issue
When using kb-python to produce an index and t2g file and to align my reads to that index, I find that only 3.5 million reads are processed, whereas the tutorial I am following processes 66 million reads. The data I've used: https://drive.google.com/drive/folders/1DbLRO4kv-y3W06adFR26RdSaDPmfB4UA
I've used the reference genome and cDNA of both v108 and v111, but I get this result with both.
I am using kb-python (v0.28.2) through Ubuntu on WSL2, on a Windows 11 machine.
What is the exact command that was run? To make the index/t2g:
To do read alignment:
Command output (with the --verbose flag): no errors; the run worked fine, it just did not process all of the reads (which I found out when checking quality-control stats in R).
After running kb ref:
After running kb count:
When my supervisor runs these exact steps, with the same data and the same reference genome v111 (on kb-python v0.28.2), she does get 66 million reads processed. Do you have any idea why I am not able to process as many reads on my machine while performing the same steps, on the same data, with the same version?
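For anyone comparing notes: the exact commands aren't shown above, but a typical kb-python index-and-count workflow looks roughly like the following. All file names, the `10xv3` technology flag, and the output directory are assumptions for illustration, not the commands actually run in this issue:

```shell
# Build the index and t2g from a genome FASTA + GTF (kb ref extracts the cDNA to -f1).
kb ref -i index.idx -g t2g.txt -f1 cdna.fa \
    Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
    Homo_sapiens.GRCh38.111.gtf.gz

# Pseudoalign and count; -x names the single-cell technology.
kb count --verbose -i index.idx -g t2g.txt -x 10xv3 -o out \
    pbmc_1k_v3_S1_mergedLanes_R1.fastq.gz \
    pbmc_1k_v3_S1_mergedLanes_R2.fastq.gz
```

With identical commands, data, and versions, the remaining variable is usually the input files themselves, which is why checking the FASTQ line counts (as above) was the right first step.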