Closed mariusmessemaker closed 4 years ago
A few things:
-m 100G for the memory; this is how the memory string is read.
kb-python soon that will support this structure.
For clarity, this is the structure I am referring to:
R1: Biological Read
R2: Cell BC1 (1-8bp)
R3: Index (1-8bp) UMI (9-14bp)
R4: Cell BC2 (1-8bp)
Thank you for your quick response! I will try to rerun with -m 100G; I will let you know whether that solves the issue. Does the newest version of kallisto have an inDropsv3 structure that also accepts an R3 library index file? If so, I will try to run kallisto. But it should be possible, right, to specify a new technology string with the inDrops library index in R3 as a BC, because in essence a library index is just a BC (that together with BC1 and BC2 specifies unique cells)?
In addition, should the UMI in your inDrops structure not be in R4? I use the following inDrops version 3 structure, which is the same as used in: https://github.com/indrops/indrops (I checked the reads manually): R1: Biological read R2: Cell BC1 (1-8 bp) R3: Index (1-8 bp) R4: Cell BC2 (1-8 bp) and UMI (9-14 bp).
Again, thank you so much for creating kallisto bus, and kb-python.
The index file is only necessary if you wish to demultiplex samples that were pooled on the same lane, using the samplesheet.csv file that you create (if you used Illumina short read sequencing), see Illumina documentation. kallisto does not use the sample index.
Small side note, I made an error in the comment above, the UMI is in R4 and the structure is:
R1: Biological read
R2: Cell BC1 (1-8 bp)
R3: Index (1-8 bp)
R4: Cell BC2 (1-8 bp) and UMI (9-14 bp).
To process your reads, let's look at main.cpp in the kallisto repo. We see the following lines:
} else if (opt.technology == "INDROPSV3") {
  busopt.nfiles = 3;
  busopt.seq.push_back(BUSOptionSubstr(2,0,0));
  busopt.umi = BUSOptionSubstr(1,8,14);
  busopt.bc.push_back(BUSOptionSubstr(0,0,8));
  busopt.bc.push_back(BUSOptionSubstr(1,0,8));
In plain English: kallisto expects 3 files. Given how you have defined what R1, R2, R3, and R4 mean, we note that the first half of the cell barcode comes from R2, the second half of the cell barcode comes from R4, the UMI comes from R4, and the biological read is in R1. So the command would be:
kallisto bus -i index.idx -o ./output -x inDropsv3 R2.fastq.gz R4.fastq.gz R1.fastq.gz
Where R2 is the 0th file, R4 is the 1st file, and R1 is the 2nd file (0-indexed).
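To make the file ordering concrete, here is a minimal Python sketch of how the three triplets quoted from main.cpp could be applied to one set of reads. It assumes each triplet means (file index, start, stop) and that a stop of 0 means "to the end of the read". The helper names `substr` and `split_record` and the read sequences are made up for illustration; this is not kallisto code.

```python
# Illustrative only: mimics the INDROPSV3 layout quoted above, not kallisto itself.
# Each triplet is (file index, start, stop); stop == 0 is assumed to mean
# "use the read up to its end".
BC = [(0, 0, 8), (1, 0, 8)]   # two barcode halves: file 0 (R2) and file 1 (R4)
UMI = (1, 8, 14)              # UMI: file 1 (R4), positions 8 to 14
SEQ = (2, 0, 0)               # biological read: file 2 (R1), whole read


def substr(reads, triplet):
    """Return the slice of the read selected by a (file, start, stop) triplet."""
    fileno, start, stop = triplet
    read = reads[fileno]
    return read[start:] if stop == 0 else read[start:stop]


def split_record(reads):
    """Split one record (one read per file) into barcode, UMI and sequence."""
    barcode = "".join(substr(reads, t) for t in BC)
    return barcode, substr(reads, UMI), substr(reads, SEQ)


# Made-up reads, given in the order R2, R4, R1 (the order used on the command line).
reads = ["AAAACCCC", "GGGGTTTTACGTAC", "TTGACCATGCA"]
barcode, umi, seq = split_record(reads)
print(barcode, umi, seq)
```

Passing the files in a different order would shift every file index, which is why the order R2, R4, R1 matters in the command above.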
This works with the current release of kallisto and will be added to kb-python soon.
Thank you for your reaction. Yes, I understood that I could not use technology = "INDROPSV3" because this kb technology specification expects 3 files that together contain BC1, BC2, UMI, and Biological read. Therefore, I used the option in kb to specify a new technology myself that has 3 BCs, one of which is the library index, because in essence a library index is just a BC that together with BC1 and BC2 specifies a unique cell. The issue that I raised here is that the specification of a new technology with 3 BCs, a UMI, and a Biological read does not work, while it should be possible to do this according to the kallisto bus documentation? I also tried to run with -m 100G and I get a different error, <Signals.SIGKILL: 9> instead of <Signals.SIGSEGV: 11>, but still 0 reads pseudoaligned and an empty bus file (which I think causes the error in the bustools sort command).
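For reference, a custom technology string of the kind discussed here could be composed as in the sketch below. It assumes the bc:umi:seq format described in the kallisto bus documentation, with each part made of comma-separated (file, start, stop) triplets and several barcode triplets simply joined by commas. It also assumes the files are passed in the order R2, R3, R4, R1 and that the UMI sits in R4 at positions 9-14. The helper `technology_string` is hypothetical, and whether kallisto 0.46.1 accepts three barcode triplets is exactly what this issue leaves open.

```python
def technology_string(bc, umi, seq):
    """Compose a bc:umi:seq string from (file, start, stop) triplets.

    Hypothetical helper for illustration; not part of kb-python or kallisto.
    """
    def fmt(triplets):
        # Flatten [(f, s, e), ...] into "f,s,e,f,s,e,..."
        return ",".join(str(value) for triplet in triplets for value in triplet)

    return ":".join([fmt(bc), fmt([umi]), fmt([seq])])


# Assumed file order on the command line: R2, R3, R4, R1 -> indices 0, 1, 2, 3.
tech = technology_string(
    bc=[(0, 0, 8), (1, 0, 8), (2, 0, 8)],  # Cell BC1 (R2), library index (R3), Cell BC2 (R4)
    umi=(2, 8, 14),                        # UMI in R4, positions 9-14 (1-based)
    seq=(3, 0, 0),                         # biological read in R1, whole read
)
print(tech)
```

The coordinates are 0-based with an exclusive stop, so "1-8 bp" becomes 0,8 and "9-14 bp" becomes 8,14.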
Thank you so much for your great tool, I really like the possibility of specifying new technologies, RNA-velocity, and the upcoming feature barcoding option!
Describe the issue
I tried to specify a new technology with a library index BC in R3 from 1-8 bp, a cell BC in R2 from 1-8 bp, a cell BC in R4 from 1-8 bp, a UMI in R3 from 9-14 bp, and the biological read in R1. However, I get 0 pseudoalignments (I can map the reads successfully with another mapper) and an empty bus file (see below for file sizes in the kb count output directory).
What is the exact command that was run?
Command output of kb count (with --verbose flag)
The command output of kb ref:
The file sizes in the kb count output directory:
And the file sizes in the kb ref output directory:
The versions that I used: python 3.7.4, kb-python 0.24.4, kallisto 0.46.1, and bustools 0.39.3
I ran the command on an HPC cluster and allocated 12 cores and 100G memory.
Let me know if you need more information such as the fasta files. Thank you so much in advance for your help!