pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Custom Index with a new sequence #226

Closed: lukais-iohan closed this issue 6 months ago

lukais-iohan commented 7 months ago

Hi,

I have FASTQ files from a human Drop-seq experiment on single nuclei, and I'm trying to build an index that includes the sequence of an mRNAi resistance gene. The idea is to check whether we picked the right cells for this experiment. I've looked for a tutorial that explains how to build this kind of custom index but haven't found one so far. Any advice would be much appreciated.

Best,

Lukas Iohan

Yenaled commented 7 months ago

This is not really kallisto-specific: all of these tools require a genome FASTA and a GTF. The usual approach is to add a fake chromosome containing your custom sequence to the genome FASTA and then edit the GTF to annotate that fake chromosome. For that, you'll need to learn how the GTF file format is structured. For further questions on this, I recommend posting on a bioinformatics support forum.
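As a rough illustration of the fake-chromosome approach, the Python sketch below appends a custom sequence to a genome FASTA as its own contig and appends matching gene, transcript and exon rows to the GTF. It edits both files in place, so run it on copies. The function names, the "custom" source column, the "+" strand, and reusing the sequence name as gene_id, transcript_id and gene_name are all assumptions for illustration; check the GTF requirements of whichever tool builds your reference before relying on it.

```python
from __future__ import annotations

import os
from pathlib import Path


def _append(path: Path, text: str) -> None:
    """Append text to a file, first adding a newline if the file lacks a trailing one."""
    needs_newline = False
    if path.exists() and path.stat().st_size > 0:
        with open(path, "rb") as fh:
            fh.seek(-1, os.SEEK_END)
            needs_newline = fh.read(1) != b"\n"
    with open(path, "a") as fh:
        if needs_newline:
            fh.write("\n")
        fh.write(text)


def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence every `width` bases."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def gtf_rows(name: str, length: int, source: str = "custom") -> list[str]:
    """GTF rows annotating the whole fake chromosome as one single-exon gene.

    GTF coordinates are 1-based and inclusive, so every feature spans 1..length.
    Reusing `name` for gene_id, transcript_id and gene_name is an assumption.
    """
    gene_attr = f'gene_id "{name}"; gene_name "{name}";'
    tx_attr = f'gene_id "{name}"; transcript_id "{name}"; gene_name "{name}";'
    rows = []
    for feature, attr in (("gene", gene_attr), ("transcript", tx_attr), ("exon", tx_attr)):
        # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
        rows.append("\t".join([name, source, feature, "1", str(length), ".", "+", ".", attr]))
    return rows


def add_fake_chromosome(genome_fa: str | Path, gtf: str | Path, name: str, seq: str) -> None:
    """Append `seq` as a new contig called `name` and annotate it in the GTF (in place)."""
    seq = seq.upper()
    _append(Path(genome_fa), fasta_record(name, seq))
    _append(Path(gtf), "\n".join(gtf_rows(name, len(seq))) + "\n")
```

The modified genome FASTA and GTF can then be passed to kb ref in place of the originals, so the custom sequence is treated like any other annotated gene.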

Nonetheless, the easiest route in kb-python specifically (without touching the GTF) is to run kb-python to create a standard index first. When you run kb ref, kb-python writes out a FASTA file (the output file you specify with the -f1 option to kb ref). Take that FASTA file and add your custom sequence name and sequence to it (let's call the result f1_modified.fasta). Then run:

kallisto index -i name_of_new_index.idx f1_modified.fasta

The only other thing you need to do is modify your transcript-to-gene mapping file (the one you supply with the -g option to kb ref) by adding a row with the sequence name in the first two columns. (I'm assuming there's only one transcript, so use the sequence name as both the transcript name and the gene name.)
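The kb-python route above can be sketched in Python as follows. add_custom_transcript and build_index are hypothetical helper names, not part of kb-python. The sketch copies the -f1 FASTA and the -g mapping file rather than editing them in place, refuses a name that already exists as a transcript, appends the custom record to each copy, and fills only the first two t2g columns as suggested above (if your t2g has extra columns, you may want to fill those in consistently). build_index simply shells out to the kallisto index command quoted above and assumes kallisto is on your PATH.

```python
from __future__ import annotations

import os
import shutil
import subprocess
from pathlib import Path


def _append(path: Path, text: str) -> None:
    """Append text to a file, first adding a newline if the file lacks a trailing one."""
    needs_newline = False
    if path.stat().st_size > 0:
        with open(path, "rb") as fh:
            fh.seek(-1, os.SEEK_END)
            needs_newline = fh.read(1) != b"\n"
    with open(path, "a") as fh:
        if needs_newline:
            fh.write("\n")
        fh.write(text)


def add_custom_transcript(
    f1_fasta: str | Path,
    t2g: str | Path,
    out_fasta: str | Path,
    out_t2g: str | Path,
    name: str,
    seq: str,
) -> None:
    """Copy the kb ref FASTA (-f1) and t2g (-g) and append one custom transcript to each.

    The sequence name doubles as transcript name and gene name (one transcript assumed).
    """
    # Refuse a name that already exists as a transcript; the mapping would become ambiguous.
    with open(t2g) as fh:
        existing = {line.rstrip("\n").split("\t")[0] for line in fh if line.strip()}
    if name in existing:
        raise ValueError(f"{name!r} is already a transcript in {t2g}")

    out_fasta, out_t2g = Path(out_fasta), Path(out_t2g)
    shutil.copyfile(f1_fasta, out_fasta)
    shutil.copyfile(t2g, out_t2g)
    _append(out_fasta, f">{name}\n{seq.upper()}\n")
    # Only the first two columns (transcript, gene) are filled, per the suggestion above.
    _append(out_t2g, f"{name}\t{name}\n")


def build_index(fasta: str | Path, index: str | Path) -> None:
    """Run `kallisto index -i <index> <fasta>`; assumes kallisto is on PATH."""
    subprocess.run(["kallisto", "index", "-i", str(index), str(fasta)], check=True)
```

For example, calling add_custom_transcript with your -f1 FASTA and t2g, the output names f1_modified.fasta and a new t2g file, and your sequence, followed by build_index("f1_modified.fasta", "name_of_new_index.idx"), reproduces the steps above. The new index and the modified mapping file are then used together with kb count (-i and -g).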

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.