paulranum11 / kallisto_pseudo_to_expressionMatrix

Convert the output of kallisto "pseudo" to an expression matrix

Kallisto pseudo output #1

Open LeoColmet-Daage opened 5 years ago

LeoColmet-Daage commented 5 years ago

Hi,

I'm analyzing single-cell data (from Split-seq) and was able to run kallisto pseudo on a batch file with all the fastq from each of my cells.

I was hoping to convert the output of kallisto pseudo to a matrix format that Seurat can understand and found your script.

My issue is that the output I'm getting from kallisto (0.46.0) pseudo is not a .tsv file but a .mtx file.

Here are the 4 outputs from kallisto:

matrix.cells
matrix.ec
matrix.tcc.mtx
run_info.json

The matrix.tcc.mtx file looks like this:

%%MatrixMarket matrix coordinate real general
15323 715641 720882
1 30872 1
1 96552 1
1 100945 1
1 239734 1
1 292531 1
1 329903 1
1 345621 2
1 396041 1
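For reference, the layout above is the MatrixMarket coordinate format: a `%%MatrixMarket` banner, optional `%` comment lines, one size line (rows, columns, number of stored entries), then one `row column value` entry per line with 1-based indices. A minimal sketch of a reader in plain Python follows; the function name `read_mtx_coordinate` and the small sample are made up for illustration and are not part of kallisto or of this repository.

```python
import io


def read_mtx_coordinate(handle):
    """Parse MatrixMarket coordinate text into ((rows, cols), entries)."""
    banner = handle.readline()
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket banner")

    # Skip comment lines and blank lines until the size line is reached.
    line = handle.readline()
    while line and (line.startswith("%") or not line.strip()):
        line = handle.readline()
    if not line:
        raise ValueError("missing size line")

    # The size line holds: number of rows, number of columns, number of entries.
    n_rows, n_cols, n_entries = (int(field) for field in line.split())

    entries = []
    for line in handle:
        if not line.strip():
            continue
        row, col, value = line.split()
        entries.append((int(row), int(col), float(value)))  # indices are 1-based

    if len(entries) != n_entries:
        raise ValueError(f"expected {n_entries} entries, found {len(entries)}")
    return (n_rows, n_cols), entries


# Hypothetical miniature file, only to show the call.
sample = """%%MatrixMarket matrix coordinate real general
3 5 4
1 2 1
1 4 2
2 1 1
3 5 1
"""
shape, entries = read_mtx_coordinate(io.StringIO(sample))
```

On a real file the same function could be called as `read_mtx_coordinate(open("matrix.tcc.mtx"))`; the entry-count check is a cheap way to confirm that the second line really is the size line.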

First, I'm not sure this is supposed to be the correct output from kallisto, but it ran without any error.

Second, do you think there is a way to convert this format to tsv? I've tried running your script but I get this error:

Loading index fasta..
Loading input matrix..
Traceback (most recent call last):
  File "/home/lcolmetdaage/kallisto_pseudo_to_expressionMatrix/prep_TCC_matrix.py", line 83, in
    matrixTSV_List.append(int(transcriptNum))
ValueError: invalid literal for int() with base 10: '%%MatrixMarket matrix coordinate real general\n'

Many thanks for your help

paulranum11 commented 5 years ago

Hi Leo,

It appears that you are using a more up-to-date version of kallisto than I was when I implemented this script. It should work as written for kallisto 0.44.0, which I believe outputs the matrix.tsv format. Switching kallisto versions might be the easiest way to solve this problem until I can determine exactly what the differences are between the matrix.tcc.mtx and matrix.tsv formats.

Because the formats look fairly similar, it may be possible to use column 2 of the matrix.tcc.mtx instead of column 1 of the matrix.tsv.

Will you test this solution on your data for me?

STEP1: remove any header lines from the matrix.tcc.mtx

STEP2: Please edit line #82 in the prep_TCC_matrix.py file. By default it should read transcriptNum = split1[0]; please change it to transcriptNum = split1[1].
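The two steps can be sketched roughly as follows. This is an illustration, not the actual code of prep_TCC_matrix.py: the helper names `data_lines` and `column_two` are hypothetical, splitting on whitespace is an assumption, and only `split1`, `transcriptNum` and `matrixTSV_List` are names taken from the script. It also assumes that the first non-comment line (rows, columns, number of entries) counts as a header line, as it does in the MatrixMarket coordinate layout.

```python
def data_lines(lines):
    """STEP1: yield only the entry lines, dropping header material."""
    size_line_seen = False
    for line in lines:
        # Banner and comment lines start with '%'; blank lines carry no data.
        if line.startswith("%") or not line.strip():
            continue
        # The first remaining line is the "rows cols entries" size line.
        if not size_line_seen:
            size_line_seen = True
            continue
        yield line


def column_two(lines):
    """STEP2: collect column 2 (index 1) of every entry line."""
    matrixTSV_List = []
    for line in data_lines(lines):
        split1 = line.split()      # assumption: whitespace-separated fields
        transcriptNum = split1[1]  # was split1[0] for the matrix.tsv layout
        matrixTSV_List.append(int(transcriptNum))
    return matrixTSV_List


# First lines of the matrix.tcc.mtx quoted earlier in this thread.
example = [
    "%%MatrixMarket matrix coordinate real general\n",
    "15323 715641 720882\n",
    "1 30872 1\n",
    "1 96552 1\n",
]
second_column = column_two(example)
```

If the size line were left in place, its second field (715641 here) would be appended as if it were a real index, which is why the sketch drops it together with the banner.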

Let me know if this solves your problem.

Thanks,

LeoColmet-Daage commented 5 years ago

Hi Paul,

Thanks a lot for your answer. I just realized that you also wrote the SPLiT-seq demultiplexing script that I'm partially using to analyse my data, so thanks for that one too.

I've tried to run kallisto 0.44.0 but ended up with a "segmentation fault" error, so I was forced to move to 0.46.0, which runs fine on my data.

Looking at the kallisto GitHub, someone asked about getting .mtx files instead of .tsv: https://github.com/pachterlab/kallisto/issues/204. So I'm still not sure whether this is supposed to be the standard output from kallisto versions above 0.44.0.

I'm trying the solution you suggested. I'm just not sure whether the second line of matrix.tcc.mtx is supposed to be a header line, since it looks different from the other lines.

For the moment I've removed only the first line, and it seems to run fine after editing prep_TCC_matrix.py. I will let you know when the job is finished.

On another topic, since you also wrote the SPLiT-seq demultiplexing script, I might have some questions about that as well. Do you mind if I email you?

best,

Léo

paulranum11 commented 5 years ago

Hi Leo,

Happy to discuss more. Feel free to contact me at *****. If kallisto isn't working well for you, you could try an alternative alignment strategy using STAR but the details will depend on what kind of a system or cluster you are working on.

Thanks,

Evi-050 commented 4 years ago

Hello,

As someone new to scRNA-seq analysis, I have a question of my own. I have Smart-seq2 data and managed to run kallisto pseudo and quant. I also want to generate the matrix to work with in Seurat. My data were generated with version 0.46.2 of kallisto, so the outputs are likewise the matrix.cells, matrix.ec, matrix.tcc.mtx and run_info.json files.

If you managed to create a pipeline for mtx files, could you share it with me? If not, any alternative is more than welcome!

supernewtoscrnaseq

best, Evi
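
For readers who end up with the same four output files, here is a minimal sketch of how the three matrix files might be read together in plain Python. It is not a pipeline from this repository, and it rests on assumptions that should be checked against real output: that matrix.cells holds one cell name per line, that each matrix.ec line is an equivalence-class index, a tab, and a comma-separated list of transcript indices, that rows of matrix.tcc.mtx are cells and columns are equivalence classes, and that the equivalence-class indices in matrix.ec start at 0 while the .mtx indices start at 1. The function names are hypothetical.

```python
import io


def load_cells(handle):
    """Assumed layout of matrix.cells: one cell name per line."""
    return [line.strip() for line in handle if line.strip()]


def load_ec(handle):
    """Assumed layout of matrix.ec: '<ec index>\\t<tx index>,<tx index>,...'."""
    ec_to_transcripts = {}
    for line in handle:
        if not line.strip():
            continue
        ec, tx_list = line.rstrip("\n").split("\t")
        ec_to_transcripts[int(ec)] = [int(tx) for tx in tx_list.split(",")]
    return ec_to_transcripts


def load_tcc(handle, cells):
    """Read matrix.tcc.mtx into {cell name: {ec index: count}}."""
    counts = {cell: {} for cell in cells}
    size_line_seen = False
    for line in handle:
        if line.startswith("%") or not line.strip():
            continue  # banner, comments, blank lines
        if not size_line_seen:
            size_line_seen = True  # "rows cols entries" line, not an entry
            continue
        row, col, value = line.split()
        # Assumption: rows are cells, columns are ECs; shift 1-based to 0-based.
        counts[cells[int(row) - 1]][int(col) - 1] = float(value)
    return counts


# Tiny made-up inputs standing in for the real files.
cells = load_cells(io.StringIO("cellA\ncellB\n"))
ec_to_transcripts = load_ec(io.StringIO("0\t0\n1\t1\n2\t0,1\n"))
tcc = load_tcc(
    io.StringIO(
        "%%MatrixMarket matrix coordinate real general\n"
        "2 3 3\n1 1 4\n1 3 1\n2 2 2\n"
    ),
    cells,
)
```

The result is still a table of counts per equivalence class, not per gene; turning it into a gene-level matrix for Seurat would need an extra step that maps transcripts to genes, which is outside this sketch.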