kaizhang / Taiji

This project has been moved to:
https://github.com/Taiji-pipeline/Taiji

Using tags : ["gene quantification"] tries to cast gene name to double #5

Closed npklein closed 7 years ago

npklein commented 7 years ago

I'm trying to use gene quantification data (with https://gist.github.com/npklein/04d3d7f46d4ac683827aadf315be49ff as input.yml), but I get the following error:

[LOG][07-11 11:22] RNA_average: running...
[WARN][07-11 11:22] RNA_average: Failed!
[ERROR][07-11 11:22] "RNA_average" failed. The error was: readDouble: Fail to cast ByteString to Double:"ENSG00000000003"
CallStack (from HasCallStack):
error, called at src/Bio/Utils/Misc.hs:24:14 in bioinformatics-toolkit-0.3.2-6jGTx2VGtZyEsm7mUTIiFH:Bio.Utils.Misc.

It seems that it tries to convert the gene name to Double instead of the quantification value. Opening the expression file with cat -vet expression.txt | head -n5 (so a tab shows as ^I and the end of a line as $), I get:

ENSG00000000003^I0.668539290747883$
ENSG00000000005^I4.23544092728736$
ENSG00000000419^I8.67727523424596$
ENSG00000000457^I10.0857885518317$

This seems to match the format described in the example_input.yml comment:

Gene1 <TAB> 12
Gene2 <TAB> 20

In case it expects counts instead of normalized data, I also tried integers instead of floats, but got the same error.

Could you upload an example quantification file so that I can check if I have the right format?
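For checking a file against the two-column layout quoted above, a minimal Python sketch might look like the following. It is not Taiji's own parser: the function name check_expression_file is hypothetical, and Python's float() may accept values (such as "nan" or padded numbers) that Taiji's readDouble rejects, so a clean result here only rules out the obvious problems.

```python
def check_expression_file(lines):
    """Return (line_number, problem) pairs for lines that are not NAME<TAB>NUMBER.

    `lines` is any iterable of text lines, for example an open file handle.
    This is an illustrative check only, not the validation Taiji performs.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            problems.append((number, "empty line"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, f"expected 2 tab-separated fields, found {len(fields)}")
            )
            continue
        _name, value = fields
        try:
            # The second column must be numeric; the first is the gene label.
            float(value)
        except ValueError:
            problems.append((number, f"second field is not a number: {value!r}"))
    return problems
```

Passing an open handle for expression.txt to check_expression_file returns an empty list when every line has the expected shape. A gene ID reported in the "second field" message would point to swapped columns or a stale input file.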

kaizhang commented 7 years ago

@npklein Since you have used Taiji to analyze RNA-seq data before, you can just look at "outputdir/RNA-seq/*_TPM_by_names.tsv" for examples. You should use gene names rather than gene IDs, but that is not the reason for this issue. Example:

STPG1   413.00
NIPAL3  3495.00
LAS1L   1133.00
ENPP4   292.00
SEMA3F  1186.00
CFTR    0.00
ANKIB1  6129.00
CYP51A1 4533.98
KRIT1   1442.31

What happens if you run grep ENSG00000000003 your_input.txt?

npklein commented 7 years ago

@kaizhang I didn't finish running it with the fastq files because it was taking too long, so I decided to use the quantification data instead, which is why I didn't have example data in outputdir/RNA-seq.

I removed all my output, remade the input files and did a fresh run. Now it doesn't crash, so I must have had the wrong input somewhere; sorry for the trouble. I still have the problem that the Rank file is empty, but I will first change the gene IDs to gene names and see if that works.
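The ID-to-name conversion could be sketched in Python roughly as follows. The helper names ids_to_names and format_rows are hypothetical, and the mapping dictionary is assumed to be built separately from whatever gene annotation matches the data; nothing here comes from Taiji itself.

```python
def ids_to_names(rows, id_to_name):
    """Swap the gene ID in each (gene_id, value) row for its gene name.

    Returns (renamed_rows, unmapped_ids). Rows whose ID is missing from the
    mapping are left out of the result and reported, not guessed at.
    """
    renamed = []
    unmapped = []
    for gene_id, value in rows:
        name = id_to_name.get(gene_id)
        if name is None:
            unmapped.append(gene_id)
        else:
            renamed.append((name, value))
    return renamed, unmapped


def format_rows(rows):
    """Render rows as NAME<TAB>VALUE lines, keeping values as given."""
    return "".join(f"{name}\t{value}\n" for name, value in rows)


# Illustrative mapping only; take the real one from your annotation file.
example_mapping = {"ENSG00000000003": "TSPAN6", "ENSG00000000005": "TNMD"}
example_rows = [
    ("ENSG00000000003", "0.668539290747883"),
    ("ENSG00000000005", "4.23544092728736"),
    ("ENSG00000000419", "8.67727523424596"),
]
renamed_rows, unmapped_ids = ids_to_names(example_rows, example_mapping)
```

Values are kept as strings so the numbers are written back exactly as they were read. The list of unmapped IDs shows how many genes would be lost in the conversion.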

Thanks for the help.