Closed wjyzidane closed 6 years ago
Hello @wjyzidane,
I believe running the command below will solve the problem:
pip3 install pandas --user
Please let me know if it works.
Thank you,
Hyun-Hwan Jeong
Hi Hyun-Hwan,
It works! Thanks a lot!
I am reading the result files from SalmonTE, and I wonder if there is a manual or a detailed explanation of each output file and of input options such as "--exprtype=exprtype".
I am a little confused about why NumReads in quant.sf is not an integer. I think the TPM in quant.sf is the same number as in EXPR.csv, and that is what we need to quantify each repeat, right?
Thanks!
Hi Hyun-Hwan,
I found that EXPR.csv contains 688 repeat categories for the human genome, but the hg19 RepeatMasker annotation has 1396 repeat categories. I wonder why there is such a big difference. Thanks!
Jingyi
Hello Jingyi,
With the --exprtype option, you can choose between two types of values: TPM (if you set the value to TPM or do not set the parameter in the command) or NumReads counts (if you set the value to count). NumReads is not an integer because it is an estimate (an approximation), but it is fine to use the number after rounding. If you only want to see the abundance of repeat elements, I recommend the TPM option; if you want to do differential expression analysis with DESeq, please use the count option. If this is not clear and you would like a better answer, please tell me about your experimental setup.
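To illustrate the rounding step mentioned above, here is a minimal sketch that reads a quant.sf table and rounds NumReads to integers. It assumes the standard tab-separated Salmon layout (Name, Length, EffectiveLength, TPM, NumReads); the function name load_counts and the two example rows are made up for illustration and are not part of SalmonTE.

```python
import csv
import io


def load_counts(quant_sf_text):
    """Return {name: rounded NumReads} from the text of a quant.sf file.

    Assumes a tab-separated table with 'Name' and 'NumReads' columns.
    """
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    # NumReads is an estimate, so it is parsed as a float and then rounded.
    return {row["Name"]: int(round(float(row["NumReads"]))) for row in reader}


# Made-up rows for illustration only (not real SalmonTE output).
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TE_A\t6000\t5800.0\t750000.0\t120.4\n"
    "TE_B\t300\t150.0\t250000.0\t3.7\n"
)

counts = load_counts(example)
print(counts)
```

For a real run, the same function could be given the contents of a file, for example `load_counts(open(path).read())`, where `path` points at a quant.sf file.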
I collected the TE elements from Repbase, not RepeatMasker, and removed redundant elements in a cleaning phase, which leaves 687 elements. The paragraph below, quoted from my SalmonTE paper, explains the process:
To build the index library for the quasi-mapping, SalmonTE takes the FASTA file of cDNA sequences from TE databases such as Repbase (version 22.06)[23]. In the current version, the index files for Homo sapiens and Drosophila melanogaster are available. We reasoned that it is hard to estimate TEs which replicate without an RNA intermediate from RNA-seq sample. Therefore, we excluded the following elements: simple repeats and multi-copy genes, and DNA transposable. After collecting the cDNA sequences, we manually curated clades of each TE based on the repeat class annotation from Repbase. As a result, the generated TE library index database contains 687 TEs for Homo sapiens and 163 TEs for Drosophila melanogaster.
Thank you,
Hwan
It makes sense now. Thank you so much!
I ran SalmonTE like this:
./SalmonTE.py quant --reference=hs ./example/CTRL_1_R1.fastq
and got the error below:
However, the quantification files seem to have been generated here: SalmonTE_output/CTRL_1_R1/quant.sf
I am not sure whether they are correct, though. I appreciate your help. Thanks!
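One rough way to sanity-check a quant.sf file like the one mentioned above is to confirm that it has the expected columns and that the TPM values sum to about one million. The sketch below assumes the standard Salmon column layout; the function name check_quant_sf and the tolerance value are illustrative choices, and passing this check does not prove the run succeeded, it only rules out an obviously empty or truncated file.

```python
import csv
import io
import math

# Column header of a standard Salmon quant.sf file (assumed here).
EXPECTED_COLUMNS = ["Name", "Length", "EffectiveLength", "TPM", "NumReads"]


def check_quant_sf(text, tolerance=1.0):
    """Return True if the table looks structurally plausible.

    Checks the header, requires at least one row, and verifies that
    TPM sums to roughly 1,000,000 (tolerance is an arbitrary choice).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        return False
    try:
        tpm_values = [float(row["TPM"]) for row in reader]
    except (TypeError, ValueError):
        # A missing or non-numeric TPM field suggests a damaged file.
        return False
    if not tpm_values:
        return False
    return math.isclose(sum(tpm_values), 1_000_000, abs_tol=tolerance)


# Made-up table for illustration only (not real SalmonTE output).
good = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TE_A\t6000\t5800.0\t750000.0\t120.4\n"
    "TE_B\t300\t150.0\t250000.0\t3.7\n"
)
print(check_quant_sf(good))
```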