Open lukaas33 opened 2 months ago
I have now run the same command in the latest vatools Docker image and am getting a new error:
> vcf-expression-annotator -o /shared_dir/temp.annotated.vcf -s sample /shared_dir/temp.vep.vcf /shared_dir/temp.abundance.tsv stringtie transcript
Traceback (most recent call last):
File "parsers.pyx", line 1160, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'
The start of my stringtie output:
Gene ID Gene Name Reference Strand Start End Coverage FPKM TPM
ENSG00000282881 TMEM275 1 - 46532166 46543969 0.0 0.0 0.0
ENSG00000201405 Y_RNA 1 + 23370254 23370346 0.0 0.0 0.0
ENSG00000143774 GUK1 1 + 228139962 228148984 0.0 0.0 0.0
ENSG00000288775 . 1 - 159776325 159779383 0.0 0.0 0.0
ENSG00000239887 C1orf226 1 + 162378841 162386812 0.0 0.0 0.0
ENSG00000200575 RNU6-414P 1 + 61816419 61816522 0.0 0.0 0.0
ENSG00000251785 RNA5SP20 1 + 77614869 77614952 0.0 0.0 0.0
ENSG00000237872 POU5F1P4 1 + 155433178 155434262 0.0 0.0 0.0
For the first error, I suspect that one of the dependencies you have installed is incompatible with gtfparse. Here are the versions in our Docker images:
gtfparse==1.3.0
numpy==1.26.1
pandas==2.1.1
pysam==0.22.0
python-dateutil==2.8.2
pytz==2023.3.post1
six==1.16.0
testfixtures==7.2.0
tzdata==2023.3
vatools==5.1.0
vcfpy==0.12.3
Try downgrading your dependencies to match these versions.
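To see which installed packages differ from the list above, a small check like the following may help. This is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`), the `PINNED` subset is copied from the list above, and the helper names `installed_version` and `find_mismatches` are made up for illustration.

```python
from importlib import metadata

# Subset of the versions listed above for the Docker image.
PINNED = {
    "gtfparse": "1.3.0",
    "numpy": "1.26.1",
    "pandas": "2.1.1",
    "pysam": "0.22.0",
    "vatools": "5.1.0",
    "vcfpy": "0.12.3",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map package name -> (wanted, found) for every package that differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def report(pinned=PINNED):
    """Print one line per mismatching package in the current environment."""
    for name, (wanted, found) in find_mismatches(pinned).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

Calling `report()` inside the environment that produces the error prints only the packages whose versions differ from the pinned ones.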
I'm unable to reproduce the error in your second comment with the stringtie output provided. Can you please attach all of your input files (VCF and full stringtie TSV)?
Hi @susannasiebert, can I send these to you privately to maintain privacy of this data?
Yes, absolutely. My email is susanna.kiwala@wustl.edu
My apologies for the belated replies. After investigating your files, it looks like you are trying to use a gene abundance file in transcript mode. If you switch your command to gene mode, it works without problems:
vcf-expression-annotator -o /shared_dir/temp.annotated.vcf -s sample /shared_dir/temp.vep.vcf /shared_dir/temp.abundance.tsv stringtie gene
The transcript abundance file from stringtie is in GTF format, while the gene abundance file is in TSV format. Mixing them up leads to unexpected errors like the one you are seeing. I've opened issue #78 to add better error handling for this case.
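Until that error handling exists, a quick pre-check on the expression file can catch the mix-up before running the annotator. The sketch below is a heuristic only, not part of vatools: it assumes the gene abundance TSV starts with the `Gene ID` / `Gene Name` header shown earlier in this thread, and that a GTF line has nine tab-separated fields with integer start and end in columns 4 and 5. The function name `guess_stringtie_format` is hypothetical.

```python
def guess_stringtie_format(lines):
    """Guess which vcf-expression-annotator mode a stringtie file fits.

    Returns "gene" for a gene abundance TSV, "transcript" for a GTF,
    and "unknown" otherwise. Only the first non-comment line is inspected.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GTF comment lines
        fields = line.split("\t")
        # Gene abundance TSV: header row as shown in this thread.
        if fields[:2] == ["Gene ID", "Gene Name"]:
            return "gene"
        # GTF: nine columns, with integer start and end in columns 4 and 5.
        if len(fields) == 9 and fields[3].isdigit() and fields[4].isdigit():
            return "transcript"
        return "unknown"
    return "unknown"


def guess_from_path(path):
    """Convenience wrapper that reads the file lazily from disk."""
    with open(path) as handle:
        return guess_stringtie_format(handle)
```

If `guess_from_path("/shared_dir/temp.abundance.tsv")` returns `"gene"`, the last argument of the command should be `gene` rather than `transcript`.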
Ah, and sorry for my ignorance, but wouldn't I need to add both of them, i.e. the TSV and the GTF file?
Is there a way to do this with one command or will this always involve two steps?
Or does the transcript level expression always contain more detail?
You would need to run this as two steps, unfortunately.
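A small wrapper can chain the two steps so that they feel like one command. This is a sketch under assumptions: the argument order is copied from the commands in this thread, the output of the transcript step is assumed to be a valid input for the gene step, and all file names in the usage note (including the transcript GTF path) are placeholders.

```python
import subprocess


def build_command(output_vcf, sample, input_vcf, expression_file, mode):
    """Build one vcf-expression-annotator call in the form used in this thread."""
    return [
        "vcf-expression-annotator",
        "-o", output_vcf,
        "-s", sample,
        input_vcf,
        expression_file,
        "stringtie",
        mode,
    ]


def build_two_step_commands(input_vcf, transcript_gtf, gene_tsv,
                            sample, intermediate_vcf, output_vcf):
    """Return the transcript-mode command followed by the gene-mode command.

    The second command reads the VCF written by the first one.
    """
    step1 = build_command(intermediate_vcf, sample, input_vcf,
                          transcript_gtf, "transcript")
    step2 = build_command(output_vcf, sample, intermediate_vcf,
                          gene_tsv, "gene")
    return [step1, step2]


def run_two_steps(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_two_steps(build_two_step_commands("temp.vep.vcf", "transcripts.gtf", "temp.abundance.tsv", "sample", "temp.tx.vcf", "temp.annotated.vcf"))` would run both annotations in sequence; the intermediate file `temp.tx.vcf` can be deleted afterwards.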
Ah, so it is recommended to also add gene-level expression in addition to transcript-level expression?
Yes, the gene expression levels are, for example, used during tiering in the aggregated report.
Ah, in that case it would be nice to have a feature to add both files, or to support command-line piping so that it can be done in one line and creates only one file.
I am getting the following error from the gtfparse library when trying to annotate my VCF with stringtie output.
Version 2.0.1 with Python 3.6
I have also reported this issue on the gtfparse issue tracker: https://github.com/openvax/gtfparse/issues/49