instituteofcancerresearch / SOPRANO

SelectiOn in PRotein ANnotated regiOns. Adapted dN/dS based method to detect selection in specific protein regions
https://instituteofcancerresearch.github.io/SOPRANO/
GNU General Public License v3.0

Annotate VCF files - soprano-app #81

Open faizjab opened 8 months ago

faizjab commented 8 months ago

Hi,

I am trying to run step 2 on soprano-app to annotate my VCF file. However, I get this error: Screenshot 2024-03-18 at 21 35 47

Since step 2 is what creates the .vcf.anno file, I am unsure how I am supposed to have that file already. Any help would be much appreciated.

faizjab commented 8 months ago

I then created an empty test.vcf.anno file to get around this, and the annotation step reported success. However, the output file it produced is empty.

Screenshot 2024-03-20 at 12 53 04
faizjab commented 5 months ago

@rachelicr

rachelicr commented 5 months ago

@faizjab if this is a support request from the ICR, could you email schelpdesk@icr.ac.uk so that we can filter it through to the correct place?

rachelicr commented 5 months ago

@faizjab This error is because you haven't given a name in the box where it says "Choose a name for the annotated output". This is not optional - this should be clearer, and should default to something temporary.

rachelicr commented 5 months ago

I will add making this clearer to our to-do list, but in the meantime you should be able to proceed by simply giving a name.

faizjab commented 5 months ago

@rachelicr Thank you for the reply. Even after entering a name in the "Choose a name for the annotated output" box, I still get the same error.

However I ended up using the VEP tool on the Ensembl website to get around this.

The error I get now when running SOPRANO is this: Screenshot 2024-06-17 at 11 17 01

rachelicr commented 5 months ago

If you attach the data you are using I can have a look?

faizjab commented 5 months ago

tcga_full_selected.annotated.zip

This is a VEP of TCGA LUSC 2018.

rachelicr commented 5 months ago

What about the original vcf you say fails in the first step? test.vcf

faizjab commented 5 months ago

lusc_tcga_2018.vcf.zip Here's the original vcf file for TCGA LUSC 2018, thank you for helping with this

rachelicr commented 5 months ago

The problem with the original file is that the annotator expects some header data. Specifically, it complains:

Error in vcfR::read.vcfR(vcf_path, verbose = FALSE) : 
  File: /home/ralcraft/dev/shiny-proxy-developed/SOPRANO/data/test_data/faizjab_jun24/lusc_tacg_2018.vcf does not appear to be a VCF file.
  First line of file:
 /home/ralcraft/dev/shiny-proxy-developed/SOPRANO/data/test_data/faizjab_jun24/lusc_tacg_2018.vcf 
  Should begin with:
##fileformat=VCFv 

A file that works has these as its first two lines:

##fileformat=VCFv4.1
##fileDate=20240527
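As a quick local check before uploading, the first-line test from the error above can be reproduced with a few lines of Python. This is a minimal sketch, not SOPRANO's or vcfR's own code: it only mirrors the "Should begin with ##fileformat=VCFv" message, and the helper names `looks_like_vcf` and `add_minimal_header` are hypothetical. Note that prepending these two lines only gets past the first-line check; it does not make the rest of the file a valid VCF.

```python
import datetime
from pathlib import Path

# Prefix that vcfR::read.vcfR reports it expects on the first line of a VCF.
REQUIRED_PREFIX = "##fileformat=VCFv"


def looks_like_vcf(path):
    """Return True if the file's first line starts with the VCF fileformat line."""
    with open(path, "r", encoding="utf-8") as handle:
        first_line = handle.readline()
    return first_line.startswith(REQUIRED_PREFIX)


def add_minimal_header(src, dst, version="4.2", file_date=None):
    """Copy src to dst, prepending ##fileformat and ##fileDate lines if missing.

    This only satisfies the first-line check; the body must still follow
    the VCF specification (tab-separated #CHROM header line and data lines).
    """
    file_date = file_date or datetime.date.today().strftime("%Y%m%d")
    body = Path(src).read_text(encoding="utf-8")
    if body.startswith(REQUIRED_PREFIX):
        # Already has a fileformat line: copy unchanged.
        Path(dst).write_text(body, encoding="utf-8")
        return
    header = f"##fileformat=VCFv{version}\n##fileDate={file_date}\n"
    Path(dst).write_text(header + body, encoding="utf-8")
```

For example, `looks_like_vcf("lusc_tcga_2018.vcf")` returning False would point to the same problem vcfR reported.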

faizjab commented 5 months ago

Hi,

Thank you for this.

Even when I add the first two rows as above, I get the same error. Screenshot 2024-06-17 at 13 03 04

When using Ensembl's VEP annotation function I am able to annotate the VCF file and run SOPRANO.

However, I now get the error below. I am using the allhlabinders_exprmean1.IEDBpeps.bed file.

Screenshot 2024-06-17 at 13 06 04

rachelicr commented 5 months ago

It needs to be in fully specified vcf format: https://samtools.github.io/hts-specs/VCFv4.2.pdf

I suspect the next step isn't going to work if the original file doesn't work, so it is this step that needs to be fixed.
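To illustrate what "fully specified" means in practice, the sketch below writes single-base substitutions as a minimal VCFv4.2 file: `##` meta-information lines, the tab-separated `#CHROM` header line with the eight fixed columns, and one tab-separated data line per variant, using "." for missing ID, QUAL, FILTER and INFO values. It is an assumption-laden example, not part of SOPRANO: the function name `write_minimal_vcf` and the `(chrom, pos, ref, alt)` tuple input are hypothetical, indels are deliberately rejected (VCF represents them with a padding reference base that needs the genome), and records are written in the order given, so sort them by chromosome and position first if your tools expect that.

```python
import datetime

# Fixed, mandatory columns of the VCF header line (VCF v4.2 specification).
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
BASES = set("ACGTN")


def write_minimal_vcf(records, path, file_date=None):
    """Write (chrom, pos, ref, alt) SNV records as a tab-delimited VCFv4.2 file.

    Missing ID/QUAL/FILTER/INFO values are written as ".".
    Indels are rejected because VCF encodes them with a padding reference base.
    """
    file_date = file_date or datetime.date.today().strftime("%Y%m%d")
    lines = [
        "##fileformat=VCFv4.2",
        f"##fileDate={file_date}",
        "\t".join(FIXED_COLUMNS),
    ]
    for chrom, pos, ref, alt in records:
        ref, alt = ref.upper(), alt.upper()
        # Only single-base substitutions with a real change are accepted.
        if len(ref) != 1 or len(alt) != 1 or not {ref, alt} <= BASES or ref == alt:
            raise ValueError(f"only single-base substitutions supported: {chrom}:{pos} {ref}>{alt}")
        # VCF positions are 1-based.
        if int(pos) < 1:
            raise ValueError(f"POS must be 1-based and positive: {chrom}:{pos}")
        lines.append("\t".join([str(chrom), str(int(pos)), ".", ref, alt, ".", ".", "."]))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

The columns must be separated by real tab characters; space-separated columns are a common reason a hand-edited file is still rejected.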

faizjab commented 5 months ago

Screenshot 2024-06-17 at 13 32 00

I have formatted the data into a fully specified vcf format but still get the same error at step 2

Screenshot 2024-06-17 at 13 39 44

rachelicr commented 5 months ago

Can you attach the fully specified file?

faizjab commented 5 months ago

output.vcf.zip

Here's the vcf file

rachelicr commented 5 months ago

That file has worked for me both from the command line and in the web app: [image]

faizjab commented 5 months ago

thank you!

faizjab commented 5 months ago

@rachelicr when using the soprano web app I am trying to get the GRCh37 reference genome. I have this already downloaded on my computer but I can't seem to link to it. Would you be able to advise? Screenshot 2024-06-19 at 12 23 20

rachelicr commented 5 months ago

If you are using the application, you should be able to select GRCh37 from the drop-down?

The web application restricts this to the genomes we have pre-downloaded, to save space. If you want to link to your own genome cache, you need to run the application locally, either in Docker, from the cloned repo, or from the command line.

https://instituteofcancerresearch.github.io/SOPRANO/installation/