lgmgeo / AnnotSV

Annotation and Ranking of Structural Variation
GNU General Public License v3.0

-annotationsDir flag & manual update of Exomiser #16

Closed Stikus closed 4 years ago

Stikus commented 4 years ago

Hello.

We are trying to use an external reference directory with your tool (we run tools in Docker and don't want to store a 20 GB reference in the image). The tool supports this through the -annotationsDir flag. For now, though, all annotations are stored in one place: both AnnotSV's own and Exomiser's (in our case). A problem will emerge with a future AnnotSV release. Currently (AnnotSV 2.3.2) annotations are stored in 2.3.2/Annotations_Human and 2.3.2/Annotations_Exomiser/1902/1902_hg19. An Exomiser update will not cause a problem: 2.3.2/Annotations_Exomiser/2003/2003_hg19. An AnnotSV version update, however, will cause all annotations to be placed into new directories such as 2.3.3/Annotations_Exomiser/2003/2003_hg19, even though the files may be unchanged.

Placing the version directory (2.3.2, or something pointing to the current release version) inside Annotations_Human would fix this problem.
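
A possible workaround for the duplication described above, offered only as a sketch: keep one shared copy of the annotations and expose it under each release directory through symbolic links. The `link_annotations` helper and the `/ref/AnnotSV/shared` path are hypothetical, and whether AnnotSV follows symlinked annotation directories in a given setup is an assumption that would need checking.

```shell
# Hypothetical helper: expose one shared annotation copy under a
# release-specific directory via symlinks, so that a new AnnotSV release
# does not require a second copy of unchanged files.
link_annotations() {
    shared_dir=$1     # e.g. /ref/AnnotSV/shared (assumed location)
    release_dir=$2    # e.g. /ref/AnnotSV/2.3.3
    mkdir -p "$release_dir" || return 1
    for name in Annotations_Human Annotations_Exomiser; do
        # Only link directories that actually exist in the shared copy.
        if [ -d "$shared_dir/$name" ]; then
            ln -sfn "$shared_dir/$name" "$release_dir/$name"
        fi
    done
}

# Usage (paths are examples only):
#   link_annotations /ref/AnnotSV/shared /ref/AnnotSV/2.3.3
```

Inside a container the link targets have to be mounted as well, because absolute symlinks are resolved against the container's filesystem.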

lgmgeo commented 4 years ago

Hi,

Thank you for your interest in using AnnotSV.

Annotations are updated and formatted specifically for each significant release of AnnotSV (once or twice a year). The next annotations update will be available for v2.4 (not for v2.3.3; it would be too time-consuming for me to do an update as soon as a source annotation is updated). In other words, you have the same annotation data for different releases such as v2.3.2 and v2.3.3.

Regarding your request, I'm not sure I understand it well. Could you explain in more detail why using -annotationsDir AnnotSV/2.3.2/ or -annotationsDir AnnotSV/2.3.3/ would not work with Docker?

Best regards, Véronique

Stikus commented 4 years ago

Thank you for the answer.

Maybe I misunderstood how you use annotation sources. Can we manually update Exomiser from the 1902 release in the Makefile to the latest 2003 release simply by using new data from here? For now annotations are stored in a fixed structure, and I assume that we can update them independently. Am I wrong?

Or should we strictly use the 1902 build of the Exomiser data, as stated in the Makefile?

For now, I'm planning to manually download the full 20 GB of Exomiser 2003 data and use it. Only after implementing this did I realise that you download only the 2 GB 1902_phenotype.zip and not the full 20 GB 1909_hg19.zip. Moreover, according to this issue I don't need to download both hg19 and hg38 data. Is that correct?

Sorry for some confusion, let's create a list of questions:

  1. Can we use full 20 GB https://data.monarchinitiative.org/exomiser/data/1902_hg19.zip instead of your https://www.lbgi.fr/~geoffroy/Annotations/1902_hg19.tar.gz ? Should we (I assume 'no' here)?

  2. If we want to update from 1902 to 2003 - can we do it ourselves? By using new files from https://data.monarchinitiative.org/exomiser/data ?

  3. When you release AnnotSV 2.4, will old annotations (like the Exomiser 1902 data) still work?

  4. And finally, how do we make the Exomiser step work? Before I had its data I got WARNING: No Exomiser annotations available in /ref/AnnotSV/2.3.2/Annotations_Exomiser/. Now nothing happens, but the output files are the same and I don't see any ..running Exomiser messages in the log like here.

lgmgeo commented 4 years ago
Can we manually update Exomiser from 1902 in Makefile to latest 2003 simply by using new data from here?

Yes, you can. You just need to keep the same Exomiser files and hierarchy as in AnnotSV.

For now annotations are stored in fixed structure and I assume that we can update them independently - am I wrong?
If we want to update from 1902 to 2003 - can we do it ourselves? By using new files from https://data.monarchinitiative.org/exomiser/data ?

Yes, you can. But I haven't checked the latest 2003 data from Exomiser yet. It should work if the format is unchanged. Otherwise, please contact me by email (veronique.geoffroy@inserm.fr) for debugging.

For now, I'm planning to manually download full 20 GB of Exomiser 2003 and use them. Only after implementation of this feature I realised that you're downloading only 2 GB 1902_phenotype.zip and not full 20 GB 1909_hg19.zip. Moreover - according to this issue I don't need to download both hg19 and hg38 data - is it correct?

Absolutely correct. This module makes use of Exomiser (Smedley et al., 2015) and HPO (Köhler et al., 2019) to score genes overlapped by an SV on their biological relevance to the individual's phenotype. It has no link with the genome build version.

Can we use full 20 GB https://data.monarchinitiative.org/exomiser/data/1902_hg19.zip instead of your https://www.lbgi.fr/~geoffroy/Annotations/1902_hg19.tar.gz ? Should we (I assume 'no' here)?

No, the full 20 GB can't be used. Only some of these zipped files (9 KB) are needed.

When you release AnnotSV 2.4 old annotations will work or not (like Exomiser 1902 data)?

Theoretically, it should work, unless the Exomiser format has changed, which I don't think is the case.

And finally - how to make Exomiser step work? Before I have its data I got WARNING: No Exomiser annotations available in /ref/AnnotSV/2.3.2/Annotations_Exomiser/. Now nothing happened but output files are same and I don't see any ..running Exomiser messages in log like here.

Are you running the latest version of AnnotSV (2.3.2)? If yes, can you please send me by email the result of the following command lines:

echo $ANNOTSV
ls $ANNOTSV/share/AnnotSV/Annotations_Exomiser/
ls $ANNOTSV/share/AnnotSV/Annotations_Exomiser/1902/*

Stikus commented 4 years ago

Yes, I've found that this block https://github.com/lgmgeo/AnnotSV/blob/master/Makefile#L154 is missing in my installation, thanks for pointing it out.

I'll report my results tomorrow.

Can you tell me whether the share/AnnotSV/jar/ content should sit next to Annotations_Exomiser or under $ANNOTSV/share/?

For now, I have:

root@970129676c3b:/outputs# echo $ANNOTSV
/soft/AnnotSV-2.3.2                      

root@970129676c3b:/outputs# ls -la $ANNOTSV       
total 12                                          
drwxr-xr-x. 5 root root   57 Jun 11 16:14 .       
drwxr-xr-x. 1 root root   27 Jun 11 16:14 ..      
-rwxr-xr-x. 1 root root 8843 Jun 11 15:32 Makefile
drwxr-xr-x. 2 root root   21 Jun 11 16:14 bin     
drwxr-xr-x. 3 root root   21 Jun 11 16:14 etc     
drwxr-xr-x. 5 root root   43 Jun 11 16:14 share   

root@970129676c3b:/outputs# ls -la $ANNOTSV/share/
total 0                                           
drwxr-xr-x. 5 root root 43 Jun 11 16:14 .         
drwxr-xr-x. 5 root root 57 Jun 11 16:14 ..        
drwxr-xr-x. 3 root root 21 Jun 11 16:14 bash      
drwxr-xr-x. 3 root root 21 Jun 11 16:14 doc       
drwxr-xr-x. 3 root root 21 Jun 11 16:14 tcl8.6    

root@970129676c3b:/outputs# ls -la /ref/AnnotSV/2.3.2/        
total 0                                                       
drwxr-xr-x. 4 997 root  59 Jun 15 14:56 .                     
drwxr-xr-x. 3 997 root  19 Jun 11 16:14 ..                    
drwxr-xr-x. 4 997 root  30 Jun 15 19:13 Annotations_Exomiser  
drwxr-xr-x. 8 997 root 127 Dec 20 14:52 Annotations_Human     

lgmgeo commented 4 years ago

Ok, so you use -annotationsDir /ref/AnnotSV/2.3.2/, right? So the share/AnnotSV/jar/ content should be in /ref/AnnotSV/2.3.2/

On my installation, without using the -annotationsDir option, I have:

ls -la $ANNOTSV
total 32
drwxr-xr-x 5 geoffroy lgm 4096 Jun 15 20:14 .
drwxr-xr-x 3 geoffroy lgm 4096 Jun 15 20:13 ..
drwxr-xr-x 2 geoffroy lgm 4096 Jun 15 20:14 bin
drwxr-xr-x 3 geoffroy lgm 4096 Jun 15 20:14 etc
-rwxr-xr-x 1 geoffroy lgm 8843 Jun 15 20:13 Makefile
drwxr-xr-x 6 geoffroy lgm 4096 Jun 15 20:17 share

 ls -la $ANNOTSV/share/
total 24
drwxr-xr-x 6 geoffroy lgm 4096 Jun 15 20:17 .
drwxr-xr-x 5 geoffroy lgm 4096 Jun 15 20:14 ..
drwxr-xr-x 5 geoffroy lgm 4096 Jun 15 20:18 AnnotSV
drwxr-xr-x 3 geoffroy lgm 4096 Jun 15 20:14 bash
drwxr-xr-x 3 geoffroy lgm 4096 Jun 15 20:14 doc
drwxr-xr-x 3 geoffroy lgm 4096 Jun 15 20:14 tcl8.6

ls -la $ANNOTSV/share/AnnotSV/
total 20
drwxr-xr-x 5 geoffroy lgm 4096 Jun 15 20:18 .
drwxr-xr-x 6 geoffroy lgm 4096 Jun 15 20:17 ..
drwxr-xr-x 3 geoffroy lgm 4096 Jun 15 20:17 Annotations_Exomiser
drwxr-xr-x 8 geoffroy lgm 4096 Dec 20 12:52 Annotations_Human
drwxr-xr-x 2 geoffroy lgm 4096 Jun 15 20:18 jar

Does it help you to solve the bug?
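
The listings above suggest that the directory passed to -annotationsDir needs three subdirectories: Annotations_Human, Annotations_Exomiser and jar. A minimal sketch of a pre-flight check follows; the `check_annotations_dir` name is hypothetical, and the list of required subdirectories is inferred from this thread rather than from AnnotSV's documentation.

```shell
# Sketch: report which of the subdirectories seen in this thread are missing
# from the directory given to -annotationsDir.
check_annotations_dir() {
    annotations_dir=$1   # e.g. /ref/AnnotSV/2.3.2
    status=0
    for name in Annotations_Human Annotations_Exomiser jar; do
        if [ ! -d "$annotations_dir/$name" ]; then
            echo "missing: $annotations_dir/$name"
            status=1
        fi
    done
    # Returns 0 only when all three subdirectories are present.
    return "$status"
}

# Usage (path is an example only):
#   check_annotations_dir /ref/AnnotSV/2.3.2 || echo "layout incomplete"
```

Run against the first container listing in this thread, such a check would flag the absent jar directory.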

Stikus commented 4 years ago

Ok, so you use -annotationsDir /ref/AnnotSV/2.3.2/, right?

Right. Thanks for the answer, I'll check. We have had serious ISP issues for the last week, so even downloading the full zip of AnnotSV is difficult for me now. 90% of the time I get:

Archive:  /soft/AnnotSV-4fea16c6f0dcbaedd19ced58c34d22becbcf2b6c.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /soft/AnnotSV-4fea16c6f0dcbaedd19ced58c34d22becbcf2b6c.zip or
        /soft/AnnotSV-4fea16c6f0dcbaedd19ced58c34d22becbcf2b6c.zip.zip, and cannot find /soft/AnnotSV-4fea16c6f0dcbaedd19ced58c34d22becbcf2b6c.zip.ZIP, period.

But it looks like your solution is correct.

And about updating Exomiser - there are 2 different problems here:

  1. We can use 2003_phenotype.zip instead of 1902_phenotype.zip - this should be trivial (if 1902 is not hardcoded somewhere)

  2. We need an updated version of your 1902_hg19.tar.gz for 2003, correct? Or don't we need to update it? If we do, can we do it ourselves, or can only you do it with the next major release?

lgmgeo commented 4 years ago
We can use 2003_phenotype.zip instead of 1902_phenotype.zip - this should be trivial (if 1902 is not hardcoded somewhere)

Absolutely. It also requires an update of the "etc/AnnotSV/application.properties" file.

We need updated version of your 1902_hg19.tar.gz for 2003 - correct? Or we don't need to update? If we do - can we do it ourselves or only you can do if with next major release?

The Exomiser jar file has a dependency on the genome data. With the help of Jules JACOBSEN (Exomiser developer), AnnotSV works around it by simply including the 1902_hg19.tar.gz. So, in theory, we only need to change the names of these files.

Just to let you know, an update is planned for this summer. If you can wait for that, it will be easier...
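
The "in theory" renaming mentioned above could be sketched as follows. This is an untested illustration, not a supported procedure: the `clone_hg19_stub` helper is hypothetical, and it assumes the reduced hg19 files live in <version>/<version>_hg19 and carry a <version>_hg19_ prefix, as the 1902 files listed in this thread do.

```shell
# Sketch: copy the reduced <old>_hg19 files under <new>_hg19 names,
# leaving the originals untouched.
clone_hg19_stub() {
    exomiser_dir=$1   # e.g. /ref/AnnotSV/2.3.2/Annotations_Exomiser
    old=$2            # e.g. 1902
    new=$3            # e.g. 2003
    src="$exomiser_dir/$old/${old}_hg19"
    dst="$exomiser_dir/$new/${new}_hg19"
    [ -d "$src" ] || { echo "missing: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    for path in "$src/${old}_hg19_"*; do
        [ -e "$path" ] || continue      # the glob matched nothing
        file=${path##*/}                # strip the directory part
        # Replace the leading version prefix in the file name.
        cp "$path" "$dst/${new}${file#"$old"}"
    done
}

# Usage (paths are examples only):
#   clone_hg19_stub /ref/AnnotSV/2.3.2/Annotations_Exomiser 1902 2003
```

Copying rather than moving keeps the 1902 layout usable if the renamed files turn out not to work.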

Stikus commented 4 years ago

We can absolutely wait; I'm just curious.

Absolutely. It also requires an update of the "etc/AnnotSV/application.properties" file.

https://github.com/lgmgeo/AnnotSV/blob/master/etc/AnnotSV/application.properties#L29 - this line I assume?

Can we use the 2003 phenotype data with the 1902 hg19 data? Would we get any benefit? Or should we just use 1902 until you update the tool? What do you think?

Thanks for fast answers and help :)

Stikus commented 4 years ago

Looks like something is still not working:

  Command: '/soft/AnnotSV-2.3.2/bin/AnnotSV -annotationsDir /ref/AnnotSV/2.3.2 -genomeBuild GRCh37 -SVinputFile /inputs/som_candidateSV.vcf -outputFile /outputs/test_som.annotsv.tsv'.
  PID=156 (last job)
AnnotSV 2.3.2

Copyright (C) 2017-2019 GEOFFROY Veronique

Please feel free to contact me for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr

Tcl/Tk version: 8.6

Application name used (defined with the "ANNOTSV" environment variable):
/soft/AnnotSV-2.3.2

...downloading the configuration data (June 16 2020 - 00:59)
    ...configuration data by default
    ...configuration data from /soft/AnnotSV-2.3.2/etc/AnnotSV/configfile
    ...configuration data given in arguments
    ...checking configuration data and files

WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)

    ******************************************
    AnnotSV has been run with these arguments:
    ******************************************
    -SVinputFile /inputs/som_candidateSV.vcf
    -SVinputInfo 1
    -SVminSize 50
    -annotationsDir /ref/AnnotSV/2.3.2
    -bedtools bedtools
    -candidateGenesFiltering no
    -genomeBuild GRCh37
    -metrics us
    -minTotalNumber 500
    -organism Human
    -outputDir /outputs
    -outputFile test_som.annotsv.tsv
    -overlap 70
    -overwrite yes
    -promoterSize 500
    -rankFiltering 1 2 3 4 5
    -rankOutput no
    -reciprocal no
    -snvIndelPASS 0
    -svtBEDcol -1
    ******************************************

...listing of the annotations to realized (June 16 2020 - 00:59)
    ...refGene annotation
    (with /ref/AnnotSV/2.3.2/Annotations_Human/RefGene/GRCh37/refGene.sorted.bed)
    ...Genes-based annotations
        ...20181211_ACMG.tsv
        (59 gene identifiers and 1 annotations columns: ACMG)
        ...20191219_DDG2P.tsv.gz
        (1982 gene identifiers and 5 annotations columns: DDD_status, DDD_mode, DDD_consequence, DDD_disease, DDD_pmids)
        ...20191219_HI.tsv.gz
        (19124 gene identifiers and 1 annotations columns: HI_DDDpercent)
        ...20191219_GeneIntolerance.pLI-Zscore.annotations.tsv.gz
        (18241 gene identifiers and 3 annotations columns: synZ_ExAC, misZ_ExAC, pLI_ExAC)
        ...20191219_ExAC.CNV-Zscore.annotations.tsv.gz
        (15673 gene identifiers and 3 annotations columns: delZ_ExAC, dupZ_ExAC, cnvZ_ExAC)
        ...20191216_OMIM-1-annotations.tsv.gz
        (14411 gene identifiers and 1 annotations columns: Mim Number)
        ...20191216_morbidGenesCandidates.tsv.gz
        (3136 gene identifiers and 1 annotations columns: morbidGenesCandidates)
        ...20191216_OMIM-2-annotations.tsv.gz
        (14411 gene identifiers and 2 annotations columns: Phenotypes, Inheritance)
        ...20191216_morbidGenes.tsv.gz
        (11249 gene identifiers and 1 annotations columns: morbidGenes)
        ...20191219_ClinGenAnnotations.tsv.gz
        (1392 gene identifiers and 2 annotations columns: HI_CGscore, TriS_CGscore)
    ...Annotations with features overlapping the SV
        ...DGV Gold Standard frequency annotation
        ...gnomAD frequency annotation
        ...DDD frequency annotation
        ...1000g frequency annotation
        ...Ira M. Hall's lab frequency annotation
    ...Annotations with features overlapped with the SV
        ...Promoters annotation
        ...dbVar_pathogenic_NR_SV annotation
        ...TAD annotation
    ...Breakpoints annotations
        ...GC content annotation
        ...Repeat annotation

...annotation in progress (June 16 2020 - 00:59)

...Output columns annotation:
    AnnotSV ID; SV chrom; SV start; SV end; SV length; SV type; ID; REF; ALT; QUAL; FILTER; INFO; AnnotSV type; Gene name; NM; CDS length; tx length; location; location2; intersectStart; intersectEnd; DGV_GAIN_IDs; DGV_GAIN_n_samples_with_SV; DGV_GAIN_n_samples_tested; DGV_GAIN_Frequency; DGV_LOSS_IDs; DGV_LOSS_n_samples_with_SV; DGV_LOSS_n_samples_tested; DGV_LOSS_Frequency; GD_ID; GD_AN; GD_N_HET; GD_N_HOMALT; GD_AF; GD_POPMAX_AF; GD_ID_others; DDD_SV; DDD_DUP_n_samples_with_SV; DDD_DUP_Frequency; DDD_DEL_n_samples_with_SV; DDD_DEL_Frequency; 1000g_event; 1000g_AF; 1000g_max_AF; IMH_ID; IMH_AF; IMH_ID_others; promoters; dbVar_event; dbVar_variant; dbVar_status; TADcoordinates; ENCODEexperiments; GCcontent_left; GCcontent_right; Repeats_coord_left; Repeats_type_left; Repeats_coord_right; Repeats_type_right; ACMG; DDD_status; DDD_mode; DDD_consequence; DDD_disease; DDD_pmids; HI_DDDpercent; synZ_ExAC; misZ_ExAC; pLI_ExAC; delZ_ExAC; dupZ_ExAC; cnvZ_ExAC; Mim Number; morbidGenesCandidates; Phenotypes; Inheritance; morbidGenes; HI_CGscore; TriS_CGscore; AnnotSV ranking

...AnnotSV is done with the analysis (June 16 2020 - 00:59)
root@27fe373fffff:/outputs# echo $ANNOTSV
/soft/AnnotSV-2.3.2                      

root@27fe373fffff:/outputs# ls -la $ANNOTSV        
total 12                                           
drwxr-xr-x. 5 root root   57 Jun 16 00:39 .        
drwxr-xr-x. 1 root root   27 Jun 16 00:39 ..       
-rwxr-xr-x. 1 root root 8843 Jun 11 22:10 Makefile 
drwxr-xr-x. 2 root root   21 Jun 16 00:39 bin      
drwxr-xr-x. 3 root root   21 Jun 16 00:39 etc      
drwxr-xr-x. 5 root root   43 Jun 16 00:39 share    

root@27fe373fffff:/outputs# ls -la $ANNOTSV/share/ 
total 0                                            
drwxr-xr-x. 5 root root 43 Jun 16 00:39 .          
drwxr-xr-x. 5 root root 57 Jun 16 00:39 ..         
drwxr-xr-x. 3 root root 21 Jun 16 00:39 bash       
drwxr-xr-x. 3 root root 21 Jun 16 00:39 doc        
drwxr-xr-x. 3 root root 21 Jun 16 00:39 tcl8.6     

root@27fe373fffff:/outputs# ls -la $ANNOTSV/etc/AnnotSV/         
total 8                                                          
drwxr-xr-x. 2 root root   54 Jun 16 00:39 .                      
drwxr-xr-x. 3 root root   21 Jun 16 00:39 ..                     
-rw-r--r--. 1 root root 1468 Jun 16 00:39 application.properties 
-rwxr-xr-x. 1 root root 2280 Jun 11 22:10 configfile             

root@27fe373fffff:/outputs# ls -la /ref/AnnotSV/2.3.2/        
total 0                                                       
drwxr-xr-x. 5 root root  70 Jun 16 00:53 .                    
drwxr-xr-x. 3 root root  19 Jun 15 23:22 ..                   
drwxr-xr-x. 3 root root  18 Jun 15 23:23 Annotations_Exomiser 
drwxr-xr-x. 8 3054 3002 127 Dec 20 14:52 Annotations_Human    
drwxrwxr-x. 2 root root  50 Jun 11 22:10 jar                  

root@27fe373fffff:/outputs# ls -la /ref/AnnotSV/2.3.2/Annotations_Exomiser/1902/1902_*   
/ref/AnnotSV/2.3.2/Annotations_Exomiser/1902/1902_hg19:                                  
total 114128                                                                             
drwxr-xr-x. 2 3054 3002       109 Dec 13  2019 .                                         
drwxr-xr-x. 4 root root        45 Jun 16 00:56 ..                                        
-rw-r--r--. 1 3054 3002     65536 Dec 13  2019 1902_hg19_genome.h2.db                    
-rw-r--r--. 1 3054 3002 116785848 Dec 13  2019 1902_hg19_transcripts_ensembl.ser         
-rw-r--r--. 1 3054 3002     12288 Dec 13  2019 1902_hg19_variants.mv.db                  

/ref/AnnotSV/2.3.2/Annotations_Exomiser/1902/1902_phenotype:                             
total 8619648                                                                            
drwxr-xr-x. 3 root root         71 Jun 16 00:54 .                                        
drwxr-xr-x. 4 root root         45 Jun 16 00:56 ..                                       
-rw-r-----. 1 root root 8016973824 Mar  6  2019 1902_phenotype.h2.db                     
drwxr-xr-x. 2 root root          6 Mar  6  2019 phenix                                   
-rwxr-x---. 1 root root  809545728 Mar  6  2019 rw_string_10.mv                          

lgmgeo commented 4 years ago

Absolutely. It also requires an update of the "etc/AnnotSV/application.properties" file.

https://github.com/lgmgeo/AnnotSV/blob/master/etc/AnnotSV/application.properties#L29 - this line I assume?

Yes, and the following one: https://github.com/lgmgeo/AnnotSV/blob/master/etc/AnnotSV/application.properties#L30
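
As an illustration of editing those two lines, here is a small sketch that rewrites a single key=value entry and prints the result. The `set_property` helper is hypothetical, and the key names in the usage comment (`exomiser.hg19.data-version`, `exomiser.phenotype.data-version`) are assumed from Exomiser's usual configuration naming; they have not been checked against lines 29 and 30 of AnnotSV's application.properties.

```shell
# Sketch: print a properties file with one key set to a new value.
# Lines whose key does not match exactly are passed through unchanged.
set_property() {
    props_file=$1
    key=$2
    value=$3
    awk -F= -v key="$key" -v value="$value" '
        $1 == key { print key "=" value; next }
        { print }
    ' "$props_file"
}

# Usage (key name is an assumption, see above):
#   set_property application.properties exomiser.phenotype.data-version 2003 \
#       > application.properties.new
```

Writing to a new file instead of editing in place keeps the original available for comparison.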

Looks like something still not working:

You didn't give the -hpo argument in your command line to describe the phenotype of your patient. Try something like:

/soft/AnnotSV-2.3.2/bin/AnnotSV -annotationsDir /ref/AnnotSV/2.3.2 -genomeBuild GRCh37 -SVinputFile /inputs/som_candidateSV.vcf -outputFile /outputs/test_som.annotsv.tsv -hpo "HP:0001156,HP:0001363,HP:0011304,HP:0010055"

serge2016 commented 4 years ago

Yes, and the following one: https://github.com/lgmgeo/AnnotSV/blob/master/etc/AnnotSV/application.properties#L30

Of course!

You didn't give HPO argument in your command line to describe the phenotype of your patient. Try something like:

Is it possible not to specify this? In our case we usually do not know the phenotype of the patient, and we are looking for a way to annotate the structural variants in detail.

lgmgeo commented 4 years ago

If you don't specify HPO terms in the command line, AnnotSV will not use the Exomiser module. This module provides a phenotype-driven analysis. The given score and annotations are specific to a phenotype (to a patient).
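
For pipelines where the phenotype is only sometimes known, the behaviour described above can be mirrored in a wrapper that adds -hpo only when terms are available. A minimal sketch follows: the `annotsv_command` helper is hypothetical, the paths are the ones used in this thread, and only flags that already appear in the commands above are used.

```shell
# Sketch: build the AnnotSV command line, appending -hpo only when HPO terms
# are supplied. Without them AnnotSV skips the Exomiser module.
annotsv_command() {
    input_vcf=$1
    output_tsv=$2
    hpo_terms=${3:-}     # optional, e.g. "HP:0001156,HP:0001363"
    cmd="/soft/AnnotSV-2.3.2/bin/AnnotSV -annotationsDir /ref/AnnotSV/2.3.2"
    cmd="$cmd -genomeBuild GRCh37 -SVinputFile $input_vcf -outputFile $output_tsv"
    if [ -n "$hpo_terms" ]; then
        cmd="$cmd -hpo $hpo_terms"
    fi
    printf '%s\n' "$cmd"
}

# Usage:
#   annotsv_command /inputs/som_candidateSV.vcf /outputs/test_som.annotsv.tsv
#   annotsv_command /inputs/som_candidateSV.vcf /outputs/test_som.annotsv.tsv \
#       "HP:0001156,HP:0001363,HP:0011304,HP:0010055"
```

The function only prints the command, so it can be inspected or logged before being executed.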

lgmgeo commented 4 years ago

For a given phenotype, the HPO-based score corresponding to a damaging probability is provided for each gene overlapped with an SV so that:

serge2016 commented 4 years ago

Is it possible to provide all phenotypes at once?

Stikus commented 4 years ago

You didn't give HPO argument in your command line to describe the phenotype of your patient. Try something like:

/soft/AnnotSV-2.3.2/bin/AnnotSV -annotationsDir /ref/AnnotSV/2.3.2 -genomeBuild GRCh37 -SVinputFile /inputs/som_candidateSV.vcf -outputFile /outputs/test_som.annotsv.tsv -hpo "HP:0001156,HP:0001363,HP:0011304,HP:0010055"

After your addition Exomiser started working, but only for one of my test files:

  Command: '/soft/AnnotSV-2.3.2/bin/AnnotSV -annotationsDir /ref/AnnotSV/2.3.2 -genomeBuild GRCh37 -SVinputFile /inputs/germ_candidateSV.vcf -outputFile /outputs/test_germ.annotsv.tsv -hpo HP:0001156,HP:0001363,HP:0011304,HP:0010055'.
  PID=156 (last job)
AnnotSV 2.3.2

Copyright (C) 2017-2019 GEOFFROY Veronique

Please feel free to contact me for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr

Tcl/Tk version: 8.6

Application name used (defined with the "ANNOTSV" environment variable):
/soft/AnnotSV-2.3.2

...downloading the configuration data (June 16 2020 - 10:02)
    ...configuration data by default
    ...configuration data from /soft/AnnotSV-2.3.2/etc/AnnotSV/configfile
    ...configuration data given in arguments
    ...checking configuration data and files

WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)

    INFO: AnnotSV takes use of Exomiser (Smedley et al., 2015) for the phenotype-driven analysis.
    INFO: AnnotSV is using the Human Phenotype Ontology (version 1902). Find out more at http://www.human-phenotype-ontology.org

    ******************************************
    AnnotSV has been run with these arguments:
    ******************************************
    -SVinputFile /inputs/germ_candidateSV.vcf
    -SVinputInfo 1
    -SVminSize 50
    -annotationsDir /ref/AnnotSV/2.3.2
    -bedtools bedtools
    -candidateGenesFiltering no
    -genomeBuild GRCh37
    -hpo HP:0001156,HP:0001363,HP:0011304,HP:0010055
    -metrics us
    -minTotalNumber 500
    -organism Human
    -outputDir /outputs
    -outputFile test_germ.annotsv.tsv
    -overlap 70
    -overwrite yes
    -promoterSize 500
    -rankFiltering 1 2 3 4 5
    -rankOutput no
    -reciprocal no
    -snvIndelPASS 0
    -svtBEDcol -1
    ******************************************

    no intersection between SV and gene annotation
...listing of the annotations to realized (June 16 2020 - 10:02)
    ...refGene annotation
    (with /ref/AnnotSV/2.3.2/Annotations_Human/RefGene/GRCh37/refGene.sorted.bed)
    ...Genes-based annotations
        ...20181211_ACMG.tsv
        (59 gene identifiers and 1 annotations columns: ACMG)
        ...20191219_DDG2P.tsv.gz
        (1982 gene identifiers and 5 annotations columns: DDD_status, DDD_mode, DDD_consequence, DDD_disease, DDD_pmids)
        ...20191219_HI.tsv.gz
        (19124 gene identifiers and 1 annotations columns: HI_DDDpercent)
        ...20191219_GeneIntolerance.pLI-Zscore.annotations.tsv.gz
        (18241 gene identifiers and 3 annotations columns: synZ_ExAC, misZ_ExAC, pLI_ExAC)
        ...20191219_ExAC.CNV-Zscore.annotations.tsv.gz
        (15673 gene identifiers and 3 annotations columns: delZ_ExAC, dupZ_ExAC, cnvZ_ExAC)
        ...20191216_OMIM-1-annotations.tsv.gz
        (14411 gene identifiers and 1 annotations columns: Mim Number)
        ...20191216_morbidGenesCandidates.tsv.gz
        (3136 gene identifiers and 1 annotations columns: morbidGenesCandidates)
        ...20191216_OMIM-2-annotations.tsv.gz
        (14411 gene identifiers and 2 annotations columns: Phenotypes, Inheritance)
        ...20191216_morbidGenes.tsv.gz
        (11249 gene identifiers and 1 annotations columns: morbidGenes)
        ...20191219_ClinGenAnnotations.tsv.gz
        (1392 gene identifiers and 2 annotations columns: HI_CGscore, TriS_CGscore)
    ...Annotations with features overlapping the SV
        ...DGV Gold Standard frequency annotation
        ...gnomAD frequency annotation
        ...DDD frequency annotation
        ...1000g frequency annotation
        ...Ira M. Hall's lab frequency annotation
    ...Annotations with features overlapped with the SV
        ...Promoters annotation
        ...dbVar_pathogenic_NR_SV annotation
        ...TAD annotation
    ...Breakpoints annotations
        ...GC content annotation
        ...Repeat annotation

...annotation in progress (June 16 2020 - 10:02)

...Output columns annotation:
    AnnotSV ID; SV chrom; SV start; SV end; SV length; SV type; ID; REF; ALT; QUAL; FILTER; INFO; AnnotSV type; Gene name; NM; CDS length; tx length; location; location2; intersectStart; intersectEnd; DGV_GAIN_IDs; DGV_GAIN_n_samples_with_SV; DGV_GAIN_n_samples_tested; DGV_GAIN_Frequency; DGV_LOSS_IDs; DGV_LOSS_n_samples_with_SV; DGV_LOSS_n_samples_tested; DGV_LOSS_Frequency; GD_ID; GD_AN; GD_N_HET; GD_N_HOMALT; GD_AF; GD_POPMAX_AF; GD_ID_others; DDD_SV; DDD_DUP_n_samples_with_SV; DDD_DUP_Frequency; DDD_DEL_n_samples_with_SV; DDD_DEL_Frequency; 1000g_event; 1000g_AF; 1000g_max_AF; IMH_ID; IMH_AF; IMH_ID_others; promoters; dbVar_event; dbVar_variant; dbVar_status; TADcoordinates; ENCODEexperiments; GCcontent_left; GCcontent_right; Repeats_coord_left; Repeats_type_left; Repeats_coord_right; Repeats_type_right; ACMG; DDD_status; DDD_mode; DDD_consequence; DDD_disease; DDD_pmids; HI_DDDpercent; synZ_ExAC; misZ_ExAC; pLI_ExAC; delZ_ExAC; dupZ_ExAC; cnvZ_ExAC; Mim Number; morbidGenesCandidates; Phenotypes; Inheritance; morbidGenes; HI_CGscore; TriS_CGscore; AnnotSV ranking

...AnnotSV is done with the analysis (June 16 2020 - 10:02)
  Command: '/soft/AnnotSV-2.3.2/bin/AnnotSV -annotationsDir /ref/AnnotSV/2.3.2 -genomeBuild GRCh37 -SVinputFile /inputs/som_candidateSV.vcf -outputFile /outputs/test_som.annotsv.tsv -hpo HP:0001156,HP:0001363,HP:0011304,HP:0010055'.
  PID=156 (last job)
AnnotSV 2.3.2

Copyright (C) 2017-2019 GEOFFROY Veronique

Please feel free to contact me for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr

Tcl/Tk version: 8.6

Application name used (defined with the "ANNOTSV" environment variable):
/soft/AnnotSV-2.3.2

...downloading the configuration data (June 16 2020 - 10:02)
    ...configuration data by default
    ...configuration data from /soft/AnnotSV-2.3.2/etc/AnnotSV/configfile
    ...configuration data given in arguments
    ...checking configuration data and files

WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)

    INFO: AnnotSV takes use of Exomiser (Smedley et al., 2015) for the phenotype-driven analysis.
    INFO: AnnotSV is using the Human Phenotype Ontology (version 1902). Find out more at http://www.human-phenotype-ontology.org

    ******************************************
    AnnotSV has been run with these arguments:
    ******************************************
    -SVinputFile /inputs/som_candidateSV.vcf
    -SVinputInfo 1
    -SVminSize 50
    -annotationsDir /ref/AnnotSV/2.3.2
    -bedtools bedtools
    -candidateGenesFiltering no
    -genomeBuild GRCh37
    -hpo HP:0001156,HP:0001363,HP:0011304,HP:0010055
    -metrics us
    -minTotalNumber 500
    -organism Human
    -outputDir /outputs
    -outputFile test_som.annotsv.tsv
    -overlap 70
    -overwrite yes
    -promoterSize 500
    -rankFiltering 1 2 3 4 5
    -rankOutput no
    -reciprocal no
    -snvIndelPASS 0
    -svtBEDcol -1
    ******************************************

...running Exomiser
    ...on port 50000
    ...starting the REST service
    ...idService = 177

...listing of the annotations to realized (June 16 2020 - 10:03)
    ...refGene annotation
    (with /ref/AnnotSV/2.3.2/Annotations_Human/RefGene/GRCh37/refGene.sorted.bed)
    ...Genes-based annotations
        ...20181211_ACMG.tsv
        (59 gene identifiers and 1 annotations columns: ACMG)
        ...20191219_DDG2P.tsv.gz
        (1982 gene identifiers and 5 annotations columns: DDD_status, DDD_mode, DDD_consequence, DDD_disease, DDD_pmids)
        ...20191219_HI.tsv.gz
        (19124 gene identifiers and 1 annotations columns: HI_DDDpercent)
        ...20191219_GeneIntolerance.pLI-Zscore.annotations.tsv.gz
        (18241 gene identifiers and 3 annotations columns: synZ_ExAC, misZ_ExAC, pLI_ExAC)
        ...20191219_ExAC.CNV-Zscore.annotations.tsv.gz
        (15673 gene identifiers and 3 annotations columns: delZ_ExAC, dupZ_ExAC, cnvZ_ExAC)
        ...20191216_OMIM-1-annotations.tsv.gz
        (14411 gene identifiers and 1 annotations columns: Mim Number)
        ...20191216_morbidGenesCandidates.tsv.gz
        (3136 gene identifiers and 1 annotations columns: morbidGenesCandidates)
        ...20191216_OMIM-2-annotations.tsv.gz
        (14411 gene identifiers and 2 annotations columns: Phenotypes, Inheritance)
        ...20191216_morbidGenes.tsv.gz
        (11249 gene identifiers and 1 annotations columns: morbidGenes)
        ...20191219_ClinGenAnnotations.tsv.gz
        (1392 gene identifiers and 2 annotations columns: HI_CGscore, TriS_CGscore)
        ...20200616-100245_exomiser_gene_pheno.tmp.tsv
        (1 gene identifiers and 4 annotations columns: EXOMISER_GENE_PHENO_SCORE, HUMAN_PHENO_EVIDENCE, MOUSE_PHENO_EVIDENCE, FISH_PHENO_EVIDENCE)
    ...Annotations with features overlapping the SV
        ...DGV Gold Standard frequency annotation
        ...gnomAD frequency annotation
        ...DDD frequency annotation
        ...1000g frequency annotation
        ...Ira M. Hall's lab frequency annotation
    ...Annotations with features overlapped with the SV
        ...Promoters annotation
        ...dbVar_pathogenic_NR_SV annotation
        ...TAD annotation
    ...Breakpoints annotations
        ...GC content annotation
        ...Repeat annotation

...annotation in progress (June 16 2020 - 10:03)

...Output columns annotation:
    AnnotSV ID; SV chrom; SV start; SV end; SV length; SV type; ID; REF; ALT; QUAL; FILTER; INFO; AnnotSV type; Gene name; NM; CDS length; tx length; location; location2; intersectStart; intersectEnd; DGV_GAIN_IDs; DGV_GAIN_n_samples_with_SV; DGV_GAIN_n_samples_tested; DGV_GAIN_Frequency; DGV_LOSS_IDs; DGV_LOSS_n_samples_with_SV; DGV_LOSS_n_samples_tested; DGV_LOSS_Frequency; GD_ID; GD_AN; GD_N_HET; GD_N_HOMALT; GD_AF; GD_POPMAX_AF; GD_ID_others; DDD_SV; DDD_DUP_n_samples_with_SV; DDD_DUP_Frequency; DDD_DEL_n_samples_with_SV; DDD_DEL_Frequency; 1000g_event; 1000g_AF; 1000g_max_AF; IMH_ID; IMH_AF; IMH_ID_others; promoters; dbVar_event; dbVar_variant; dbVar_status; TADcoordinates; ENCODEexperiments; GCcontent_left; GCcontent_right; Repeats_coord_left; Repeats_type_left; Repeats_coord_right; Repeats_type_right; ACMG; DDD_status; DDD_mode; DDD_consequence; DDD_disease; DDD_pmids; HI_DDDpercent; synZ_ExAC; misZ_ExAC; pLI_ExAC; delZ_ExAC; dupZ_ExAC; cnvZ_ExAC; Mim Number; morbidGenesCandidates; Phenotypes; Inheritance; morbidGenes; HI_CGscore; TriS_CGscore; EXOMISER_GENE_PHENO_SCORE; HUMAN_PHENO_EVIDENCE; MOUSE_PHENO_EVIDENCE; FISH_PHENO_EVIDENCE; AnnotSV ranking

...AnnotSV is done with the analysis (June 16 2020 - 10:03)

Is the "no intersection between SV and gene annotation" line a remnant of the Exomiser step?

lgmgeo commented 4 years ago

Is it possible to provide all phenotypes at once?

Sorry, totally impossible :o) And it would make no sense to me: there are too many possible phenotypes (or combinations of different phenotypes). I can't even imagine the number (hundreds of thousands?). Moreover, the output would be unreadable...

lgmgeo commented 4 years ago

Is no intersection between SV and gene annotation line remnant of Exomiser step?

Absolutely. If there is no overlapped gene, you cannot have an Exomiser score linked to a gene.
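
The three outcomes seen in the logs of this thread can be told apart by searching a saved log for the messages quoted above. A rough sketch, assuming the AnnotSV output was redirected to a file; the `exomiser_status` helper and its labels are hypothetical, and the message strings are copied from the 2.3.2 logs shown here, so other versions may word them differently.

```shell
# Sketch: classify a saved AnnotSV log by the Exomiser-related messages it contains.
exomiser_status() {
    log_file=$1
    if grep -qF -- '...running Exomiser' "$log_file"; then
        echo "ran"
    elif grep -qF 'no intersection between SV and gene annotation' "$log_file"; then
        echo "skipped: no gene overlap"
    elif grep -qF 'INFO: AnnotSV takes use of Exomiser' "$log_file"; then
        echo "enabled, outcome unclear"
    else
        # No Exomiser message at all, as in the run without -hpo.
        echo "not requested"
    fi
}

# Usage (file name is an example only):
#   AnnotSV ... > annotsv.log 2>&1
#   exomiser_status annotsv.log
```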

Stikus commented 4 years ago

Thank you for your answers, Exomiser is finally working.

The manual update will wait until you update your part of the data.

Closing.