lgmgeo / AnnotSV

Annotation and Ranking of Structural Variation
GNU General Public License v3.0

Conda (and container) version of AnnotSV runs into an error when run with the -hpo arg #195

Open ManavalanG opened 1 year ago

ManavalanG commented 1 year ago

When I ran AnnotSV with -hpo using Singularity, it ran into the error couldn't open "/usr/local/etc/AnnotSV/application.properties": no such file or directory. I manually checked the container, and the file application.properties is indeed missing. I suspect this affects the docker container as well, since the file is also missing from the docker image.

Here is what I have debugged so far:

Singularity

singularity exec --bind $ANNOTATIONS_DIR:/annotations/ $CONTAINER AnnotSV -SVinputFile data/test.bed \
    -outputFile ./data/test.annotated.tsv \
    -svtBEDcol 4 \
    -annotationsDir /annotations/
AnnotSV 3.3.4

Copyright (C) 2017-2023 GEOFFROY Veronique

Please feel free to contact me for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr

Tcl/Tk version: 8.6

Application name used:
/usr/local

...downloading the configuration data (September 06 2023 - 14:08)
    ...configuration data by default
    ...configuration data from /usr/local/etc/AnnotSV/configfile
    ...configuration data given in arguments
    ...checking all these configuration data

...checking the annotation data sources (September 06 2023 - 14:08)

WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)

...listing arguments
    ******************************************
    AnnotSV has been run with these arguments:
    ******************************************
    -REreport 0
    -REselect1 1
    -REselect2 1
    -SVinputFile data/test.bed
    -SVinputInfo 1
    -SVminSize 50
    -annotationMode both
    -annotationsDir /annotations
    -bcftools bcftools
    -bedtools bedtools
    -benignAF 0.01
    -candidateGenesFiltering 0
    -cytoband 1
    -genomeBuild GRCh38
    -includeCI 1
    -metrics us
    -miRNAann 1
    -minTotalNumber 500
    -organism Human
    -outputDir ./data
    -outputFile test.annotated.tsv
    -overlap 100
    -overwrite 1
    -promoterSize 500
    -rankFiltering 1 2 3 4 5 NA
    -reciprocal 0
    -samplesidBEDcol 7
    -snvIndelPASS 0
    -svtBEDcol 4
    -tx RefSeq
    -variantconvertDir /usr/local/share/python3/variantconvert/
    -vcf 0
    ******************************************

...searching for SV overlaps with a gene or a regulatory elements
    ...461 genes overlapped with an SV
    ...3773 genes regulated by a regulatory element which is overlapped with an SV

...listing of the annotations to be realized (September 06 2023 - 14:08)
    ...CytoBand annotation
    ...Genes annotation
    ...RefSeq annotation
    ...Regulatory elements annotations
    ...Promoter annotations
    ...EnhancerAtlas annotations
    ...Annotations with pathogenic genes or genomic regions
    ...dbVar annotation
    ...ClinVar annotation
    ...ClinGen annotation
    ...Annotations with pathogenic snv/indel
    ...Annotations with benign genes or genomic regions
    ...gnomAD annotation
    ...ClinVar annotation
    ...ClinGen annotation
    ...DGV annotation
    ...DDD annotation
    ...1000g annotation
    ...Ira M. Hall's lab annotation
    ...Children’s Mercy Research Institute
    ...Annotations with features overlapped with the SV (100 %)
    ...TAD annotation
    ...Annotations with features sharing any overlap with the SV
    ...Breakpoints annotations
    ...GC content annotation
    ...Repeat annotation
    ...Gap annotation
    ...Segmental duplication annotation
    ...ENCODE blacklist annotation
    ...Gene-based annotations
    ...20220617_ACMG.tsv (78 gene identifiers and 1 annotations columns: ACMG)
    ...20220906_ClinGenAnnotations.tsv (1480 gene identifiers and 2 annotations columns: HI, TS)
    ...20200713_HI.tsv.gz (19124 gene identifiers and 1 annotations columns: DDD_HI_percent)
    ...20191219_ExAC.CNV-Zscore.annotations.tsv.gz (15673 gene identifiers and 3 annotations columns: ExAC_delZ, ExAC_dupZ, ExAC_cnvZ)
    ...20201023_GeneIntolerance-Zscore.annotations.tsv.gz (18241 gene identifiers and 2 annotations columns: ExAC_synZ, ExAC_misZ)
    ...20220902_GenCC.tsv (4615 gene identifiers and 4 annotations columns: GenCC_disease, GenCC_moi, GenCC_classification, GenCC_pmid)
    ...20220905_OMIM-1-annotations.tsv.gz (16250 gene identifiers and 1 annotations columns: OMIM_ID)
    ...20220905_OMIM-2-annotations.tsv.gz (16250 gene identifiers and 2 annotations columns: OMIM_phenotype, OMIM_inheritance)
    ...20220905_morbid.tsv.gz (12998 gene identifiers and 1 annotations columns: OMIM_morbid)
    ...20220905_morbidCandidate.tsv.gz (3467 gene identifiers and 1 annotations columns: OMIM_morbid_candidate)
    ...20201106_gnomAD.LOEUF.pLI.annotations.tsv.gz (19451 gene identifiers and 3 annotations columns: LOEUF_bin, GnomAD_pLI, ExAC_pLI)

...annotation in progress (September 06 2023 - 14:08)
-- GCcontentAnnotation, nuc --
bedtools nuc -fi /annotations/Annotations_Human/BreakpointsAnnotations/GCcontent/GRCh38/GRCh38_chromFa.fasta -bed ./data/test.NA.formatted.sorted.breakpoints.bed > ./data/test.NA.formatted.sorted.GCcontent.txt
Feature (14:107151992-107152192) beyond the length of 14 size (107043718 bp). Skipping.
Feature (14:107179995-107180195) beyond the length of 14 size (107043718 bp). Skipping.
Feature (2:242865820-242866020) beyond the length of 2 size (242193529 bp). Skipping.
Feature (2:243028352-243028552) beyond the length of 2 size (242193529 bp). Skipping.

...writing of ./data/test.annotated.tsv (September 06 2023 - 14:09)

...output columns annotation (September 06 2023 - 14:09):
AnnotSV_ID;SV_chrom;SV_start;SV_end;SV_length;SV_type;Biologist_annotation;Biologist_ranking;Samples_ID;Annotation_mode;CytoBand;Gene_name;Gene_count;Tx;Tx_start;Tx_end;Overlapped_tx_length;Overlapped_CDS_length;Overlapped_CDS_percent;Frameshift;Exon_count;Location;Location2;Dist_nearest_SS;Nearest_SS_type;Intersect_start;Intersect_end;RE_gene;P_gain_phen;P_gain_hpo;P_gain_source;P_gain_coord;P_loss_phen;P_loss_hpo;P_loss_source;P_loss_coord;P_ins_phen;P_ins_hpo;P_ins_source;P_ins_coord;po_P_gain_phen;po_P_gain_hpo;po_P_gain_source;po_P_gain_coord;po_P_gain_percent;po_P_loss_phen;po_P_loss_hpo;po_P_loss_source;po_P_loss_coord;po_P_loss_percent;P_snvindel_nb;P_snvindel_phen;B_gain_source;B_gain_coord;B_gain_AFmax;B_loss_source;B_loss_coord;B_loss_AFmax;B_ins_source;B_ins_coord;B_ins_AFmax;B_inv_source;B_inv_coord;B_inv_AFmax;po_B_gain_allG_source;po_B_gain_allG_coord;po_B_gain_someG_source;po_B_gain_someG_coord;po_B_loss_allG_source;po_B_loss_allG_coord;po_B_loss_someG_source;po_B_loss_someG_coord;TAD_coordinate;ENCODE_experiment;GC_content_left;GC_content_right;Repeat_coord_left;Repeat_type_left;Repeat_coord_right;Repeat_type_right;Gap_left;Gap_right;SegDup_left;SegDup_right;ENCODE_blacklist_left;ENCODE_blacklist_characteristics_left;ENCODE_blacklist_right;ENCODE_blacklist_characteristics_right;ACMG;HI;TS;DDD_HI_percent;ExAC_delZ;ExAC_dupZ;ExAC_cnvZ;ExAC_synZ;ExAC_misZ;GenCC_disease;GenCC_moi;GenCC_classification;GenCC_pmid;OMIM_ID;OMIM_phenotype;OMIM_inheritance;OMIM_morbid;OMIM_morbid_candidate;LOEUF_bin;GnomAD_pLI;ExAC_pLI;AnnotSV_ranking_score;AnnotSV_ranking_criteria;ACMG_class

...AnnotSV is done with the analysis (September 06 2023 - 14:09)
singularity exec --bind $ANNOTATIONS_DIR:/annotations/ $CONTAINER AnnotSV -SVinputFile data/test.bed \
    -outputFile ./data/test.annotated.tsv \
    -svtBEDcol 4 \
    -annotationsDir /annotations/ \
    -hpo "HP:0001156,HP:0001363,HP:0011304"
AnnotSV 3.3.4

Copyright (C) 2017-2023 GEOFFROY Veronique

Please feel free to contact me for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr

Tcl/Tk version: 8.6

Application name used:
/usr/local

...downloading the configuration data (September 06 2023 - 14:09)
    ...configuration data by default
    ...configuration data from /usr/local/etc/AnnotSV/configfile
    ...configuration data given in arguments
    ...checking all these configuration data

...checking the annotation data sources (September 06 2023 - 14:09)
    INFO: AnnotSV takes use of Exomiser (Smedley et al., 2015) for the phenotype-driven analysis.
    INFO: AnnotSV is using the Human Phenotype Ontology (version 2202). Find out more at http://www.human-phenotype-ontology.org

WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)

...listing arguments
    ******************************************
    AnnotSV has been run with these arguments:
    ******************************************
    -REreport 0
    -REselect1 1
    -REselect2 1
    -SVinputFile data/test.bed
    -SVinputInfo 1
    -SVminSize 50
    -annotationMode both
    -annotationsDir /annotations
    -bcftools bcftools
    -bedtools bedtools
    -benignAF 0.01
    -candidateGenesFiltering 0
    -cytoband 1
    -genomeBuild GRCh38
    -hpo HP:0001156,HP:0001363,HP:0011304
    -includeCI 1
    -metrics us
    -miRNAann 1
    -minTotalNumber 500
    -organism Human
    -outputDir ./data
    -outputFile test.annotated.tsv
    -overlap 100
    -overwrite 1
    -promoterSize 500
    -rankFiltering 1 2 3 4 5 NA
    -reciprocal 0
    -samplesidBEDcol 7
    -snvIndelPASS 0
    -svtBEDcol 4
    -tx RefSeq
    -variantconvertDir /usr/local/share/python3/variantconvert/
    -vcf 0
    ******************************************

...searching for SV overlaps with a gene or a regulatory elements
    ...461 genes overlapped with an SV
    ...3773 genes regulated by a regulatory element which is overlapped with an SV

...running Exomiser on 3780 gene names (September 06 2023 - 14:09)
10000
/usr/local/share/bash/AnnotSV/searchForAFreePortNumber.bash: line 19: ss: command not found
WARNING: port is defined to 50000
    ...on port 50000
couldn't open "/usr/local/etc/AnnotSV/application.properties": no such file or directory
    while executing
"open $File r"
    (procedure "ContentFromFile" line 3)
    invoked from within
"ContentFromFile $g_AnnotSV(etcDir)/application.properties"
    (procedure "runExomiser" line 21)
    invoked from within
"runExomiser "$L_allGenes" "$g_AnnotSV(hpo)" "
    (procedure "regulatoryElementsAnnotation" line 90)
    invoked from within
"regulatoryElementsAnnotation $L_allGenesOverlapped"
    (procedure "genesAnnotation" line 394)
    invoked from within
"genesAnnotation"
    (file "/usr/local/bin/AnnotSV" line 274)
$ singularity shell --bind $ANNOTATIONS_DIR:/annotations/  $CONTAINER AnnotSV -SVinputFile data/test.bed
Singularity> ls /usr/local/etc/AnnotSV/
configfile
Singularity> exit
exit

Docker

I checked the docker container on a Mac to see if /usr/local/etc/AnnotSV/application.properties is present in the container. Both v3.3.4 and v3.3.6 were tested. I did not run AnnotSV, though, as I didn't have a chance to download the annotation files on this machine.

$ docker run -it quay.io/biocontainers/annotsv:3.3.4--py311hdfd78af_1 sh
sh-5.0# ls /usr/local/etc/AnnotSV/
configfile
sh-5.0# exit
exit

$ docker run -it quay.io/biocontainers/annotsv:3.3.6--py311hdfd78af_0
sh-5.0# ls /usr/local/etc/AnnotSV/
configfile
ManavalanG commented 1 year ago

This is likely related to #184

ManavalanG commented 1 year ago

It looks like it has to do with the conda version of AnnotSV and not singularity or docker.

name: annotsv_bioconda

channels:
- conda-forge
- bioconda

dependencies:
- annotsv
$ AnnotSV -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir $ANNOTATIONS_DIR
$ AnnotSV -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir $ANNOTATIONS_DIR -hpo "HP:0001156,HP:0001363,HP:0011304"

AnnotSV 3.3.6

Copyright (C) 2017-2023 GEOFFROY Veronique

Please feel free to contact me for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr

Tcl/Tk version: 8.6

Application name used:
/dirpath/.conda/envs/annotsv_bioconda

...downloading the configuration data (September 06 2023 - 14:43)
    ...configuration data by default
    ...configuration data from /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/configfile
    ...configuration data given in arguments
    ...checking all these configuration data

...checking the annotation data sources (September 06 2023 - 14:43)
    INFO: AnnotSV takes use of Exomiser (Smedley et al., 2015) for the phenotype-driven analysis.
    INFO: AnnotSV is using the Human Phenotype Ontology (version 2202). Find out more at http://www.human-phenotype-ontology.org

WARNING: No GeneHancer annotations available.
(Please, see in the README file how to add these annotations. Users need to contact the GeneCards team.)

...listing arguments
    ******************************************
    AnnotSV has been run with these arguments:
    ******************************************
    -REreport 0
    -REselect1 1
    -REselect2 1
    -SVinputFile test.bed
    -SVinputInfo 1
    -SVminSize 50
    -annotationMode both
    -annotationsDir /path/to/AnnotSV/v3.3.6/share/AnnotSV
    -bcftools bcftools
    -bedtools bedtools
    -benignAF 0.01
    -candidateGenesFiltering 0
    -cytoband 1
    -genomeBuild GRCh38
    -hpo HP:0001156,HP:0001363,HP:0011304
    -includeCI 1
    -metrics us
    -miRNAann 1
    -minTotalNumber 500
    -organism Human
    -outputDir .
    -outputFile test.annotated.tsv
    -overlap 100
    -overwrite 1
    -promoterSize 500
    -rankFiltering 1 2 3 4 5 NA
    -reciprocal 0
    -samplesidBEDcol 7
    -snvIndelPASS 0
    -svtBEDcol 4
    -tx RefSeq
    -variantconvertDir /dirpath/.conda/envs/annotsv_bioconda/share/python3/variantconvert/
    -vcf 0
    ******************************************

...searching for SV overlaps with a gene or a regulatory elements
    ...461 genes overlapped with an SV
    ...3773 genes regulated by a regulatory element which is overlapped with an SV

...running Exomiser on 3780 gene names (September 06 2023 - 14:43)
    ...on port 10000
couldn't open "/dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/application.properties": no such file or directory
    while executing
"open $File r"
    (procedure "ContentFromFile" line 3)
    invoked from within
"ContentFromFile $g_AnnotSV(etcDir)/application.properties"
    (procedure "runExomiser" line 21)
    invoked from within
"runExomiser "$L_allGenes" "$g_AnnotSV(hpo)" "
    (procedure "regulatoryElementsAnnotation" line 90)
    invoked from within
"regulatoryElementsAnnotation $L_allGenesOverlapped"
    (procedure "genesAnnotation" line 394)
    invoked from within
"genesAnnotation"
    (file "/dirpath/.conda/envs/annotsv_bioconda/bin/AnnotSV" line 274)
$ ls /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/
configfile
$ curl https://raw.githubusercontent.com/lgmgeo/AnnotSV/v3.3.6/etc/AnnotSV/application.properties > /dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/application.properties
$ AnnotSV -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir $ANNOTATIONS_DIR -hpo "HP:0001156,HP:0001363,HP:0011304"

AnnotSV completed successfully and created the output file test.annotated.tsv.
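The manual workaround above (downloading application.properties into the env's etc/AnnotSV directory) can be wrapped in a small helper so it is easy to repeat for other envs. This is only a sketch under assumptions: the function names are hypothetical, the v<version> tag pattern is inferred from the v3.3.6 URL used above, and the <prefix>/etc/AnnotSV layout is taken from the log output. With the env activated, conda sets `$CONDA_PREFIX` to that prefix.

```bash
#!/usr/bin/env bash
# Hypothetical helpers sketching the manual workaround; not part of AnnotSV.
# Assumption: release tags follow the v<version> pattern seen in the v3.3.6 URL.

# Print the raw GitHub URL of application.properties for a release version.
application_properties_url() {
    printf 'https://raw.githubusercontent.com/lgmgeo/AnnotSV/v%s/etc/AnnotSV/application.properties\n' "$1"
}

# Print the directory holding configfile/application.properties for an env prefix,
# assuming the <prefix>/etc/AnnotSV layout shown in the log (one trailing slash is stripped).
annotsv_etc_dir() {
    printf '%s/etc/AnnotSV\n' "${1%/}"
}

# Download application.properties into <prefix>/etc/AnnotSV unless it already exists.
install_application_properties() {
    local version="$1" prefix="$2" dest
    dest="$(annotsv_etc_dir "$prefix")/application.properties"
    if [ -f "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi
    # -f makes curl fail on HTTP errors (e.g. a tag that does not exist)
    curl -fsSL "$(application_properties_url "$version")" -o "$dest"
}
```

For example, `install_application_properties 3.3.6 "$CONDA_PREFIX"` roughly reproduces the curl command above for the activated annotsv_bioconda env; because it skips the download when the file is already there, it can be run before every analysis.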

Conclusion

The file application.properties is missing from the conda version of AnnotSV for some reason, and this is the cause of the error seen here when AnnotSV is run with -hpo. Adding this file will most likely fix this bug.

PS - Thanks for your work making AnnotSV available via bioconda (#166)!

ManavalanG commented 1 year ago

The Nextflow implementation of AnnotSV doesn't appear to use the -hpo arg, which explains why this error has not been seen there so far.

https://github.com/lgmgeo/AnnotSV/issues/184#issuecomment-1603817934

ManavalanG commented 1 year ago

I'm not really familiar with Makefiles, but to my naive eyes it appears that application.properties is only copied during the make install-human-annotation step, which is never run in the bioconda version.

lgmgeo commented 1 year ago

Hi @ManavalanG,

Thank you very much for all the time spent on debugging! I really appreciate your contribution. I will contact @nvnieuwk to see what the best way to fix this would be. I will get back to you asap.

Best, Véronique

nvnieuwk commented 1 year ago

Hi @ManavalanG and @lgmgeo, the reason make install-human-annotation couldn't be run in the recipe is that it would make the recipe very large (which is bad practice in bioconda). I would advise you to run make install-human-annotation once first and use the annotations created by that command as your annotations input. Otherwise, I'm not really sure what the problem could be.

lgmgeo commented 1 year ago

I think the problem comes from the $ANNOTSV/bin/INSTALL_annotations.sh file. It causes this installation error.

lgmgeo commented 1 year ago

@nvnieuwk

The best option could be to replace the annotation install commands (in $ANNOTSV/bin/INSTALL_annotations.sh) with something like this:

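# Work in the directory that should end up holding the annotations
cd /path/to/install/annotsv/annotations
# Fetch the AnnotSV sources and install the code + human annotations locally (PREFIX=.)
git clone https://github.com/lgmgeo/AnnotSV.git
cd AnnotSV
make PREFIX=. install
make PREFIX=. install-human-annotation
# Keep only the annotation directories, moved one level up
mv share/AnnotSV/Annotations_Exomiser ..
mv share/AnnotSV/Annotations_Human ..
# Remove the cloned repository (code, documentation) that is no longer needed
cd /path/to/install/annotsv/annotations
rm -r AnnotSV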

How do you feel about this?

nvnieuwk commented 1 year ago

Looks good to me! I'm all for it as long as it still creates a separate folder for the annotations.

lgmgeo commented 1 year ago

@ManavalanG, can you give us your thoughts on this?

Modification on the patch_AnnotSV branch: https://github.com/lgmgeo/AnnotSV/blob/patch_AnnotSV/bin/INSTALL_annotations.sh

ManavalanG commented 1 year ago

@lgmgeo Thanks for the quick response and working on this right away :)

I am not too familiar with AnnotSV source code and so please take my thoughts/observations with a huge grain of salt.

If you could explain the role of bin/INSTALL_annotations.sh during installation, I will attempt to provide proper feedback. Again, my apologies for any poor understanding!

lgmgeo commented 1 year ago

Today, I updated the INSTALL_annotations.sh file only on the patch_AnnotSV development branch (not the master operating branch). Here is the result: [image]

Actually, this file is never used by the AnnotSV code (i.e. in $ANNOTSV/share/tcl/AnnotSV/*). Its purpose is to help users create a dedicated directory containing only the AnnotSV annotations, without anything else (no code, no documentation...). This means that it is provided for documentation only, for bioconda/singularity/docker users.

ManavalanG commented 1 year ago

This means that it is provided for documentation only, for bioconda/singularity/docker users.

Oops, I totally missed that part of the documentation. My apologies! In my setup, I installed AnnotSV directly on the HPC, including the human annotations, and then passed those annotations to -annotationsDir. Let me take another look at your edits and get back to you :)

ManavalanG commented 1 year ago

I installed the annotations using the INSTALL_annotations.sh script (from the patch_AnnotSV branch) and then ran the conda-installed AnnotSV with -annotationsDir pointing to them. Unfortunately, it behaved the same way as I reported yesterday: it succeeded without -hpo but ran into the same error (couldn't open "/dirpath/.conda/envs/annotsv_bioconda/etc/AnnotSV/application.properties": no such file or directory) when run with -hpo.

lgmgeo commented 1 year ago

Ok, I see. I will add a patch asap

lgmgeo commented 11 months ago

1 - I just pushed a patch to fix this issue (only on the patch_AnnotSV development branch).

When using this patch, you also need to copy $ANNOTSV/etc/AnnotSV/application.properties to -annotationsDir/share/AnnotSV/Annotations_Exomiser/2202/application.properties (until I update and publish the AnnotSV annotations).
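The copy step in point 1 could be scripted roughly as below. This is a hypothetical helper, not part of AnnotSV: it takes the AnnotSV install root ($ANNOTSV) and the Exomiser 2202 annotation directory as explicit arguments, since exactly where that directory sits depends on the layout behind -annotationsDir. It refuses to continue if either side is missing and leaves an existing copy untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy AnnotSV's application.properties into the
# Exomiser 2202 annotation directory used by the patched code.
# Arguments: <AnnotSV install root> <Annotations_Exomiser/2202 directory>
copy_properties_to_exomiser_dir() {
    local annotsv_root="$1" exomiser_dir="$2"
    local src="$annotsv_root/etc/AnnotSV/application.properties"
    local dest="$exomiser_dir/application.properties"
    if [ ! -f "$src" ]; then
        echo "missing source file: $src" >&2
        return 1
    fi
    # The annotations must already be installed; do not create the directory blindly
    if [ ! -d "$exomiser_dir" ]; then
        echo "Exomiser annotations not found: $exomiser_dir" >&2
        return 1
    fi
    # Never overwrite a copy that is already in place
    if [ -f "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi
    cp "$src" "$dest"
}
```

A call would look like `copy_properties_to_exomiser_dir "$ANNOTSV" /path/to/Annotations_Exomiser/2202`, where the second path is a placeholder for your own annotation directory.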

2 - @nvnieuwk

The file application.properties is missing from the conda version of AnnotSV for some reason, and this is the cause of the error seen here when AnnotSV is run with -hpo. Adding this file will most likely fix this bug.

Would it be possible to add this file in the conda version of AnnotSV?

nvnieuwk commented 11 months ago

It's weird that this file isn't in the recipe because it uses an exact copy of the repository. I can have a look when you release the new version :)

ManavalanG commented 6 months ago

Hi! Just checking in to see if there are any updates or fixes. Thanks :)

lgmgeo commented 6 months ago

Bioconda, docker and singularity are distributed from @nvnieuwk (Thanks!).

I can have a look when you release the new version :)

The new version is coming very soon.

nvnieuwk commented 6 months ago

Hi, I'm actually not distributing the containers; they are part of the BioContainers community. I only maintain the bioconda recipe from which the container is built, so I don't have full control over the container.

lgmgeo commented 6 months ago

Thanks for the clarification.

ManavalanG commented 6 months ago

I plan to add a checkpoint that checks for the file application.properties and adds it if it is not present, before running AnnotSV (in either the conda or the singularity env). I will post here on how it goes :)
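One way such a pre-run checkpoint might look is sketched below. It is an illustration under assumptions, not an AnnotSV feature: run_annotsv_with_checkpoint and the ANNOTSV_BIN override are made-up names, the target directory is wherever the error message says AnnotSV looks for the file (etc/AnnotSV for the 3.3.x conda package), and the known-good source file is supplied by the caller. Inside a Singularity image the etc directory is normally read-only, so there the file would more likely have to be provided through a bind mount or a writable overlay than copied in; that case is not tested here.

```bash
#!/usr/bin/env bash
# Hypothetical checkpoint: ensure application.properties exists in the directory
# AnnotSV reads it from, copying it from a caller-supplied source if missing,
# then run AnnotSV with the remaining arguments.
# ANNOTSV_BIN is an assumed override hook (defaults to the AnnotSV on PATH).
run_annotsv_with_checkpoint() {
    local target_dir="$1" source_file="$2"
    shift 2
    local target="$target_dir/application.properties"
    if [ ! -f "$target" ]; then
        if [ ! -f "$source_file" ]; then
            echo "application.properties missing and no source at $source_file" >&2
            return 1
        fi
        cp "$source_file" "$target" || return 1
        echo "added $target" >&2
    fi
    # Pass all remaining arguments (e.g. -SVinputFile ... -hpo ...) through unchanged
    "${ANNOTSV_BIN:-AnnotSV}" "$@"
}
```

For the conda case reported above, a run could look like `run_annotsv_with_checkpoint "$CONDA_PREFIX/etc/AnnotSV" /path/to/application.properties -SVinputFile test.bed -outputFile ./test.annotated.tsv -svtBEDcol 4 -annotationsDir "$ANNOTATIONS_DIR" -hpo "HP:0001156,HP:0001363,HP:0011304"`, with the source path being a placeholder.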

lgmgeo commented 6 months ago

AnnotSV 3.4 is posted.

ManavalanG commented 6 months ago

@lgmgeo I was able to get v3.4 working in singularity. I copied application.properties (as shown in the doc) after installing the human annotation data, and this resolved the issue. Thanks for providing a fix and for your awesome support during debugging :)

PS - It would be great to see conda/singularity based installation/usage mentioned in the documentation.

lgmgeo commented 6 months ago

PS - It would be great to see conda/singularity based installation/usage mentioned in the documentation.

Currently, it is mentioned on the web site: https://lbgi.fr/AnnotSV/downloads

I will add it to the README later. Before that, I would like to be sure that the conda version of AnnotSV works for all users, but I can't find the time to test it. I rely on issues with the "Docker/Singularity/Bioconda" label (https://github.com/lgmgeo/AnnotSV/issues/184, https://github.com/lgmgeo/AnnotSV/issues/195).

lgmgeo commented 5 months ago

While waiting for the README to be updated, I have integrated the installation/usage documentation into the download page.

@nvnieuwk, can you check and tell me if everything looks OK?

nvnieuwk commented 5 months ago

Looks good!! Thank you!