lgmgeo / AnnotSV

Annotation and Ranking of Structural Variation
GNU General Public License v3.0

AnnotSV Singularity on an HPC Error `command not found` #184

Open yr542 opened 1 year ago

yr542 commented 1 year ago

I am trying to use AnnotSV on my Illumina Manta VCF outputs, specifically the diploid VCFs. However, I cannot get it to work with an HPO term in Singularity.

There are 2 input variables:

The code I used:

# Loop through the files in the input directory
for file in "$input_dir"/*.vcf; do
    # Get the base name of the file without extension
    filename=$(basename "$file" .vcf)

    # Create the output subdirectory within the input directory
    output_dir_parent="$(dirname "$input_dir")"
    decompressed_dir="$output_dir_parent/Step_3___AnnotSV_Main"
    mkdir -p "$decompressed_dir"
    echo "Output directory: $decompressed_dir"
    output_dir="$decompressed_dir/AnnotSV_Batch_Copied"
    mkdir -p "$output_dir"

    # Set the output file path with "_AnnotSV" appended
    output_file="$output_dir/${filename}_AnnotSV_"

    # Run AnnotSV via Singularity with the base name as output prefix
    singularity exec "$singularity_container" /usr/local/etc/AnnotSV/configfile -SVinputFile "$file" -genomeBuild GRCh38 -hpo "HP:0000009" -outputFile "$output_file" 
done

Yet I get the error:

/usr/local/etc/AnnotSV/configfile: line 108: P_loss_phen: command not found
/usr/local/etc/AnnotSV/configfile: line 109: P_loss_hpo: command not found
/usr/local/etc/AnnotSV/configfile: line 110: P_loss_source: command not found
/usr/local/etc/AnnotSV/configfile: line 111: P_loss_coord: command not found
/usr/local/etc/AnnotSV/configfile: line 112: P_ins_phen: command not found
/usr/local/etc/AnnotSV/configfile: line 113: P_ins_hpo: command not found
/usr/local/etc/AnnotSV/configfile: line 114: P_ins_source: command not found
/usr/local/etc/AnnotSV/configfile: line 115: P_ins_coord: command not found
/usr/local/etc/AnnotSV/configfile: line 116: P_inv_phen: command not found
/usr/local/etc/AnnotSV/configfile: line 117: P_inv_hpo: command not found
/usr/local/etc/AnnotSV/configfile: line 118: P_inv_source: command not found
/usr/local/etc/AnnotSV/configfile: line 119: P_inv_coord: command not found
/usr/local/etc/AnnotSV/configfile: line 120: po_P_gain_phen: command not found

But when I do not define a path in the Singularity command, it tells me that it cannot find refGene.txt. What do I need to do to get it to work with my HPO term? The HPO term will change depending on what is required.

tsnowlan commented 1 year ago

Singularity is affected by your local environment, so it can be a bit finicky. Try using singularity exec --cleanenv ..., it can solve a lot of mysterious / hard to troubleshoot issues.

Beyond that, it seems like it might be trying to run the tcl file with bash. Can you post the Singularity definition file? It's usually at /.singularity.d/Singularity inside the image.
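
A minimal sketch of how one might look for the definition file at the path mentioned above. The helper name `show_definition` and the filename `annotsv.sif` are assumptions for illustration, and an image converted from a Docker registry may not contain this file at all.

```shell
#!/bin/sh
# Print the definition file stored inside a Singularity image, if there is one.
# show_definition is a hypothetical helper name, not part of Singularity.
show_definition() {
    image=$1
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity is not on PATH" >&2
        return 2
    fi
    # Path taken from the comment above; it may be absent in converted Docker images.
    singularity exec "$image" cat /.singularity.d/Singularity
}

# Usage (annotsv.sif is an assumed filename):
#   show_definition annotsv.sif
```

The two early checks only separate "wrong image path" from "Singularity not loaded on this node", which are easy to confuse on an HPC system.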

lgmgeo commented 1 year ago

Thanks Tor for helping @yr542

yr542 commented 1 year ago

How would I do that? I can only tell you that I downloaded it.

yr542 commented 1 year ago

I ran the following, but it did not change the error:

singularity exec --cleanenv "$singularity_container" /usr/local/etc/AnnotSV/configfile -SVinputFile "$file" -genomeBuild GRCh38 -hpo "HP:0001249" -outputFile "$output_file" -annotationsDir "$phenotype_to_genes"

/usr/local/etc/AnnotSV/configfile: line 11: -benignAF:: command not found
/usr/local/etc/AnnotSV/configfile: line 13: -candidateGenesFiltering:: command not found
/usr/local/etc/AnnotSV/configfile: line 15: -genomeBuild:: command not found
/usr/local/etc/AnnotSV/configfile: line 17: -hpo:: command not found
/usr/local/etc/AnnotSV/configfile: line 19: -includeCI:: command not found

and the error continues.

tsnowlan commented 1 year ago

Ah, I see now. You're calling the configfile, which the image tries to interpret using /bin/sh, rather than AnnotSV. Try with:

singularity exec --cleanenv "$singularity_container" AnnotSV \
    -SVinputFile "$file" \
    -genomeBuild GRCh38 \
    -hpo "HP:0001249"
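
The "command not found" messages can be reproduced without Singularity or AnnotSV. The sketch below assumes a config file made of `-option: value` lines (the contents are made up for illustration, not copied from the real configfile) and runs it through `sh`, which treats each line as a command to execute.

```shell
#!/bin/sh
# Stand-in for a config file made of "-option: value" lines (contents are made up).
demo_cfg=$(mktemp)
demo_err=$(mktemp)
printf '%s\n' '-genomeBuild: GRCh38' '-hpo: HP:0001249' > "$demo_cfg"

# Running the file through sh treats the first word of each line as a command name.
demo_status=0
LC_ALL=C sh "$demo_cfg" 2> "$demo_err" || demo_status=$?

cat "$demo_err"                    # one "not found" message per option line
echo "exit status: $demo_status"   # 127 is the shell's "command not found" status
```

This is why the fix is to name the `AnnotSV` executable after the image, rather than the path to its configuration file.
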

yr542 commented 1 year ago

Attempt 1: Running the command you gave above

When I did that I got this error:

...checking the annotation data sources (June 14 2023 - 09:54)
############################################################################
"/usr/local/share/AnnotSV/Annotations_Human/Genes/GRCh38/refGene.txt.gz" file doesn't exist
Please check your install - Exit with error.
############################################################################

Attempt 2: Adding the annotationsDir + path

I do think it needs an HPO reference file such as phenotype_to_genes.txt, but it does not seem to accept one when I use your command above plus annotationsDir, because it says the refGene.txt.gz file doesn't exist.

Attempt 3: Adding the annotationsDir + path to a zipped phenotype_to_genes.txt (using bgzip) renamed to refGene.txt

But I get the error:

refGene.txt.gz" file doesn't exist
Please check your install - Exit with error.

when the path definitely exists.

lgmgeo commented 1 year ago

Can you try to bind your annotationsDir in Singularity?

singularity exec --bind /usr/local/share/AnnotSV/Annotations_Human \
    --cleanenv "$singularity_container" AnnotSV \
    -SVinputFile "$file" \
    -genomeBuild GRCh38 \
    -annotationsDir /usr/local/share/AnnotSV/Annotations_Human \
    -hpo "HP:0001249"

yr542 commented 1 year ago

When I do this I get the following error repeatedly

FATAL:   container creation failed: mount /usr/local/share/AnnotSV/Annotations_Human->/usr/local/share/AnnotSV/Annotations_Human error: while mounting /usr/local/share/AnnotSV/Annotations_Human: mount source /usr/local/share/AnnotSV/Annotations_Human doesn't exist
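
The message says the bind source is missing, and the source side of `--bind` is a path on the host, not inside the image. A minimal sketch of checking the host directory first and mapping it to the container path with the `src:dest` form of `--bind`. The helper name `bind_spec` and the host path in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Build a --bind argument only if the host-side directory really exists.
# bind_spec is a hypothetical helper; both paths passed to it are assumptions.
bind_spec() {
    host_dir=$1        # directory on the cluster filesystem
    container_dir=$2   # where it should appear inside the container
    if [ ! -d "$host_dir" ]; then
        echo "bind source does not exist on the host: $host_dir" >&2
        return 1
    fi
    printf '%s:%s\n' "$host_dir" "$container_dir"
}

# Usage (the host path is a placeholder for wherever the annotations were downloaded):
#   spec=$(bind_spec "$HOME/AnnotSV_annotations/Annotations_Human" \
#                    /usr/local/share/AnnotSV/Annotations_Human) &&
#   singularity exec --cleanenv --bind "$spec" "$singularity_container" AnnotSV ...
```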

For the script there are 2 variables that have been defined:

The script was modified according to your specifications; the modified version is below:

Attempt 1: Following your suggestion

for file in "$input_dir"/*.vcf; do
    # Get the base name of the file without extension
    filename=$(basename "$file" .vcf)

    # Create the output subdirectory within the input directory
    output_dir_parent="$(dirname "$input_dir")"
    decompressed_dir="$output_dir_parent/Step_3___AnnotSV_Main"
    mkdir -p "$decompressed_dir"
    echo "Output directory: $decompressed_dir"
    output_dir="$decompressed_dir/AnnotSV_Batch_Copied"
    mkdir -p "$output_dir"

    # Set the output file path with "_AnnotSV" appended
    output_file="$output_dir/${filename}_AnnotSV_"

    # Run AnnotSV via Singularity with the base name as output prefix
    singularity exec --bind /usr/local/share/AnnotSV/Annotations_Human \
    --cleanenv "$singularity_container" AnnotSV \
    -SVinputFile "$file" \
    -genomeBuild GRCh38 \
    -annotationsDir /usr/local/share/AnnotSV/Annotations_Human \
    -hpo "HP:0001249" \
    -outputFile "$output_file"
done

Attempt 2: Modifying the path fed to annotationsDir

Below is only the singularity command, modified within the same for loop:

# Run AnnotSV via Singularity with the base name as output prefix
    singularity exec --bind /usr/local/share/AnnotSV/Annotations_Human \
    --cleanenv "$singularity_container" AnnotSV -SVinputFile "$file" \
    -genomeBuild GRCh38 \
    -annotationsDir "$singularity_container" /usr/local/share/AnnotSV/Annotations_Human \
    -hpo "HP:0001249" \
    -outputFile "$output_file"

However, it results in the same errors in the .out file.

Attempt 3: Adding back the phenotype_to_genes path, where phenotype_to_genes.txt has been zipped and renamed refGene.txt, with the --bind

The for loop stayed the same except for the section shown below:

    # Run AnnotSV via Singularity with the base name as output prefix
    singularity exec --bind /usr/local/share/AnnotSV/Annotations_Human \
    --cleanenv "$singularity_container" AnnotSV -SVinputFile "$file" \
    -genomeBuild GRCh38 \
    -annotationsDir "$phenotype_to_genes" \
    -hpo "HP:0001249" \
    -outputFile "$output_file"

However, it yet again resulted in the same error.

Attempt 4: Trying --cleanenv alone, again with phenotype_to_genes.txt zipped and renamed refGene.txt

I only modified the following inside the for loop:

    # Run AnnotSV via Singularity with the base name as output prefix
    singularity exec --cleanenv "$singularity_container" AnnotSV -SVinputFile "$file" \
    -genomeBuild GRCh38 \
    -annotationsDir "$phenotype_to_genes" \
    -hpo "HP:0001249" \
    -outputFile "$output_file"

And I got the following error when the file definitely exists:

############################################################################
"path/to/refGene.txt.gz" file doesn't exist
Please check your install - Exit with error.
############################################################################

It took the directory path I gave it and then tried to go to Annotations_Human/Genes/GRCh38/refGene.txt.gz beneath it, including an Annotations_Human subdirectory that I did not specify.
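
A minimal sketch of checking a candidate directory against the layout implied by the error message and the description above, i.e. `<dir>/Annotations_Human/Genes/<build>/refGene.txt.gz`. The helper name `has_refgene` is hypothetical, and the layout is inferred from this thread rather than from the AnnotSV documentation.

```shell
#!/bin/sh
# Check whether a directory has the layout implied by the error message:
#   <dir>/Annotations_Human/Genes/<build>/refGene.txt.gz
# has_refgene is a hypothetical helper name.
has_refgene() {
    annotations_dir=$1
    build=${2:-GRCh38}
    expected="$annotations_dir/Annotations_Human/Genes/$build/refGene.txt.gz"
    if [ -f "$expected" ]; then
        echo "found: $expected"
    else
        echo "missing: $expected" >&2
        return 1
    fi
}

# Usage ($phenotype_to_genes is the variable from the loop above):
#   has_refgene "$phenotype_to_genes" GRCh38
```

Run on the host, this checks only the host filesystem. The same path also has to be visible inside the container for AnnotSV to find it.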

I do not believe I can use the singularity container path in the annotationsDir.

kiranpatil222 commented 1 year ago

If your SIF file works fine, please let me know the working command, and please provide the Singularity DEF file for it.

yr542 commented 1 year ago

I got the Singularity file for AnnotSV by running singularity pull docker://quay.io/biocontainers/annotsv. I do not believe I can attach my Singularity file here.

kiranpatil222 commented 1 year ago

Oh, I see. The biggest issue with Docker / Singularity images when we pull them is that there aren't any instructions on how to use their executables. I have had similar issues when I pulled a Docker image through Singularity and then had no idea how to use it.

yr542 commented 1 year ago

Usually it works with singularity exec, but something doesn't work well with the refGene file, which is why I opened this post.

nvnieuwk commented 1 year ago

Hi, can you try binding the directory containing your input files and your annotations directory to the container? This can be done with --bind (see here for more information about this).
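
A minimal sketch of binding several directories at once, assuming `--bind` accepts a comma-separated list and mounts each directory at the same path inside the container when no destination is given. The helper name `bind_list` and the variable `$annotations_dir` are assumptions for illustration; `$input_dir`, `$output_dir`, `$file` and `$output_file` are the variables from the loop earlier in the thread.

```shell
#!/bin/sh
# Join several existing host directories into one comma-separated --bind value.
# bind_list is a hypothetical helper; directory names containing commas are not handled.
bind_list() {
    list=
    for dir in "$@"; do
        [ -d "$dir" ] || { echo "skipping missing directory: $dir" >&2; continue; }
        list=${list:+$list,}$dir
    done
    printf '%s\n' "$list"
}

# Usage ($annotations_dir is an assumed variable pointing at the host annotations):
#   binds=$(bind_list "$input_dir" "$output_dir" "$annotations_dir")
#   singularity exec --cleanenv --bind "$binds" "$singularity_container" AnnotSV \
#       -SVinputFile "$file" -genomeBuild GRCh38 -hpo "HP:0001249" \
#       -annotationsDir "$annotations_dir" -outputFile "$output_file"
```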

yr542 commented 1 year ago

I do not believe this AnnotSV Singularity image works for HPO. Does anyone have a working example using this Singularity image?

nvnieuwk commented 1 year ago

I'm using it in a nextflow pipeline on an HPC cluster and it works perfectly for me...

yr542 commented 1 year ago

Could I have the code you used? It does not seem to work for me.

nvnieuwk commented 1 year ago

Nextflow arranges the execution of singularity containers internally, so I'm not sure it'll help you, but here is the code that does the execution of AnnotSV: https://github.com/CenterForMedicalGeneticsGhent/nf-cmgg-structural/blob/add-expansionhunter/modules/nf-core/annotsv/annotsv/main.nf

kiranpatil222 commented 1 year ago

Was anyone able to run AnnotSV with Singularity as described above?

yr542 commented 1 year ago

I couldn't get it to work. If anyone has a specific example of how AnnotSV works could they please post it? The Nextflow link to code is a bit complicated and I was just looking for code specific to AnnotSV.

lgmgeo commented 1 year ago

Sorry, I never evaluated AnnotSV via Singularity ;o(

nvnieuwk commented 1 year ago

I'm not sure either why it's not working for everyone else. I didn't create the container myself (it's automatically generated via bioconda). Maybe you could ask more info on the bioconda gitter?

yr542 commented 1 year ago

I think this means that AnnotSV does not work with Singularity?

nvnieuwk commented 1 year ago

Have you tried converting the Docker container to a Singularity image? Maybe this could work.

yr542 commented 1 year ago

I can only put a Singularity image on the cluster. Is there a way to use the GitHub repository and a phenotype-to-genes file to get HPO terms working?