kircherlab / CADD-SV

CADD-SV – a framework to score the effect of structural variants
https://cadd-sv.bihealth.org
MIT License
14 stars, 3 forks

Error in rule scoring #14

Closed: WZo0o closed this issue 1 year ago

WZo0o commented 1 year ago

Hi,

When I ran the workflow, I got the following error:

[Sun Jun 11 15:10:29 2023]
rule scoring:
    input: GRCh38.nr_deletions/matrix.bed, GRCh38.nr_deletions/matrix_100bpdown.bed, GRCh38.nr_deletions/matrix_100bpup.bed
    output: output/GRCh38.nr_deletions.score
    jobid: 2
    wildcards: set=GRCh38.nr_deletions

Submitted job 2 with external jobid '2915471.mu01'.
[Sun Jun 11 16:13:15 2023]
Error in rule scoring:
    jobid: 2
    output: output/GRCh38.nr_deletions.score
    conda-env: /data/wangzh/tools/CADD-SV/.snakemake/conda/9c6953c1
    shell:

        Rscript --vanilla scripts/scoring.R GRCh38.nr_deletions GRCh38.nr_deletions/matrix.bed GRCh38.nr_deletions/matrix_100bpdown.bed GRCh38.nr_deletions/matrix_100bpup.bed output/GRCh38.nr_deletions.score

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 2915471.mu01

Error executing rule scoring on cluster (jobid: 2, external: 2915471.mu01, jobscript: /data/wangzh/tools/CADD-SV/.snakemake/tmp.ffi1aqfz/snakejob.scoring.2.sh). For error details see the cluster log and the log files of the involved rule(s).

To investigate the error, I ran the shell command directly and got the following error:

Rscript --vanilla scripts/scoring.R GRCh38.nr_deletions GRCh38.nr_deletions/matrix.bed GRCh38.nr_deletions/matrix_100bpdown.bed GRCh38.nr_deletions/matrix_100bpup.bed output/GRCh38.nr_deletions.score
Error in Ops.data.frame(k[[2]], k[[5]]) :
  ‘+’ only defined for equally-sized data frames
Calls: caddsv -> Ops.data.frame
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

Do you have any suggestions for me?

makirc commented 1 year ago

Hi, my best guess is that there are some additional characters or some mis-formatting in your BED file. Would you be able to share it with me so that I can have a look? (martin.kircher at bih-charite.de). Best, Martin
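A quick local check for this kind of mis-formatting can be scripted before sharing a file. Below is a minimal sketch in Python, assuming a tab-separated BED file whose first four columns are chromosome, start, end and SV type (as in the records quoted elsewhere in this thread). `check_bed_lines` is a hypothetical helper, not part of CADD-SV, and it only catches generic BED problems, not every input CADD-SV might reject.

```python
def check_bed_lines(lines):
    """Return (line_number, message) tuples for suspicious BED records.

    Hypothetical helper (not part of CADD-SV); assumes tab-separated
    chrom, start, end, SV type columns.
    """
    problems = []
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        # Trailing spaces, tabs or Windows line endings ("\r").
        if line != line.rstrip():
            problems.append((n, "trailing whitespace or carriage return"))
        fields = line.rstrip().split("\t")
        if len(fields) < 4:
            problems.append((n, f"expected at least 4 tab-separated fields, got {len(fields)}"))
            continue
        start, end = fields[1], fields[2]
        # Coordinates must be plain non-negative integers.
        if not (start.isdecimal() and end.isdecimal()):
            problems.append((n, "start/end are not non-negative integers"))
            continue
        if int(start) >= int(end):
            problems.append((n, "start must be smaller than end"))
    return problems
```

For example, opening the BED file and passing the file handle to `check_bed_lines` returns the line numbers and reasons for any records that look malformed; an empty list means none of these generic checks fired.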

WZo0o commented 1 year ago

Hi, thanks for your help. I can share my BED file with you, and I am looking forward to your reply.

id_GRCh38.nr_deletions.zip

Best

makirc commented 1 year ago

Just a brief update. I could not see any problems with the file, so I am currently running it through CADD-SV. My current hypothesis is that a subjob was killed on your system due to resource limitations (i.e. running out of memory). I will keep you posted on whether I can reproduce the error.

WZo0o commented 1 year ago

Thanks for your help; I am looking forward to your reply.

makirc commented 1 year ago

Just a brief update. It is still running (without any errors so far). I only gave it a very limited number of CPUs, not realizing that you had millions of SVs in your file. I will keep you posted about the results. In general, I would recommend splitting such big files into a number of parallel runs: some annotations take considerably longer than others, which limits the per-annotation parallelization implemented in our workflow.
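Splitting a large BED file into chunks that can be submitted as separate runs takes only a few lines. Below is a minimal sketch in Python, assuming a plain BED file in which each non-comment line is one SV. `chunk_records`, `split_bed` and the `<name>_partNN.bed` output naming are hypothetical, so adapt the file names to however your CADD-SV setup expects input sets to be named.

```python
from pathlib import Path


def chunk_records(lines, n_chunks):
    """Split BED data lines into at most n_chunks lists of near-equal size.

    Hypothetical helper; blank lines and '#' comment lines are dropped.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    if not records:
        return []
    size = -(-len(records) // n_chunks)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def split_bed(path, n_chunks, out_dir):
    """Write each chunk to out_dir/<stem>_partNN.bed (hypothetical naming)."""
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with src.open() as fh:
        chunks = chunk_records(fh.readlines(), n_chunks)
    written = []
    for i, chunk in enumerate(chunks):
        target = out / f"{src.stem}_part{i:02d}.bed"
        target.write_text("".join(chunk))
        written.append(target)
    return written
```

Each part file can then be scored as its own set, and the per-part score files concatenated afterwards. Reading the whole file into memory keeps the sketch short; for very large inputs a streaming approach (count the records first, then write them out) would use less memory.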

WZo0o commented 1 year ago

Thanks for the update; I am looking forward to your news.

makirc commented 1 year ago

Hi there, I can reproduce the problem. I think it is caused by a single variant in the whole file. :(

chr4    0       705161  DEL

The coordinate range is correctly specified in BED format (which uses 0-based start coordinates), but CADD-SV seems to need a start coordinate of at least 1, i.e.

chr4    1       705161  DEL

seems to work. I'll try to look into the code to fix that on our end, but this workaround should be easy to apply for now.
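With millions of SVs in a file, finding and editing such records by hand is impractical, so the workaround can be applied programmatically. Below is a minimal sketch in Python, assuming a tab-separated BED file with the start coordinate in the second column. `clamp_zero_starts` is a hypothetical helper, not part of CADD-SV. Moving the start from 0 to 1 drops the first base of the interval, which is unlikely to matter for a deletion spanning about 705 kb like the one above.

```python
def clamp_zero_starts(lines):
    """Hypothetical helper: change a BED start of 0 to 1 (the workaround above).

    Returns (new_lines, number_of_records_changed).
    """
    out, changed = [], 0
    for line in lines:
        ending = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        # Only touch records with start == 0 whose end is > 1, so the
        # shifted interval [1, end) is still non-empty.
        if (len(fields) >= 3 and fields[1] == "0"
                and fields[2].isdecimal() and int(fields[2]) > 1):
            fields[1] = "1"
            changed += 1
            line = "\t".join(fields) + ending
        out.append(line)
    return out, changed
```

To use it, read the input BED file line by line, pass the lines to `clamp_zero_starts`, and write the returned lines to the file given to CADD-SV; the returned count shows how many records were adjusted.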

WZo0o commented 1 year ago

Thanks for your help!