kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

Error with working directories #92

Closed poddarharsh15 closed 3 months ago

poddarharsh15 commented 3 months ago

Hi @kcleal, could you please suggest how to modify the command line to resolve this working directory error?

Command line:

dysgu run \\
        -p ${task.cpus} \\
        $fasta \\
        . \\
        $input_bam \\
        | bgzip ${args2} --threads ${task.cpus} --stdout > ${prefix}.vcf.gz
    tabix ${args3} ${prefix}.vcf.gz

2024-06-25 09:07:20,781 [INFO ] [dysgu-run] Version: 1.6.2
Traceback (most recent call last):
  File "/usr/local/bin/dysgu", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/dysgu/main.py", line 238, in run_pipeline
    make_wd(kwargs)
  File "/usr/local/lib/python3.10/site-packages/dysgu/main.py", line 125, in make_wd
    raise ValueError("Working directory already exists. Add -x / --overwrite=True to proceed, "
ValueError: Working directory already exists. Add -x / --overwrite=True to proceed, or supply --ibam to re-use temp files in working directory

Thanks.

kcleal commented 3 months ago

Hi @poddarharsh15, it looks like you are trying to use the current directory as the temp directory. You will need to add -x to your command to overwrite any temp files.
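
As an alternative to -x, the working directory argument can point at a path that does not exist yet, which avoids the "already exists" check in the traceback above. This is only a rough sketch: it assumes dysgu creates the working directory itself when the path is missing, and the names `parent`, `wd`, `sample.bam` and `out.vcf` are placeholders rather than anything from this pipeline.

```shell
set -eu

# Make a private parent directory, then hand dysgu a child path that does
# not exist yet (assumption: dysgu creates it on start-up).
parent=$(mktemp -d)
wd="$parent/dysgu_wd"   # hypothetical name; any non-existent path should do

# Placeholder inputs; replace with the real reference and alignment files.
fasta=${fasta:-genome.fasta}
input_bam=${input_bam:-sample.bam}

if command -v dysgu >/dev/null 2>&1 && [ -f "$fasta" ] && [ -f "$input_bam" ]; then
    # Same positional order as in the command above: reference, working dir, reads.
    dysgu run -p 2 "$fasta" "$wd" "$input_bam" > out.vcf
else
    # Nothing to run here, so just show the command that would be used.
    echo "would run: dysgu run -p 2 $fasta $wd $input_bam"
fi
```

Inside a workflow manager that already gives each task its own scratch directory, adding -x as described above is probably the simpler fix.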

poddarharsh15 commented 3 months ago

Thank you for the quick reply. Can I modify the command line like this?

dysgu run \\
        -p ${task.cpus} \\
        -x \\
        $fasta \\
        . \\
        $input_bam \\
        | bgzip ${args2} --threads ${task.cpus} --stdout > ${prefix}.vcf.gz
    tabix ${args3} ${prefix}.vcf.gz

Is this okay?

kcleal commented 3 months ago

Yes!

poddarharsh15 commented 3 months ago

I am running the command line above, but I still cannot get any output. Could you please have a look at the log files? Thanks.

Input bam

2024-06-25 11:36:07,112 [INFO ] [dysgu-run] Version: 1.6.2
2024-06-25 11:36:07,112 [INFO ] run -p 2 -x genome.fasta . test.paired_end.recalibrated.sorted.bam
2024-06-25 11:36:07,112 [INFO ] Destination: .
2024-06-25 11:36:07,382 [INFO ] dysgu fetch test.paired_end.recalibrated.sorted.bam written to ./test.paired_end.recalibrated.sorted.dysgu_reads.bam, n=252, time=0:00:00 h:m:s
2024-06-25 11:36:07,382 [INFO ] Input file is: ./test.paired_end.recalibrated.sorted.dysgu_reads.bam
[W::hts_idx_load3] The index file is older than the data file: test.paired_end.recalibrated.sorted.bam.bai
2024-06-25 11:36:07,385 [INFO ] Sample name: normal
2024-06-25 11:36:07,385 [INFO ] Writing vcf to stdout
2024-06-25 11:36:07,385 [INFO ] Running pipeline
2024-06-25 11:36:07,552 [INFO ] Calculating insert size. Removed 0 outliers with insert size >= 1359.0
2024-06-25 11:36:07,563 [INFO ] Inferred read length 100.0, insert median 351, insert stdev 145
2024-06-25 11:36:07,564 [INFO ] Max clustering dist 1076
2024-06-25 11:36:07,564 [INFO ] Building graph with clustering 1076 bp
2024-06-25 11:36:07,566 [INFO ] Total input reads 252
2024-06-25 11:36:07,566 [INFO ] Graph constructed
2024-06-25 11:36:07,566 [INFO ] Minimum support 3
2024-06-25 11:36:07,602 [CRITICA] No events found
2024-06-25 11:36:07,602 [INFO ] dysgu run test.paired_end.recalibrated.sorted.bam complete, time=0:00:00 h:m:s

Input cram

2024-06-25 11:36:18,595 [INFO ] [dysgu-run] Version: 1.6.2
2024-06-25 11:36:18,595 [INFO ] run -p 2 -x genome.fasta . test.paired_end.recalibrated.sorted.cram
2024-06-25 11:36:18,595 [INFO ] Destination: .
2024-06-25 11:36:19,056 [INFO ] dysgu fetch test.paired_end.recalibrated.sorted.cram written to ./test.paired_end.recalibrated.sorted.dysgu_reads.bam, n=252, time=0:00:00 h:m:s
2024-06-25 11:36:19,057 [INFO ] Input file is: ./test.paired_end.recalibrated.sorted.dysgu_reads.bam
2024-06-25 11:36:19,058 [INFO ] Sample name: normal
2024-06-25 11:36:19,058 [INFO ] Writing vcf to stdout
2024-06-25 11:36:19,058 [INFO ] Running pipeline
2024-06-25 11:36:19,509 [INFO ] Calculating insert size. Removed 0 outliers with insert size >= 1359.0
2024-06-25 11:36:19,519 [INFO ] Inferred read length 100.0, insert median 351, insert stdev 145
2024-06-25 11:36:19,520 [INFO ] Max clustering dist 1076
2024-06-25 11:36:19,520 [INFO ] Building graph with clustering 1076 bp
2024-06-25 11:36:19,522 [INFO ] Total input reads 252
2024-06-25 11:36:19,522 [INFO ] Graph constructed
2024-06-25 11:36:19,522 [INFO ] Minimum support 3
2024-06-25 11:36:19,556 [CRITICA] No events found
2024-06-25 11:36:19,556 [INFO ] dysgu run test.paired_end.recalibrated.sorted.cram complete, time=0:00:00 h:m:s

kcleal commented 3 months ago

The log suggests only 252 reads were in your test.paired_end.recalibrated.sorted.cram file. Is this correct?
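
For reference, that number comes from the "Total input reads" line in the log. A minimal sketch of pulling it out with sed, assuming the log line format shown above; the variable names are made up for illustration:

```shell
# One log line copied from the run above.
log_line='2024-06-25 11:36:07,566 [INFO ] Total input reads 252'

# Keep only the integer that follows "Total input reads".
reads=$(printf '%s\n' "$log_line" | sed -n 's/.*Total input reads \([0-9][0-9]*\).*/\1/p')

echo "input reads: $reads"
```

The same sed expression could be applied to a saved log file instead of a single line.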

poddarharsh15 commented 3 months ago

252

Yes, that is correct. These are test samples, which are extremely small (<300 kB).

kcleal commented 3 months ago

Perhaps there are no SVs present? Alternatively, you can try adjusting the min-support parameter; it is set to 3 by default for paired-end reads.

poddarharsh15 commented 3 months ago

Probably, yes. I was running dysgu on other test samples that are a bit larger, and it detected SVs successfully. The help text says:

--min-support TEXT  Minimum number of reads per SV  [default: 3]

Could I change it to 0, maybe?

dysgu run \
    -p ${cpus} \
    -x \
    --min-support 0 \
    ${fasta} \
    . \
    ${input_bam} \
    | bgzip ${args2} --threads ${cpus} --stdout > ${prefix}.vcf.gz

tabix ${args3} ${prefix}.vcf.gz

May I also ask whether I need to add any specific parameters to run .cram files?

kcleal commented 3 months ago

There are no additional parameters needed to use a cram file. Min-support 0 or 1 will have the same effect, i.e. at least one piece of evidence is required for an SV call.

poddarharsh15 commented 3 months ago

I have tried both 0 and 1, and the results do not change. I suppose there are no SVs present in the test data :(
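
One way to double-check that the compressed output really holds no calls is to count the non-header lines in the VCF. The sketch below is only an illustration: `count_vcf_records` is a made-up helper, and the header-only file is generated on the spot as a stand-in for `${prefix}.vcf.gz`. It relies on bgzip output being readable by plain gzip.

```shell
# Count VCF records, i.e. lines that do not start with '#'.
count_vcf_records() {
    # grep -c exits non-zero when the count is 0, so do not treat that as a failure.
    gzip -cd "$1" | grep -vc '^#' || true
}

# Stand-in for a dysgu output with no events: a header and nothing else.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' \
    | gzip -c > "$tmp/empty.vcf.gz"

n=$(count_vcf_records "$tmp/empty.vcf.gz")
echo "records: $n"
```

A count of 0 on the real file would match the "No events found" message in the logs.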