moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

problem with dedup #44

Closed ospfsg closed 6 months ago

ospfsg commented 7 months ago

Hi Lucas

After the problem with Picard, I tried dedup instead, on pool-seq data. I have attached the log file:

2024-03-25T144328.892651.snakemake.log

I got this error message:

Error in rule sort_reads_dedup:
    jobid: 56
    output: dedup/PN1.bam, dedup/PN1.done
    log: logs/samtools/sort/PN1-dedup.log (check log file(s) for error message)
    conda-env: /home/dau1/software/conda-envs/461c39411718053aed08d5885bf47783

RuleException:
CalledProcessError in line 76 of /home/dau1/software/grenepipe-0.12.2/rules/duplicates-dedup.smk:
Command 'source /home/dau1/miniconda3/envs/grenepipe/bin/activate '/home/dau1/software/conda-envs/461c39411718053aed08d5885bf47783'; set -euo pipefail; /home/dau1/miniconda3/envs/grenepipe/bin/python3.7 /mnt/data1/Project_QRO_Poolseq/Operational/4_data_analysis/5_grenepipe/run2/.snakemake/scripts/tmpqov0n9y3.wrapper.py' returned non-zero exit status 1.
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2347, in run_wrapper
  File "/home/dau1/software/grenepipe-0.12.2/rules/duplicates-dedup.smk", line 76, in rule_sort_reads_dedup
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2359, in run_wrapper

lczech commented 7 months ago

Hi @ospfsg,

can you please attach some of the log files of that step? You'll find them in logs/samtools/sort/. Thanks!

Does that also mean that #43 is solved? Shall we close that one then?

Cheers and so long Lucas

ospfsg commented 6 months ago

Hi @lczech,

I sorted out most of the previous problems from #43 and #44, but I am now running into new ones:

This time the log file is empty and the error messages are odd:

/usr/bin/bash: line 1: AP028914: command not found !!!!

See below where the problem started. I have attached the empty log file:

ENA|AP028914|AP028914.1.log

and the general log file:

2024-04-06T154706.002800.snakemake.log

Could you advise me on this?

 output: called/ENA|AP028914|AP028914.1.vcf (pipe), called/ENA|AP028914|AP028914.1.vcf.done
        log: logs/freebayes/ENA|AP028914|AP028914.1.log
        jobid: 271
        benchmark: benchmarks/freebayes/ENA|AP028914|AP028914.1.bench.log
        wildcards: contig=ENA|AP028914|AP028914.1
        threads: 118

    [Sat Apr  6 17:51:31 2024]
    rule compress_vcf:
        input: called/ENA|AP028914|AP028914.1.vcf
        output: called/ENA|AP028914|AP028914.1.vcf.gz, called/ENA|AP028914|AP028914.1.vcf.gz.done
        log: logs/compress_vcf/ENA|AP028914|AP028914.1.log
        jobid: 270
        wildcards: contig=ENA|AP028914|AP028914.1
        threads: 2

Activating conda environment: /home/dau1/software/conda-envs/274fa057cbe0d35f70f6e72a7bbf331c
Activating conda environment: /home/dau1/software/conda-envs/c33c6fe4a4427c0a2e5bff68c2c7ae7c
/usr/bin/bash: line 1: AP028914: command not found
/usr/bin/bash: line 1: AP028914.1.vcf: command not found
/usr/bin/bash: line 1: AP028914: command not found
/usr/bin/bash: line 1: AP028914: command not found
/usr/bin/bash: line 1: AP028914.1.log: command not found
Activating conda environment: /home/dau1/software/conda-envs/c33c6fe4a4427c0a2e5bff68c2c7ae7c
/usr/bin/bash: line 1: AP028914: command not found
/usr/bin/bash: line 1: AP028914.1.log: command not found
/usr/bin/bash: line 1: AP028914: command not found
/usr/bin/bash: line 1: AP028914.1.vcf: command not found
Writing to /tmp/bcftools.9ofika
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
  DOI https://doi.org/10.5281/zenodo.1146014

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

[Sat Apr  6 17:51:32 2024]
Error in group job b179584a-f6f9-4cdf-9f53-d7c4fabb4a39:
    [Sat Apr  6 17:51:32 2024]
    Error in rule compress_vcf:
        jobid: 270
        output: called/ENA|AP028914|AP028914.1.vcf.gz, called/ENA|AP028914|AP028914.1.vcf.gz.done
        log: logs/compress_vcf/ENA|AP028914|AP028914.1.log (check log file(s) for error message)
        conda-env: /home/dau1/software/conda-envs/274fa057cbe0d35f70f6e72a7bbf331c
        shell:
        bgzip --force --threads 2 called/ENA|AP028914|AP028914.1.vcf > called/ENA|AP028914|AP028914.1.vcf.gz 2> logs/compress_vcf/ENA|AP028914|AP028914.1.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    [Sat Apr  6 17:51:32 2024]
    Error in rule call_variants:
        jobid: 271
        output: called/ENA|AP028914|AP028914.1.vcf (pipe), called/ENA|AP028914|AP028914.1.vcf.done
        log: logs/freebayes/ENA|AP028914|AP028914.1.log (check log file(s) for error message)
        conda-env: /home/dau1/software/conda-envs/c33c6fe4a4427c0a2e5bff68c2c7ae7c

[W::bcf_hrec_check] Invalid tag name: "technology.-"
[Sat Apr  6 17:51:51 2024]
Finished job 219.
241 of 288 steps (84%) done
Merging 1 temporary files
[W::bcf_hrec_check] Invalid tag name: "technology.-"
[W::bcf_hrec_check] Invalid tag name: "technology.-"
Cleaning
Done
Traceback (most recent call last):
  File "/mnt/data1/Project_Miridios/Operational/4_data_analysis/5_grenepipe/run2/.snakemake/scripts/tmp1m3j68uq.freebayes.py", line 162, in <module>
    "({freebayes} {extra_params} -f {snakemake.input.ref}"
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/shell.py", line 231, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  (freebayes-parallel <(bedtools intersect -a <(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' <(echo "ENA|AP028914|AP028914.1:0-16004536")) -b <(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' <(fasta_generate_regions.py /mnt/data1/Project_Miridios/Operational/6_reference_genomes/Genome_nesidiocoris_tenuis/GCA_036186465.1.fasta.fai 100000)) | sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/') 118 --min-alternate-count 2 -f /mnt/data1/Project_Miridios/Operational/6_reference_genomes/Genome_nesidiocoris_tenuis/GCA_036186465.1.fasta dedup/MIR_Dicy37_EKDN230030510-1A_HJGYCDSX7_L1.bam dedup/MIR_Dicy43_EKDN230030511-1A_HJGYCDSX7_L1.bam dedup/MIR_Dicy47_EKDN230030512-1A_HJK33DSX7_L2.bam dedup/MIR_Dicy78b_EKDN230030513-1A_HJGYCDSX7_L1.bam dedup/MIR_Dicy88_EKDN230030514-1A_HKM3VDSX7_L3.bam dedup/MIR_Macr17_EKDN230030515-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr18_EKDN230030516-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr19_EKDN230030517-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr20_EKDN230030518-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr25_EKDN230030519-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr30_EKDN230030520-1A_HKM5HDSX7_L4.bam dedup/MIR_Macr31.bam dedup/MIR_Macr32_EKDN230030522-1A_HJK33DSX7_L1.bam dedup/MIR_Macr33b_EKDN230030523-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr37_EKDN230030524-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr38_EKDN230030525-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr42_EKDN230030526-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr46b_EKDN230030527-1A_HKM3VDSX7_L3.bam dedup/MIR_Macr47_EKDN230030528-1A_HKM3VDSX7_L2.bam dedup/MIR_Macr54b_EKDN230030529-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr56_EKDN230030530-1A_HJGYCDSX7_L1.bam dedup/MIR_Macr58_EKDN230030531-1A_HJGYCDSX7_L1.bam | bcftools sort -Ou - | bcftools view -Ov - > called/ENA|AP028914|AP028914.1.vcf)  > logs/freebayes/ENA|AP028914|AP028914.1.log 2>&1' returned non-zero exit status 127.
[Sat Apr  6 17:52:16 2024]
Finished job 232.
242 of 288 steps (84%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /mnt/data1/Project_Miridios/Operational/4_data_analysis/5_grenepipe/run2/.snakemake/log/2024-04-06T154706.002800.snakemake.log
(grenepipe) dau1@frey:~/software/grenepipe-0.12.2$

lczech commented 6 months ago

Oh haha, that is an interesting new error that I have not seen before. The issue is that your reference genome contains chromosomes or contigs with names such as ENA|AP028914|AP028914.1. That name contains pipe characters (|), which have a special meaning in Unix/Linux shells: they pass the output of one command on to the next. grenepipe runs the variant calling per chromosome/contig and uses these names for the resulting files, so the pipe characters end up in the file names, and thus unquoted in the shell commands being run. The shell then splits each command at the pipes and tries to run the pieces (AP028914, AP028914.1.vcf, AP028914.1.log) as commands of their own, which leads to the "command not found" errors.
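To illustrate the splitting, here is a minimal Python sketch that tokenizes the bgzip command from the log roughly the way a POSIX shell would. It only uses the standard-library `shlex` module; it is not how grenepipe or Snakemake build their commands, and the quoting shown at the end is just one possible way to keep such a path intact.

```python
import shlex

# File name taken from the log above; the contig name contains pipe characters.
path = "called/ENA|AP028914|AP028914.1.vcf"
command = f"bgzip --force --threads 2 {path}"

# Tokenize similarly to a POSIX shell, treating | as an operator of its own.
lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
tokens = list(lexer)
print(tokens)

# Everything after a | is treated as a separate command by the shell.
pieces_after_pipes = [tokens[i + 1] for i, t in enumerate(tokens[:-1]) if t == "|"]
print(pieces_after_pipes)

# Quoting the path keeps it a single argument.
safe_command = f"bgzip --force --threads 2 {shlex.quote(path)}"
print(safe_command)
```

The pieces that follow each pipe are the same strings that bash reports as "command not found" in the log.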

Generally, I recommend using only safe characters in file names. See the last paragraph of the section here:

we recommend to ensure file names that only consist of alpha-numerical characters, dots, dashes, and underscores. Almost all other characters are special in some contexts, and might hence cause trouble when running the pipeline.

In your case, your samples are all named fine; the error came from the chromosome/contig names in the reference genome, which I had not thought to check before, so they are not validated prior to running the pipeline.
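Until such a check exists in the pipeline, the sequence names can be screened by hand. The following is a rough sketch, not grenepipe code: it assumes the reference has a samtools-style .fai index (tab-separated, sequence name in the first column) and uses an allow-list of alphanumerical characters, dots, dashes, and underscores, following the recommendation quoted above. The function name and the example lines are made up for illustration; only the first contig name and its length come from the log.

```python
import re

# Assumed allow-list: alphanumerical characters, dots, dashes, underscores.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_unsafe_names(fai_lines):
    """Return the sequence names in .fai lines that contain unsafe characters."""
    unsafe = []
    for line in fai_lines:
        if not line.strip():
            continue  # skip empty lines
        # The first tab-separated column of a .fai index is the sequence name.
        name = line.rstrip("\n").split("\t")[0]
        if SAFE_NAME.fullmatch(name) is None:
            unsafe.append(name)
    return unsafe


# Illustrative index lines; offsets and the second entry are hypothetical.
example_fai = [
    "ENA|AP028914|AP028914.1\t16004536\t60\t60\t61\n",
    "contig_2\t1000\t16271000\t60\t61\n",
]
print(find_unsafe_names(example_fai))
```

For a real reference, the lines would be read from the index file, for example with `open("GCA_036186465.1.fasta.fai")`, and any names returned would need to be renamed before running the pipeline.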

I will add a check for this to the code, so that a nice error message is printed. However, I won't have time for that in the next couple of weeks. So, for now, the quick solution for you is to rename the sequences in the reference genome (/mnt/data1/Project_Miridios/Operational/6_reference_genomes/Genome_nesidiocoris_tenuis/GCA_036186465.1.fasta) by removing any characters that are not alphanumerical characters, dashes, underscores, or dots.
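One way to do the renaming is sketched below, under a few assumptions: the sequence name is taken to be the first whitespace-delimited word of each FASTA header, and unsafe characters are replaced with an underscore rather than removed outright, to lower the risk of two names collapsing into one (pass `replacement=""` for plain removal). The function names are hypothetical, and this is not part of grenepipe.

```python
import re

# Anything outside alphanumerical characters, dots, dashes, and underscores.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name, replacement="_"):
    """Replace every unsafe character in a sequence name."""
    return UNSAFE_CHARS.sub(replacement, name)


def sanitize_fasta(lines, replacement="_"):
    """Yield FASTA lines with sanitized sequence names in the header lines."""
    seen = set()
    for line in lines:
        if line.startswith(">"):
            # Split the header into the name and the optional description.
            parts = line[1:].rstrip("\n").split(maxsplit=1)
            name = sanitize_name(parts[0], replacement) if parts else ""
            if name in seen:
                raise ValueError(f"Duplicate sequence name after renaming: {name}")
            seen.add(name)
            description = " " + parts[1] if len(parts) > 1 else ""
            line = ">" + name + description + "\n"
        yield line


# Small in-memory example; the description and sequence are made up.
example = [">ENA|AP028914|AP028914.1 example description\n", "ACGTACGT\n"]
print("".join(sanitize_fasta(example)), end="")
```

To apply this to a file, the input lines would come from `open(...)` on the original FASTA and the output would be written to a new file. Indices built from the old reference (such as the .fai file) would presumably need to be regenerated, and earlier pipeline outputs that refer to the old names re-created.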

Hope that helps, and let me know if this works :-)

Cheers and so long Lucas

lczech commented 6 months ago

Hey @ospfsg,

Did this resolve the issue? Judging from #45, it seems to be solved. If not, feel free to re-open :-)

Cheers and so long Lucas