wenyuhaokikika closed this issue 9 months ago.
Hi @wenyuhaokikika, that is a strange error. It suggests that the multiprocessing job could not be loaded properly, which caused the pickle error. Have you tried re-running the sample? I have not seen this error before and I am not sure how it could happen. Alternatively, running in single-thread mode should resolve it.
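Here is a minimal sketch of re-running one sample in single-thread mode. It assumes that "single-thread mode" means `-p 1` (one process). That setting presumably keeps dysgu off the multiprocessing path the error comes from. The names `run_single_thread` and `DRY_RUN` are hypothetical, and `-x` only mirrors the command shown in the logs below.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: re-run one sample with a single process (-p 1).
# Arguments: reference FASTA, dysgu working/temp directory, input BAM.
run_single_thread() {
  local ref=$1 tmp=$2 bam=$3
  local cmd=(dysgu run -x -p 1 "$ref" "$tmp" "$bam")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # only print the command that would run
  else
    "${cmd[@]}"                 # VCF goes to stdout, dysgu's log to stderr
  fi
}
```

For example, `run_single_thread genome.fasta tmpDir DRR016851-1.bam > DRR016851.vcf` runs one sample. Setting `DRY_RUN=1` prints the command without running it.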
Thank you, this problem has been solved~~~
When I ran it again, I got different exceptions in addition to the problem above, for example:
2023-12-19 23:50:29,266 [INFO ] [dysgu-run] Version: 1.6.2
2023-12-19 23:50:29,266 [INFO ] run -x -p 6 /public/home/wenyuhao/seq/WGS/D1/resources/genome.fasta /public/home/wenyuhao/seq/WGS/dysgu/tmpDir /public/home/wenyuhao/seq/WGS/D1/results/recal/DRR016850-1.bam
2023-12-19 23:50:29,266 [INFO ] Destination: /public/home/wenyuhao/seq/WGS/dysgu/tmpDir
2023-12-19 23:53:10,416 [INFO ] dysgu fetch /public/home/wenyuhao/seq/WGS/D1/results/recal/DRR016850-1.bam written to /public/home/wenyuhao/seq/WGS/dysgu/tmpDir/DRR016850-1.dysgu_reads.bam, n=2989478, time=0:02:41 h:m:s
2023-12-19 23:53:10,416 [INFO ] Input file is: /public/home/wenyuhao/seq/WGS/dysgu/tmpDir/DRR016850-1.dysgu_reads.bam
2023-12-19 23:53:10,450 [INFO ] Sample name: DRR016850
2023-12-19 23:53:10,450 [INFO ] Writing vcf to stdout
2023-12-19 23:53:10,450 [INFO ] Running pipeline
2023-12-19 23:53:10,832 [INFO ] Calculating insert size. Removed 86 outliers with insert size >= 784
2023-12-19 23:53:10,843 [INFO ] Inferred read length 101.0, insert median 280, insert stdev 92
2023-12-19 23:53:10,844 [INFO ] Max clustering dist 740
2023-12-19 23:53:10,844 [INFO ] Building graph with clustering 740 bp
2023-12-19 23:53:37,925 [INFO ] Total input reads 2989478
2023-12-19 23:53:39,799 [INFO ] Graph constructed
2023-12-19 23:53:39,801 [INFO ] Minimum support 3
Traceback (most recent call last):
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/bin/dysgu", line 8, in <module>
    sys.exit(cli())
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/dysgu/main.py", line 259, in run_pipeline
    cluster.cluster_reads(ctx.obj)
  File "dysgu/cluster.pyx", line 1188, in dysgu.cluster.cluster_reads
  File "dysgu/cluster.pyx", line 996, in dysgu.cluster.pipe1
_pickle.UnpicklingError: unpickling stack underflow
Failed to read from standard input: unknown file type
or
2023-12-19 23:50:29,266 [INFO ] [dysgu-run] Version: 1.6.2
2023-12-19 23:50:29,266 [INFO ] run -x -p 6 /public/home/wenyuhao/seq/WGS/D1/resources/genome.fasta /public/home/wenyuhao/seq/WGS/dysgu/tmpDir /public/home/wenyuhao/seq/WGS/D1/results/recal/DRR016851-1.bam
2023-12-19 23:50:29,266 [INFO ] Destination: /public/home/wenyuhao/seq/WGS/dysgu/tmpDir
2023-12-19 23:53:12,329 [INFO ] dysgu fetch /public/home/wenyuhao/seq/WGS/D1/results/recal/DRR016851-1.bam written to /public/home/wenyuhao/seq/WGS/dysgu/tmpDir/DRR016851-1.dysgu_reads.bam, n=2995745, time=0:02:43 h:m:s
2023-12-19 23:53:12,329 [INFO ] Input file is: /public/home/wenyuhao/seq/WGS/dysgu/tmpDir/DRR016851-1.dysgu_reads.bam
2023-12-19 23:53:12,368 [INFO ] Sample name: DRR016851
2023-12-19 23:53:12,368 [INFO ] Writing vcf to stdout
2023-12-19 23:53:12,368 [INFO ] Running pipeline
2023-12-19 23:53:12,754 [INFO ] Calculating insert size. Removed 86 outliers with insert size >= 777.0
2023-12-19 23:53:12,765 [INFO ] Inferred read length 101.0, insert median 281, insert stdev 93
2023-12-19 23:53:12,766 [INFO ] Max clustering dist 746
2023-12-19 23:53:12,766 [INFO ] Building graph with clustering 746 bp
2023-12-19 23:53:39,518 [INFO ] Total input reads 2995745
2023-12-19 23:53:41,461 [INFO ] Graph constructed
2023-12-19 23:53:41,462 [INFO ] Minimum support 3
Traceback (most recent call last):
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/bin/dysgu", line 8, in <module>
    sys.exit(cli())
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/public/home/wenyuhao/anaconda3/envs/dysgu/lib/python3.8/site-packages/dysgu/main.py", line 259, in run_pipeline
    cluster.cluster_reads(ctx.obj)
  File "dysgu/cluster.pyx", line 1188, in dysgu.cluster.cluster_reads
  File "dysgu/cluster.pyx", line 996, in dysgu.cluster.pipe1
_pickle.UnpicklingError: invalid load key, '\x00'.
Failed to read from standard input: unknown file type
All are pickle errors.
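The last line of each log, `Failed to read from standard input: unknown file type`, appears to come from `bcftools view` and not from dysgu. Once dysgu exits with the traceback, bcftools receives an empty stream on its stdin. The small bash sketch below shows how `PIPESTATUS` tells the two failures apart, since `$?` alone reports only the last stage of a pipeline. It uses two hypothetical stand-in functions: `caller` plays dysgu and `compressor` plays bcftools.

```shell
#!/usr/bin/env bash
# Stand-ins: "caller" plays dysgu run (crashes before writing any VCF to stdout),
# "compressor" plays bcftools view (fails when its stdin is empty).
caller()     { echo "_pickle.UnpicklingError" >&2; return 1; }
compressor() { local data; data=$(cat); [[ -n $data ]]; }

caller 2>/dev/null | compressor
statuses=("${PIPESTATUS[@]}")   # must be read immediately after the pipeline

# $? would only show the compressor's status; PIPESTATUS keeps every stage's status.
echo "caller exit=${statuses[0]}, compressor exit=${statuses[1]}"
```

Here both stages report exit status 1. Index 0 is the stage that actually crashed, and the second failure is only a downstream symptom.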
Finally, I solved the problem by using fewer CPU cores for dysgu and requesting more memory and more --cpus-per-task from Slurm.
Slurm job script:
#!/bin/bash
#SBATCH -J dysgu
#SBATCH --nodes=1
#SBATCH -n 1
#SBATCH --cpus-per-task=20
#SBATCH -p batch
#SBATCH -w comput5
#SBATCH --mem=200G
#SBATCH --export=ALL
#SBATCH -o log/output.log
#SBATCH -e log/error.log
#SBATCH --mail-type=FAIL # BEGIN,END,FAIL,ALL
#SBATCH --mail-user=925201392@qq.com
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/public/home/wenyuhao/anaconda3/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/public/home/wenyuhao/anaconda3/pkgs/openssl-3.0.10-h7f8727e_2/lib/
parallel -j 3 < run_dysgu.sh
and I set '--procs' to 10 for dysgu.
dysgu run -x -p 6 /public/home/wenyuhao/seq/WGS/D1/resources/genome.fasta /public/home/wenyuhao/seq/WGS/dysgu/tmpDir /public/home/wenyuhao/seq/WGS/D1/results/recal/DRR016851-1.bam | bcftools view -Oz -o /public/home/wenyuhao/seq/WGS/dysgu/DRR016851.dysgu.vcf.gz && tabix -p vcf /public/home/wenyuhao/seq/WGS/dysgu/DRR016851.dysgu.vcf.gz > /public/home/wenyuhao/seq/WGS/dysgu/logs/DRR016851.log
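The fix comes down to keeping (number of parallel jobs) × (dysgu `-p`) within the CPUs Slurm grants the task. With `--cpus-per-task=20` and `parallel -j 3`, an even split gives 6 processes per run. That is the `-p 6` in the command above. This arithmetic does not by itself explain the pickle errors.

Note also that the trailing `> .../logs/DRR016851.log` in that command binds to `tabix` only. dysgu's log and tracebacks, which go to stderr, would presumably land in Slurm's `error.log` instead.

Below is a sketch of generating `run_dysgu.sh` lines with both points handled. The helper `make_commands` and the short relative paths are hypothetical placeholders for the real ones.

```shell
#!/usr/bin/env bash
# Sketch: keep (parallel jobs) x (dysgu --procs) within the CPUs Slurm grants the task.
cpus=${SLURM_CPUS_PER_TASK:-20}   # set by Slurm from --cpus-per-task; 20 matches the script above
jobs=3                            # same value as `parallel -j 3`
procs=$(( cpus / jobs ))          # integer division rounds down: 20 / 3 = 6
if (( procs < 1 )); then procs=1; fi

# Hypothetical layout; substitute the real reference, temp, BAM and output paths.
ref=resources/genome.fasta
tmp=dysgu/tmpDir
bams=results/recal
out=dysgu

# Print one run_dysgu.sh line per sample. dysgu writes the VCF to stdout and its
# log/traceback to stderr, so `2>` on the dysgu command sends them to the log file.
make_commands() {
  local sample
  for sample in "$@"; do
    printf 'dysgu run -x -p %d %s %s %s/%s-1.bam 2> %s/logs/%s.log | bcftools view -Oz -o %s/%s.dysgu.vcf.gz && tabix -p vcf %s/%s.dysgu.vcf.gz\n' \
      "$procs" "$ref" "$tmp" "$bams" "$sample" "$out" "$sample" "$out" "$sample" "$out" "$sample"
  done
}

printf 'each job: -p %d (%d x %d = %d of %d CPUs)\n' \
  "$procs" "$jobs" "$procs" "$(( jobs * procs ))" "$cpus"
```

To use it, write the commands with `make_commands DRR016850 DRR016851 > run_dysgu.sh` and then run `parallel -j 3 < run_dysgu.sh` as before. The `logs` directory must exist beforehand, otherwise the `2>` redirection fails.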
Thank you so much ~~~
Thanks for Dysgu ~~~
Many samples run normally, but one of them has a problem. What could be the reason? When I run
run -x -p 10 /public/home/wenyuhao/seq/WGS/D1/resources/genome.fasta /public/home/wenyuhao/seq/WGS/dysgu/tmpDir /public/home/wenyuhao/seq/WGS/D1/results/recal/DRR016851-1.bam
an exception is raised. When this problem occurred, I thought there was a problem with my BAM file, but it displays normally with
samtools view
.