zavolanlab / PAQR_KAPAC

Scripts, pipelines and documentation to run PAQR and KAPAC. KAPAC infers regulatory sequence motifs implicated in 3’ end processing changes; PAQR quantifies poly(A) site usage from standard RNA-seq data.
GNU General Public License v2.0

Error in rule infer_relative_usage: TypeError: not all arguments converted during string formatting #11

Open yangjywhu opened 4 years ago

yangjywhu commented 4 years ago

Hello,

part_one ran without problems, and all samples are listed in no_bias_samples.out. However, the infer_relative_usage rule in part_two fails with: TypeError: not all arguments converted during string formatting

Snakemake log:

RuleException:
CalledProcessError in line 142 of /data1/zhoulab/yangjiayi/project/hl/result/PAQR_gxl/PAQR/part_two.Snakefile:
Command 'set -euo pipefail;  ~/miniconda3/envs/py2_paqr/bin/python scripts/deNovo-used-sites-and-usage-inference.single_distal_included.py         --verbose         --clusters data/annotation/clusters.hg38.canonical_chr.tandem.noOverlap_strand_specific.bed         --coverages DE/coverages/DE-CE.pkl DE/coverages/DE-NM.pkl DE/coverages/DE-CE2.pkl DE/coverages/DE-NM2.pkl         --conditions CNTRL NM CNTRL NM   --ex_extension_files DE/coverages/DE-CE.extensions.tsv DE/coverages/DE-NM.extensions.tsv DE/coverages/DE-CE2.extensions.tsv DE/coverages/DE-NM2.extensions.tsv         --names DE-CE DE-NM DE-CE2 DE-NM2         --read_length 150         --min_coverage_region 100         --min_mean_coverage 5         --ds_reg_for_no_coverage 200         --min_cluster_distance 200         --mse_ratio_threshold 0.5         --best_break_point_upstream_extension 200         --processors 8         --max_downstream_coverage 10         --expressions_out DE/tandem_pas_expressions.tsv         --distal_sites DE/single_distal_sites.tsv         > DE/relative_usages.tsv         2> DE/logs/infer_relative_usage.log' returned non-zero exit status 1.
  File "/data1/zhoulab/yangjiayi/project/hl/result/PAQR_gxl/PAQR/part_two.Snakefile", line 142, in __rule_infer_relative_usage
  File "/data1/zhoulab/yangjiayi/softwares/miniconda3/envs/paqr_kapac/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job infer_relative_usage since they might be corrupted:
DE/relative_usages.tsv, DE/relative_usages.header.out
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

log of infer_relative_usage.log:

############## Started script on 19-11-2019 at 23:59:12 ##############
[INFO] Wed Nov 20 00:03:13 2019 Finished reading input
[INFO] No site was inferred for ENST00000522918:2:2:55517240:55517786. Skipped exon!
......
[INFO] No site was inferred for ENST00000567540:1:1:10039092:10040663. Skipped exon!
Traceback (most recent call last):
  File "scripts/deNovo-used-sites-and-usage-inference.single_distal_included.py", line 2302, in <module>
    main(options)
  File "scripts/deNovo-used-sites-and-usage-inference.single_distal_included.py", line 2118, in main
    result_tuples = pool.map( process_exon_wrapper_function, data_entries)
  File "/home/yangjiayi/miniconda3/envs/py2_paqr/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/yangjiayi/miniconda3/envs/py2_paqr/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
TypeError: not all arguments converted during string formatting
[INFO] No site was inferred for ENST00000233468:4:4:24067584:24067851. Skipped exon!
[INFO] No site was inferred for ENST00000604724:3:3:13406380:13408433. Skipped exon!

Thank you!

Best Wishes, Jiayi Yang.

koljaLanger commented 4 years ago

Hi Jiayi Yang

It is hard to trace where the error comes from without being able to run the script exactly as it was executed when the error occurred.

However, the log files suggest that the error occurs while processing one specific exon. The easiest approach is to run the script sequentially, without multiprocessing. The error message is then much more informative, because it indicates the line at which the error occurs as well as the exon that causes it.
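To illustrate why a sequential run helps, here is a minimal sketch. It is not the PAQR code: `process_exon`, `run_sequentially` and the exon IDs are hypothetical names made up for this example, and the formatting bug is planted on purpose to reproduce the same kind of TypeError. In the Python 2.7 traceback above, the pool only re-raises the worker's exception (`raise self._value`), so the failing line and the input are lost; a plain loop keeps both.

```python
import traceback


def process_exon(entry):
    """Hypothetical stand-in for the per-exon work (NOT the PAQR function)."""
    exon_id, values = entry
    # Planted bug: one "%d" placeholder, but `values` may hold several items.
    # With more than one item Python raises
    # "TypeError: not all arguments converted during string formatting".
    return exon_id + ": " + ("%d" % values)


def run_sequentially(entries):
    """Process entries one by one and record which entry fails and where."""
    results, failures = [], []
    for entry in entries:
        try:
            results.append(process_exon(entry))
        except Exception:
            # format_exc() holds the full traceback, including the failing line.
            failures.append((entry[0], traceback.format_exc()))
    return results, failures


# Made-up input: the second entry triggers the planted bug.
entries = [("exonA", (5,)), ("exonB", (5, 7))]
results, failures = run_sequentially(entries)
```

After the run, `failures` pairs each offending exon ID with its full traceback, which is the information the pool-based run does not show in the log above.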

To run the script again without multiprocessing, comment out lines 2118 to 2216 and uncomment lines 2218 to 2286.

Please try this and post the new infer_relative_usage.log here.

Best, Ralf