davemcg closed this issue 2 years ago
Well I also tried going back to unify 1.0.9 and got a different error about the exon sum files being wrong
+ [[ 48 -ne 54 ]]
+ echo 'FAILURE running unify, unexpected # of exon sum files: 54 vs. 48 (expected)'
FAILURE running unify, unexpected # of exon sum files: 54 vs. 48 (expected)
Which led me to look in the exon_sums_per_study folder, and indeed there was an extra two-letter directory that wasn't in gene_sums_per_study. The study name had an underscore, and it seemed that the parser was creating the directory from the two letters right before the first _. So I reran those files in pump with a study name without an underscore (I'm using the $9 explicit study name in pump).
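The directory comparison described above can be sketched in Python. This is only an illustration, not part of monorail: the function names are hypothetical, and it assumes each output folder holds one subdirectory per two-letter prefix, as the paths in this thread suggest.

```python
from pathlib import Path


def study_subdirs(root):
    """Return the names of the first-level subdirectories under root."""
    return {p.name for p in Path(root).iterdir() if p.is_dir()}


def compare_layouts(gene_root, exon_root):
    """Return (only_in_exon, only_in_gene) as sorted lists of directory names.

    A name that shows up in only one tree hints that a study name was
    parsed differently for the two output types.
    """
    gene = study_subdirs(gene_root)
    exon = study_subdirs(exon_root)
    return sorted(exon - gene), sorted(gene - exon)
```

Calling compare_layouts("gene_sums_per_study", "exon_sums_per_study") from the unify working directory would list any prefix directory that exists in one tree but not the other.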
Now I'm getting this error, which is the same in 1.0.9 and 1.1.0:
[Mon May 16 13:29:07 2022]
Finished job 1.
111 of 129 steps (86%) done
[Mon May 16 13:29:07 2022]
rule rejoin_genes:
input: all.exon_bw_count.pasted.gz
output: all.gene_counts.rejoined.tsv.gz, all.intron_counts.rejoined.tsv.gz
jobid: 4
threads: 6
/recount-unify/rejoin/rejoin -a /container-mounts/ref/disjoint2exons2genes.bed -d <(pigz --stdout -p 1 -d all.exon_bw_count.pasted.gz) -s 255 -p gene -h
cat gene.counts | pigz --fast -p 6 > all.gene_counts.rejoined.tsv.gz
cat gene.intron_counts | pigz --fast -p 6 > all.intron_counts.rejoined.tsv.gz
rm -f gene.counts gene.intron_counts
building annotation set done, disjoint2annot map size: 1860460, original annotation map size: 237964
/bin/bash: line 1: 52072 Segmentation fault /recount-unify/rejoin/rejoin -a /container-mounts/ref/disjoint2exons2genes.bed -d <(pigz --stdout -p 1 -d all.exon_bw_count.pasted.gz) -s 255 -p gene -h
[Mon May 16 13:29:26 2022]
Error in rule rejoin_genes:
jobid: 4
output: all.gene_counts.rejoined.tsv.gz, all.intron_counts.rejoined.tsv.gz
RuleException:
CalledProcessError in line 234 of /recount-unify/Snakefile:
Command ' set -euo pipefail;
/recount-unify/rejoin/rejoin -a /container-mounts/ref/disjoint2exons2genes.bed -d <(pigz --stdout -p 1 -d all.exon_bw_count.pasted.gz) -s 255 -p gene -h
cat gene.counts | pigz --fast -p 6 > all.gene_counts.rejoined.tsv.gz
cat gene.intron_counts | pigz --fast -p 6 > all.intron_counts.rejoined.tsv.gz
rm -f gene.counts gene.intron_counts ' returned non-zero exit status 139.
File "/recount-unify/Snakefile", line 234, in __rule_rejoin_genes
File "/opt/conda/envs/recount-unify/lib/python3.9/concurrent/futures/thread.py", line 52, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /container-mounts/working/.snakemake/log/2022-05-16T132622.956208.snakemake.log
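For anyone reading the log above: the "non-zero exit status 139" follows the usual shell convention that a status above 128 means the process was killed by signal number (status - 128), so 139 corresponds to signal 11, which matches the "Segmentation fault" line. A minimal Python sketch of that decoding (the helper name is hypothetical, and the signal name lookup assumes a POSIX system such as Linux):

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Statuses above 128 conventionally mean "terminated by signal
    (status - 128)"; anything else is a plain exit code.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(139))
```

This only confirms that rejoin crashed rather than exiting with its own error code; it does not say why it crashed.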
I'm getting really discouraged at my inability to quickly diagnose where I'm being stupid.
Hi @davemcg,
I think the first reported issue may be due to a breaking change in the unifier in February of this year that requires additional reference-related files:
https://github.com/langmead-lab/monorail-external/commit/646c59124d546da63cbb73356273bb174b2a63ea
Try grabbing those additional files into your ref directory for human and re-running the unifier.
OK, I updated my ref directory, re-ran all the input pump jobs and now have this error for unify:
+ fgrep -v '##'
+ perl /recount-unify/scripts/check_unifier_outputs.pl gene_sums_per_study/hi/ruchi/bharti.gene_sums.ruchi.SIRV.gz /container-mounts/working/ids.tsv.num_samples_per_study.tsv gene /container-mounts/ref/gene_exon_annotation_row_counts.tsv
+ for f in `find gene_sums_per_study -name "*.gz" -size +0c`
+ pcat gene_sums_per_study/ik/dominik/bharti.gene_sums.dominik.ERCC.gz
+ fgrep -v '##'
+ perl /recount-unify/scripts/check_unifier_outputs.pl gene_sums_per_study/ik/dominik/bharti.gene_sums.dominik.ERCC.gz /container-mounts/working/ids.tsv.num_samples_per_study.tsv gene /container-mounts/ref/gene_exon_annotation_row_counts.tsv
ERROR expected column count:24 != column count:23 line#1 gene_id DR02_H25TMDSX2_19148168_S36_L002 DR03_H25TMDSX2_19148170_S48_L002 DR04_H25TMDSX2_19148172_S44_L002 DR05_H25TMDSX2_19148174_S35_L002 DR06_H25TMDSX2_19148176_S46_L002 DR07_H25TMDSX2_19148178_S43_L002 DR08_H
25TMDSX2_19148180_S37_L002 DR09_H25TMDSX2_19148182_S42_L002 DR10_H25TMDSX2_19148184_S31_L002 DR11_H25TMDSX2_19148186_S25_L002 DR12_H25TMDSX2_19148188_S23_L002 DR13_H25TMDSX2_19148190_S33_L002 DR14_H25TMDSX2_19148192_S40_L002 DR15_H25TMDSX2_19148194_S29_L002 DR16
_H25TMDSX2_19148196_S24_L002 DR17_H25TMDSX2_19148156_S22_L002 DR18_H25TMDSX2_19148158_S21_L002 DR19_H25TMDSX2_19148160_S38_L002 DR20_H25TMDSX2_19148162_S27_L002 DR21_H25TMDSX2_19148164_S34_L002 DR22_H25TMDSX2_19148198_S26_L002 DR23_H25TMDSX2_19148200_S32_L002 DR
24_H25TMDSX2_19148202_S28_L002
ERROR expected column count:24 != column count:23 line#2 ERCC-00002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#3 ERCC-00003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#4 ERCC-00004 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#5 ERCC-00009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#6 ERCC-00012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#7 ERCC-00013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#8 ERCC-00014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#9 ERCC-00016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#10 ERCC-00017 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#11 ERCC-00019 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#12 ERCC-00022 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#13 ERCC-00024 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#14 ERCC-00025 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#15 ERCC-00028 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#16 ERCC-00031 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#17 ERCC-00033 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#18 ERCC-00034 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
....
ERROR expected column count:24 != column count:23 line#89 ERCC-00164 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#90 ERCC-00165 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#91 ERCC-00168 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#92 ERCC-00170 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERROR expected column count:24 != column count:23 line#93 ERCC-00171 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hmm, this one is interpretable... somehow an rsync didn't transfer everything.
(I realized there is supposed to be a DR01_\w+ sample, and the pump folder is missing a few unique.gene_fc_count files.)
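A quick way to spot which sample is absent is to compare the header of a gene sums file against the sample IDs you expect. The sketch below is hypothetical (it is not what check_unifier_outputs.pl does) and assumes the header is whitespace-separated with gene_id first, and that each sample column starts with its sample ID followed by an underscore, as in the log above.

```python
def missing_samples(header_line, expected_prefixes):
    """Return the expected sample IDs that have no matching header column."""
    columns = header_line.split()[1:]  # drop the leading gene_id column
    return [
        prefix
        for prefix in expected_prefixes
        if not any(c == prefix or c.startswith(prefix + "_") for c in columns)
    ]


# Shortened header using two of the column names from the log above.
header = (
    "gene_id DR02_H25TMDSX2_19148168_S36_L002 "
    "DR03_H25TMDSX2_19148170_S48_L002"
)
expected = [f"DR{i:02d}" for i in range(1, 4)]  # DR01, DR02, DR03
print(missing_samples(header, expected))  # prints ['DR01']
```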
Ah yes, so now unify has finished.
tldr: git pull on my monorail git clone.
Fortunately this project was small enough to YOLO and re-do pump. If I had several thousand samples to pump then I would have been a bit more persistent in figuring out what was going wrong.
Using recount-unify_1.1.0.sif and recount-rs5_1.0.6.sif.
Not certain how to diagnose this... where should I be looking?