langmead-lab / monorail-external

examples to run monorail externally
MIT License

"left over gene counts, terminating early!" #17

Closed davemcg closed 2 years ago

davemcg commented 2 years ago
121 of 121 steps (100%) done
Complete log: /container-mounts/working/.snakemake/log/2022-05-15T180517.331697.snakemake.log
++ fgrep 'steps (100%) done' recount-unify.output.sums.txt
+ done='121 of 121 steps (100%) done'
+ [[ -z 121 of 121 steps (100%) done ]]
++ echo G026,G029,R109,F006,ERCC,SIRV
++ sed 's#,# #g'
+ LIST_OF_ANNOTATIONS_SPACES='G026 G029 R109 F006 ERCC SIRV'
+ for a in $LIST_OF_ANNOTATIONS_SPACES
++ find gene_sums_per_study -name '*.G026.gz'
+ for f in `find gene_sums_per_study -name "*.${a}.gz"`
++ pcat gene_sums_per_study/an/aman/metaRPE.gene_sums.aman.G026.gz
++ wc -l
+ wc1=63859
+ output=gene_sums_per_study/an/aman/metaRPE.gene_sums.aman.G026.gz.reordered
+ pigz --fast -p2
+ cat /container-mounts/ref/G026.gene_sums.gene_order.tsv
+ python3 /recount-unify/rejoin/enforce_gene_order.py /dev/fd/63
++ pigz --stdout -p 2 -d gene_sums_per_study/an/aman/metaRPE.gene_sums.aman.G026.gz
cat: /container-mounts/ref/G026.gene_sums.gene_order.tsv: No such file or directory
left over gene counts, terminating early!

Using recount-unify_1.1.0.sif and recount-rs5_1.0.6.sif.

Not certain how to diagnose this...where should I be looking?

davemcg commented 2 years ago

Well, I also tried going back to unify 1.0.9 and got a different error about the number of exon sum files being wrong:

+ [[ 48 -ne 54 ]]
+ echo 'FAILURE running unify, unexpected # of exon sum files: 54 vs. 48 (expected)'
FAILURE running unify, unexpected # of exon sum files: 54 vs. 48 (expected)

That led me to look in the exon_sums_per_study folder, and indeed there was another two-letter directory that wasn't in gene_sums_per_study. The study name had an underscore, and it seemed that the parser was creating the directory from the two letters right before the first _. So I reran those files in pump with a study name without an underscore (I'm using the $9 explicit study name in pump).
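
A minimal sketch of the mismatch described above, for anyone who wants to screen study names before running pump. It assumes, based only on the paths in this thread (`an/aman`, `hi/ruchi`, `ik/dominik`), that the per-study subdirectory is the last two letters of the study name; the function names are hypothetical and none of this is taken from the recount-unify source.

```python
def shard_dir(study: str) -> str:
    """Subdirectory inferred from paths in this thread: last two characters
    of the full study name (an assumption, not the real parser)."""
    return study[-2:]


def shard_dir_before_underscore(study: str) -> str:
    """Behaviour observed by the poster: last two characters of the text
    that precedes the first underscore."""
    return study.split("_", 1)[0][-2:]


def risky_study_names(studies):
    """Study names for which the two rules above disagree."""
    return [s for s in studies
            if shard_dir(s) != shard_dir_before_underscore(s)]


if __name__ == "__main__":
    # "retina_2022" is a made-up study name used only for illustration.
    for name in ["aman", "ruchi", "dominik", "retina_2022"]:
        print(name, shard_dir(name), shard_dir_before_underscore(name))
```

The simplest safeguard is still the one used here: avoid underscores in the study name altogether.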

Now I'm getting this error which is the same in 1.0.9 and 1.1.0:

[Mon May 16 13:29:07 2022]
Finished job 1.
111 of 129 steps (86%) done

[Mon May 16 13:29:07 2022]
rule rejoin_genes:
    input: all.exon_bw_count.pasted.gz
    output: all.gene_counts.rejoined.tsv.gz, all.intron_counts.rejoined.tsv.gz
    jobid: 4
    threads: 6

                /recount-unify/rejoin/rejoin -a /container-mounts/ref/disjoint2exons2genes.bed -d <(pigz --stdout -p 1 -d all.exon_bw_count.pasted.gz) -s 255 -p gene -h
                cat gene.counts | pigz --fast -p 6 > all.gene_counts.rejoined.tsv.gz
                cat gene.intron_counts | pigz --fast -p 6 > all.intron_counts.rejoined.tsv.gz
                rm -f gene.counts gene.intron_counts

building annotation set done, disjoint2annot map size: 1860460, original annotation map size: 237964
/bin/bash: line 1: 52072 Segmentation fault      /recount-unify/rejoin/rejoin -a /container-mounts/ref/disjoint2exons2genes.bed -d <(pigz --stdout -p 1 -d all.exon_bw_count.pasted.gz) -s 255 -p gene -h
[Mon May 16 13:29:26 2022]
Error in rule rejoin_genes:
    jobid: 4
    output: all.gene_counts.rejoined.tsv.gz, all.intron_counts.rejoined.tsv.gz

RuleException:
CalledProcessError in line 234 of /recount-unify/Snakefile:
Command ' set -euo pipefail;
                /recount-unify/rejoin/rejoin -a /container-mounts/ref/disjoint2exons2genes.bed -d <(pigz --stdout -p 1 -d all.exon_bw_count.pasted.gz) -s 255 -p gene -h
                cat gene.counts | pigz --fast -p 6 > all.gene_counts.rejoined.tsv.gz
                cat gene.intron_counts | pigz --fast -p 6 > all.intron_counts.rejoined.tsv.gz
                rm -f gene.counts gene.intron_counts ' returned non-zero exit status 139.
  File "/recount-unify/Snakefile", line 234, in __rule_rejoin_genes
  File "/opt/conda/envs/recount-unify/lib/python3.9/concurrent/futures/thread.py", line 52, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /container-mounts/working/.snakemake/log/2022-05-16T132622.956208.snakemake.log

I'm getting really discouraged at my inability to quickly diagnose where I'm being stupid.

nmra-cwilks commented 2 years ago

Hi @davemcg,

I think the first reported issue may be due to a breaking change in the unifier in Feb of this year that requires additional reference-related files:

https://github.com/langmead-lab/monorail-external/commit/646c59124d546da63cbb73356273bb174b2a63ea

Try downloading those additional human reference files into your ref directory and then re-running the unifier.
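
One way to confirm whether the ref directory is complete before re-running is sketched below. It assumes that the unifier looks for one `<annotation>.gene_sums.gene_order.tsv` file per annotation, which is inferred from the single path in the log above (`G026.gene_sums.gene_order.tsv`) and the annotation list `G026,G029,R109,F006,ERCC,SIRV`. The function name is hypothetical, and other required reference files are not covered.

```python
from pathlib import Path

# Annotation list as it appears in the trace earlier in this thread.
ANNOTATIONS = "G026,G029,R109,F006,ERCC,SIRV"


def missing_gene_order_files(ref_dir, annotations=ANNOTATIONS):
    """Return the expected gene-order file names that are absent from ref_dir.

    The naming pattern is an assumption based on one path seen in the log.
    """
    ref = Path(ref_dir)
    expected = [f"{a}.gene_sums.gene_order.tsv" for a in annotations.split(",")]
    return [name for name in expected if not (ref / name).is_file()]
```

Calling `missing_gene_order_files("/container-mounts/ref")` from inside the container (or with the host path that is mounted there) should return an empty list once the additional files are in place.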

davemcg commented 2 years ago

OK, I updated my ref directory, re-ran all the input pump jobs and now have this error for unify:

+ fgrep -v '##'
+ perl /recount-unify/scripts/check_unifier_outputs.pl gene_sums_per_study/hi/ruchi/bharti.gene_sums.ruchi.SIRV.gz /container-mounts/working/ids.tsv.num_samples_per_study.tsv gene /container-mounts/ref/gene_exon_annotation_row_counts.tsv
+ for f in `find gene_sums_per_study -name "*.gz" -size +0c`
+ pcat gene_sums_per_study/ik/dominik/bharti.gene_sums.dominik.ERCC.gz
+ fgrep -v '##'
+ perl /recount-unify/scripts/check_unifier_outputs.pl gene_sums_per_study/ik/dominik/bharti.gene_sums.dominik.ERCC.gz /container-mounts/working/ids.tsv.num_samples_per_study.tsv gene /container-mounts/ref/gene_exon_annotation_row_counts.tsv
ERROR   expected column count:24 != column count:23     line#1  gene_id DR02_H25TMDSX2_19148168_S36_L002        DR03_H25TMDSX2_19148170_S48_L002        DR04_H25TMDSX2_19148172_S44_L002        DR05_H25TMDSX2_19148174_S35_L002        DR06_H25TMDSX2_19148176_S46_L002        DR07_H25TMDSX2_19148178_S43_L002        DR08_H25TMDSX2_19148180_S37_L002        DR09_H25TMDSX2_19148182_S42_L002        DR10_H25TMDSX2_19148184_S31_L002        DR11_H25TMDSX2_19148186_S25_L002        DR12_H25TMDSX2_19148188_S23_L002        DR13_H25TMDSX2_19148190_S33_L002        DR14_H25TMDSX2_19148192_S40_L002        DR15_H25TMDSX2_19148194_S29_L002        DR16_H25TMDSX2_19148196_S24_L002        DR17_H25TMDSX2_19148156_S22_L002        DR18_H25TMDSX2_19148158_S21_L002        DR19_H25TMDSX2_19148160_S38_L002        DR20_H25TMDSX2_19148162_S27_L002        DR21_H25TMDSX2_19148164_S34_L002        DR22_H25TMDSX2_19148198_S26_L002        DR23_H25TMDSX2_19148200_S32_L002        DR24_H25TMDSX2_19148202_S28_L002
ERROR   expected column count:24 != column count:23     line#2  ERCC-00002      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#3  ERCC-00003      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#4  ERCC-00004      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#5  ERCC-00009      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#6  ERCC-00012      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#7  ERCC-00013      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#8  ERCC-00014      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#9  ERCC-00016      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#10 ERCC-00017      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#11 ERCC-00019      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#12 ERCC-00022      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#13 ERCC-00024      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#14 ERCC-00025      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#15 ERCC-00028      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#16 ERCC-00031      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#17 ERCC-00033      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#18 ERCC-00034      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
....
ERROR   expected column count:24 != column count:23     line#89 ERCC-00164      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#90 ERCC-00165      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#91 ERCC-00168      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#92 ERCC-00170      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
ERROR   expected column count:24 != column count:23     line#93 ERCC-00171      0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
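
The ERROR lines above report 23 sample columns where 24 were expected. A small sketch for finding which sample is absent from a gene sums header follows. It is not the logic of `check_unifier_outputs.pl`; it only assumes that the header is whitespace- or tab-delimited with `gene_id` first, and that you can supply your own list of expected sample IDs. The function name is hypothetical.

```python
def missing_sample_columns(header_line, expected_ids):
    """Return the expected sample IDs that have no column in the header.

    The first field (gene_id) is skipped; split() handles tabs and spaces.
    """
    columns = header_line.split()
    present = set(columns[1:])
    return [s for s in expected_ids if s not in present]


if __name__ == "__main__":
    # Toy header with made-up sample names, for illustration only.
    header = "gene_id\tS2\tS3"
    print(missing_sample_columns(header, ["S1", "S2", "S3"]))
```

The header line itself can be taken from the first non-`##` line of the decompressed `.gz` file named in the error.
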
davemcg commented 2 years ago

Hmm, this one is interpretable... somehow an rsync didn't transfer everything.

(I realized there is supposed to be a DR01_\w+ sample, and the pump folder is missing a few unique.gene_fc_count files.)
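
A rough sketch of how such gaps could be spotted before running unify. It assumes that each sample's pump output contains a file whose name ends in `unique.gene_fc_count` and that the sample ID appears somewhere in that file's path relative to the pump folder; the actual pump output layout is not shown in this thread, so treat both as assumptions. The function name is hypothetical.

```python
import os


def samples_missing_output(pump_dir, sample_ids, suffix="unique.gene_fc_count"):
    """Return sample IDs with no file ending in `suffix` under pump_dir.

    Matching is by substring on the relative path, so IDs that are prefixes
    of other IDs (e.g. "DR1" and "DR10") would need stricter matching.
    """
    found = []
    for root, _dirs, files in os.walk(pump_dir):
        for name in files:
            if name.endswith(suffix):
                full = os.path.join(root, name)
                found.append(os.path.relpath(full, pump_dir))
    return [s for s in sample_ids if not any(s in path for path in found)]
```

Running this against both the original pump output and the rsync destination would show whether files were lost in transfer or never produced.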

davemcg commented 2 years ago

Ah yes, so now unify has finished.

tldr:

  1. Ran git pull on my monorail git clone
  2. Re-ran the hg38 and hg38_unify ref pull scripts
  3. Redid pump
  4. Redid unify (1.1.0)

Fortunately, this project was small enough to YOLO and redo pump. If I had several thousand samples to pump, I would have been a bit more persistent in figuring out what was going wrong.