langmead-lab / monorail-external

examples to run monorail externally

Unifier throwing an error at the end #29

Open pabloacera opened 7 months ago

pabloacera commented 7 months ago

Hi,

I am running the unifier with junction counts only. Everything seems to go fine, and all the Snakemake jobs run without a problem, but right at the end there is an error. I can see all the outputs in the folder, but I am not sure whether I am missing something or whether the error only occurs in the clean-up step. Here are the command I used and the error. Thanks a lot!

export SKIP_SUMS=1 && sudo -E /bin/bash ./singularity/run_recount_unify.sh /mnt/data/paceramateos/monorail-external-master/recount-unify_1.1.1.sif hg38 /mnt/data/paceramateos/monorail-external-master /mnt/data/paceramateos/monorail-external-master/test_SMC_output /mnt/data/paceramateos/monorail-external-master/output /mnt/data/paceramateos/monorail-external-master/metadata.tsv 2 test_SMC_output:101


[Sun Feb 11 09:26:19 2024]
Finished job 0.
409 of 409 steps (100%) done
Complete log: /container-mounts/working/.snakemake/log/2024-02-11T091549.271601.snakemake.log
++ fgrep 'steps (100%) done' recount-unify.output.jxs.txt
+ done='407 of 409 steps (100%) done
408 of 409 steps (100%) done
409 of 409 steps (100%) done'
+ [[ -z 407 of 409 steps (100%) done
408 of 409 steps (100%) done
409 of 409 steps (100%) done ]]
+ [[ ! -z 1 ]]
+ mkdir -p temp_jxs
+ mv junction_counts_per_study/02 junction_counts_per_study/05 junction_counts_per_study/08 junction_counts_per_study/09 junction_counts_per_study/10 junction_counts_per_study/11 junction_counts_per_study/12 junction_counts_per_study/13 junction_counts_per_study/25 junction_counts_per_study/32 junction_counts_per_study/34 junction_counts_per_study/56 junction_counts_per_study/57 junction_counts_per_study/58 junction_counts_per_study/59 junction_counts_per_study/60 junction_counts_per_study/61 junction_counts_per_study/62 junction_counts_per_study/64 junction_counts_per_study/67 junction_counts_per_study/70 junction_counts_per_study/73 junction_counts_per_study/76 junction_counts_per_study/79 junction_counts_per_study/82 junction_counts_per_study/85 temp_jxs/
+ mv junction_counts_per_study junction_counts_per_study_run_files
+ mv temp_jxs junction_counts_per_study
+ num_expected=0
++ find junction_counts_per_study -name '*.gz' -size +0c
++ wc -l
+ num_jx_files=162
+ [[ 0 -ne 162 ]]
+ echo 'FAILURE running gene/exon unify, unexpected # of gene sum files: 162 vs. 0 (expected)'
FAILURE running gene/exon unify, unexpected # of gene sum files: 162 vs. 0 (expected)
+ exit -1

ChristopherWilks commented 7 months ago

Thanks for the report, @pabloacera. This looks like an inconsistency introduced by additional checks I added some time after I added support for the SKIP_SUMS flag. I'll take a further look and see what the fix should be.
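
For anyone who wants to repeat the file-count check from the trace by hand, here is a minimal sketch. It reuses the `find ... -name '*.gz' -size +0c | wc -l` step shown in the log; the function names `count_nonempty_gz` and `check_jx_count` are hypothetical helpers and not part of recount-unify, and the expected count has to be supplied by the caller.

```shell
# Count non-empty .gz files under a directory, mirroring the
# `find ... -name '*.gz' -size +0c | wc -l` step in the trace.
count_nonempty_gz() {
    local dir="$1"
    # tr strips the padding that some wc implementations add
    find "$dir" -name '*.gz' -size +0c | wc -l | tr -d ' '
}

# Compare the count against an expected value given by the caller.
# (hypothetical helper; the real script derives its own expected value)
check_jx_count() {
    local dir="$1"
    local expected="$2"
    local found
    found="$(count_nonempty_gz "$dir")"
    if [ "$found" -ne "$expected" ]; then
        echo "unexpected # of junction files: $found vs. $expected (expected)" >&2
        return 1
    fi
    echo "OK: $found junction files"
}
```

Run from the unifier output directory, a call such as `check_jx_count junction_counts_per_study 162` would report whether the number of non-empty junction files matches the 162 that the trace found.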

pabloacera commented 7 months ago

Alright, thanks a lot! Please let me know of any updates.

ChristopherWilks commented 7 months ago

OK, @pabloacera, please try this quick-fix update to the unifier image, 1.1.2rc:

https://quay.io/repository/broadsword/recount-unify?tab=tags

pabloacera commented 7 months ago

Hi,

Thanks for the quick response. I got a slightly different error at the end as well:

command:

export SKIP_SUMS=1 && sudo -E /bin/bash ./singularity/run_recount_unify.sh /mnt/data/paceramateos/monorail-external-master/recount-unify_1.1.2rc.sif hg38 /mnt/data/paceramateos/monorail-external-master /mnt/data/paceramateos/monorail-external-master/unify_SMC_out/ /mnt/data/paceramateos/monorail-external-master/output /mnt/data/paceramateos/monorail-external-master/metadata.tsv 2 unify_SMC_out:101


[Mon Feb 12 06:26:59 2024]
Finished job 0.
409 of 409 steps (100%) done
Complete log: /container-mounts/working/.snakemake/log/2024-02-12T061619.075216.snakemake.log
++ fgrep 'steps (100%) done' recount-unify.output.jxs.txt
+ done='407 of 409 steps (100%) done
408 of 409 steps (100%) done
409 of 409 steps (100%) done'
+ [[ -z 407 of 409 steps (100%) done
408 of 409 steps (100%) done
409 of 409 steps (100%) done ]]
+ [[ ! -z 1 ]]
+ mkdir -p temp_jxs
+ mv junction_counts_per_study/02 junction_counts_per_study/05 junction_counts_per_study/08 junction_counts_per_study/09 junction_counts_per_study/10 junction_counts_per_study/11 junction_counts_per_study/12 junction_counts_per_study/13 junction_counts_per_study/25 junction_counts_per_study/32 junction_counts_per_study/34 junction_counts_per_study/56 junction_counts_per_study/57 junction_counts_per_study/58 junction_counts_per_study/59 junction_counts_per_study/60 junction_counts_per_study/61 junction_counts_per_study/62 junction_counts_per_study/64 junction_counts_per_study/67 junction_counts_per_study/70 junction_counts_per_study/73 junction_counts_per_study/76 junction_counts_per_study/79 junction_counts_per_study/82 junction_counts_per_study/85 temp_jxs/
+ mv junction_counts_per_study junction_counts_per_study_run_files
+ mv temp_jxs junction_counts_per_study
+ num_expected=162
++ find junction_counts_per_study -name '*.gz' -size +0c
++ wc -l
+ num_jx_files=162
+ [[ 162 -ne 162 ]]
+ cat qc_1.tsv
+ perl /recount-unify/log_qc/add_jx_stats2qc.pl samples.tsv
cat: qc_1.tsv: No such file or directory

These are the files in the output folder, in case it helps:

(base) paceramateos@ausy3presana01:/mnt/data/paceramateos/monorail-external-master/unify_SMC_out$ ll
total 3001876
drwxr-xr-x 10 root root     12288 Feb 12 06:26 ./
drwxr-xr-x 20 root root      4096 Feb 12 06:08 ../
-rw-r--r--  1 root root 102152995 Feb 12 06:26 all.sjs.motifs.merged.tsv
-rw-r--r--  1 root root         0 Feb 12 06:16 assign_compilation_ids.py.errs
-rw-r--r--  1 root root   4316278 Feb 12 06:16 blank_exon_sums
-rw-r--r--  1 root root       648 Feb 12 06:16 ids.input
-rw-r--r--  1 root root       405 Feb 12 06:16 ids.input.group_counters
-rw-r--r--  1 root root       837 Feb 12 06:16 ids.tsv
-rw-r--r--  1 root root        28 Feb 12 06:26 ids.tsv.new_header
-rw-r--r--  1 root root       378 Feb 12 06:16 ids.tsv.num_samples_per_study.tsv
-rw-r--r--  1 root root       324 Feb 12 06:16 ids.tsv.studies
drwxr-xr-x 28 root root      4096 Feb 12 06:16 input_from_pump/
drwxr-xr-x 28 root root      4096 Feb 12 06:26 junction_counts_per_study/
drwxr-xr-x  2 root root     32768 Feb 12 06:26 junction_counts_per_study_run_files/
-rw-r--r--  1 root root  38522177 Feb 12 06:26 junctions.bgz
-rw-r--r--  1 root root    240231 Feb 12 06:26 junctions.bgz.tbi
-rw-r--r--  1 root root 287346688 Feb 12 06:26 junctions.sqlite
prw-r--r--  1 root root         0 Feb 12 06:26 jx_sqlite_import|
-rw-r--r--  1 root root      1130 Feb 12 06:26 jx_stats_per_sample.tsv
drwxr-xr-x 28 root root      4096 Feb 12 06:16 links/
drwxr-xr-x  2 root root      4096 Feb 12 06:26 lucene_full_standard/
drwxr-xr-x  2 root root      4096 Feb 12 06:26 lucene_full_ws/
-rw-r--r--  1 root root       101 Feb 12 06:26 lucene_indexed_numeric_types.tsv
-rw-r--r--  1 root root      8789 Feb 12 06:26 lucene.indexer.run
-rw-r--r--  1 root root       668 Feb 11 09:02 metadata.tsv
-rw-r--r--  1 root root         0 Feb 12 06:26 qc_2.tsv
-rw-r--r--  1 root root         0 Feb 12 06:26 qc.err
-rw-r--r--  1 root root    252098 Feb 12 06:26 recount-unify.jxs.stats.json
-rw-r--r--  1 root root    325020 Feb 12 06:26 recount-unify.output.jxs.txt
-rw-r--r--  1 root root        95 Feb 12 06:26 samples.fields.tsv
-rw-r--r--  1 root root      1798 Feb 12 06:26 samples.tsv
-rw-r--r--  1 root root        58 Feb 12 06:26 samples.tsv.inferred
-rw-r--r--  1 root root      1533 Feb 12 06:16 setup_links.run
drwxr-xr-x  9 root root      4096 Feb 12 06:16 .snakemake/
-rw-r--r--  1 root root       837 Feb 12 06:26 sorted_samples.tsv
-rw-r--r--  1 root root   2521489 Feb 12 06:19 SRR11085164.all.mm
-rw-r--r--  1 root root  43443959 Feb 12 06:19 SRR11085164.all.RR
-rw-r--r--  1 root root   2516681 Feb 12 06:18 SRR11085164.unique.mm
-rw-r--r--  1 root root  43443959 Feb 12 06:18 SRR11085164.unique.RR
-rw-r--r--  1 root root   2476586 Feb 12 06:17 SRR11085167.all.mm
-rw-r--r--  1 root root  42568172 Feb 12 06:17 SRR11085167.all.RR
-rw-r--r--  1 root root   2472201 Feb 12 06:19 SRR11085167.unique.mm
-rw-r--r--  1 root root  42568172 Feb 12 06:19 SRR11085167.unique.RR
-rw-r--r--  1 root root   3990150 Feb 12 06:17 SRR11085170.all.mm
-rw-r--r--  1 root root  61038154 Feb 12 06:17 SRR11085170.all.RR
-rw-r--r--  1 root root   3981491 Feb 12 06:18 SRR11085170.unique.mm
-rw-r--r--  1 root root  61038154 Feb 12 06:18 SRR11085170.unique.RR
-rw-r--r--  1 root root      9254 Feb 12 06:26 SRR11085173.all.mm
-rw-r--r--  1 root root    252960 Feb 12 06:26 SRR11085173.all.RR
-rw-r--r--  1 root root      9254 Feb 12 06:26 SRR11085173.unique.mm
-rw-r--r--  1 root root    252960 Feb 12 06:26 SRR11085173.unique.RR
-rw-r--r--  1 root root   2478331 Feb 12 06:22 SRR11085176.all.mm
-rw-r--r--  1 root root  42782559 Feb 12 06:22 SRR11085176.all.RR
-rw-r--r--  1 root root   2473584 Feb 12 06:22 SRR11085176.unique.mm
-rw-r--r--  1 root root  42782559 Feb 12 06:22 SRR11085176.unique.RR
-rw-r--r--  1 root root   2534572 Feb 12 06:21 SRR11085179.all.mm
-rw-r--r--  1 root root  43292606 Feb 12 06:21 SRR11085179.all.RR
-rw-r--r--  1 root root   2530012 Feb 12 06:21 SRR11085179.unique.mm
-rw-r--r--  1 root root  43292606 Feb 12 06:21 SRR11085179.unique.RR
-rw-r--r--  1 root root   2489279 Feb 12 06:23 SRR11085182.all.mm
-rw-r--r--  1 root root  43099373 Feb 12 06:23 SRR11085182.all.RR
-rw-r--r--  1 root root   2484557 Feb 12 06:23 SRR11085182.unique.mm
-rw-r--r--  1 root root  43099373 Feb 12 06:23 SRR11085182.unique.RR
-rw-r--r--  1 root root   2463244 Feb 12 06:25 SRR11085185.all.mm
-rw-r--r--  1 root root  42547589 Feb 12 06:25 SRR11085185.all.RR
-rw-r--r--  1 root root   2458856 Feb 12 06:24 SRR11085185.unique.mm
-rw-r--r--  1 root root  42547589 Feb 12 06:24 SRR11085185.unique.RR
-rw-r--r--  1 root root   2280029 Feb 12 06:19 SRR12765356.all.mm
-rw-r--r--  1 root root  40502777 Feb 12 06:19 SRR12765356.all.RR
-rw-r--r--  1 root root   2274507 Feb 12 06:18 SRR12765356.unique.mm
-rw-r--r--  1 root root  40502777 Feb 12 06:18 SRR12765356.unique.RR
-rw-r--r--  1 root root   2158840 Feb 12 06:20 SRR12765361.all.mm
-rw-r--r--  1 root root  38748336 Feb 12 06:20 SRR12765361.all.RR
-rw-r--r--  1 root root   2153779 Feb 12 06:20 SRR12765361.unique.mm
-rw-r--r--  1 root root  38748336 Feb 12 06:20 SRR12765361.unique.RR
-rw-r--r--  1 root root   3125219 Feb 12 06:22 SRR13209902.all.mm
-rw-r--r--  1 root root  50752180 Feb 12 06:22 SRR13209902.all.RR
-rw-r--r--  1 root root   3120832 Feb 12 06:22 SRR13209902.unique.mm
-rw-r--r--  1 root root  50752180 Feb 12 06:22 SRR13209902.unique.RR
-rw-r--r--  1 root root   3273711 Feb 12 06:20 SRR13209905.all.mm
-rw-r--r--  1 root root  52575472 Feb 12 06:20 SRR13209905.all.RR
-rw-r--r--  1 root root   3268902 Feb 12 06:20 SRR13209905.unique.mm
-rw-r--r--  1 root root  52575472 Feb 12 06:20 SRR13209905.unique.RR
-rw-r--r--  1 root root   3447740 Feb 12 06:23 SRR13209908.all.mm
-rw-r--r--  1 root root  54653228 Feb 12 06:23 SRR13209908.all.RR
-rw-r--r--  1 root root   3442726 Feb 12 06:22 SRR13209908.unique.mm
-rw-r--r--  1 root root  54653228 Feb 12 06:22 SRR13209908.unique.RR
-rw-r--r--  1 root root   3356915 Feb 12 06:19 SRR13209909.all.mm
-rw-r--r--  1 root root  53503805 Feb 12 06:19 SRR13209909.all.RR
-rw-r--r--  1 root root   3352033 Feb 12 06:20 SRR13209909.unique.mm
-rw-r--r--  1 root root  53503805 Feb 12 06:20 SRR13209909.unique.RR
-rw-r--r--  1 root root   2897777 Feb 12 06:19 SRR13209910.all.mm
-rw-r--r--  1 root root  48977776 Feb 12 06:19 SRR13209910.all.RR
-rw-r--r--  1 root root   2893954 Feb 12 06:19 SRR13209910.unique.mm
-rw-r--r--  1 root root  48977776 Feb 12 06:19 SRR13209910.unique.RR
-rw-r--r--  1 root root   3207252 Feb 12 06:24 SRR13209911.all.mm
-rw-r--r--  1 root root  52763266 Feb 12 06:24 SRR13209911.all.RR
-rw-r--r--  1 root root   3202924 Feb 12 06:25 SRR13209911.unique.mm
-rw-r--r--  1 root root  52763266 Feb 12 06:25 SRR13209911.unique.RR
-rw-r--r--  1 root root   4637836 Feb 12 06:21 SRR13209912.all.mm
-rw-r--r--  1 root root  70401722 Feb 12 06:21 SRR13209912.all.RR
-rw-r--r--  1 root root   4632239 Feb 12 06:21 SRR13209912.unique.mm
-rw-r--r--  1 root root  70401722 Feb 12 06:21 SRR13209912.unique.RR
-rw-r--r--  1 root root   4495647 Feb 12 06:24 SRR13209913.all.mm
-rw-r--r--  1 root root  68532094 Feb 12 06:24 SRR13209913.all.RR
-rw-r--r--  1 root root   4490317 Feb 12 06:23 SRR13209913.unique.mm
-rw-r--r--  1 root root  68532094 Feb 12 06:23 SRR13209913.unique.RR
-rw-r--r--  1 root root   2494118 Feb 12 06:25 SRR21607157.all.mm
-rw-r--r--  1 root root  42828910 Feb 12 06:25 SRR21607157.all.RR
-rw-r--r--  1 root root   2489096 Feb 12 06:24 SRR21607157.unique.mm
-rw-r--r--  1 root root  42828910 Feb 12 06:24 SRR21607157.unique.RR
-rw-r--r--  1 root root   2228605 Feb 12 06:21 SRR21607158.all.mm
-rw-r--r--  1 root root  39339541 Feb 12 06:21 SRR21607158.all.RR
-rw-r--r--  1 root root   2224411 Feb 12 06:21 SRR21607158.unique.mm
-rw-r--r--  1 root root  39339541 Feb 12 06:21 SRR21607158.unique.RR
-rw-r--r--  1 root root   2287170 Feb 12 06:23 SRR21607159.all.mm
-rw-r--r--  1 root root  40170022 Feb 12 06:23 SRR21607159.all.RR
-rw-r--r--  1 root root   2282867 Feb 12 06:23 SRR21607159.unique.mm
-rw-r--r--  1 root root  40170022 Feb 12 06:23 SRR21607159.unique.RR
-rw-r--r--  1 root root   2105274 Feb 12 06:17 SRR21607160.all.mm
-rw-r--r--  1 root root  38016996 Feb 12 06:17 SRR21607160.all.RR
-rw-r--r--  1 root root   2101079 Feb 12 06:17 SRR21607160.unique.mm
-rw-r--r--  1 root root  38016996 Feb 12 06:17 SRR21607160.unique.RR
-rw-r--r--  1 root root   1946402 Feb 12 06:24 SRR21607161.all.mm
-rw-r--r--  1 root root  35711495 Feb 12 06:24 SRR21607161.all.RR
-rw-r--r--  1 root root   1942727 Feb 12 06:25 SRR21607161.unique.mm
-rw-r--r--  1 root root  35711495 Feb 12 06:25 SRR21607161.unique.RR
-rw-r--r--  1 root root   2024157 Feb 12 06:16 SRR21607162.all.mm
-rw-r--r--  1 root root  36911131 Feb 12 06:16 SRR21607162.all.RR
-rw-r--r--  1 root root   2020296 Feb 12 06:16 SRR21607162.unique.mm
-rw-r--r--  1 root root  36911131 Feb 12 06:16 SRR21607162.unique.RR
-rw-r--r--  1 root root   3663563 Feb 12 06:17 SRR22909625.all.mm
-rw-r--r--  1 root root  55704549 Feb 12 06:17 SRR22909625.all.RR
-rw-r--r--  1 root root   3657287 Feb 12 06:18 SRR22909625.unique.mm
-rw-r--r--  1 root root  55704549 Feb 12 06:18 SRR22909625.unique.RR
-rw-r--r--  1 root root   3382954 Feb 12 06:25 SRR22909632.all.mm
-rw-r--r--  1 root root  52737201 Feb 12 06:25 SRR22909632.all.RR
-rw-r--r--  1 root root   3377004 Feb 12 06:25 SRR22909632.unique.mm
-rw-r--r--  1 root root  52737201 Feb 12 06:25 SRR22909632.unique.RR
-rw-r--r--  1 root root   3425936 Feb 12 06:18 SRR22909634.all.mm
-rw-r--r--  1 root root  52876009 Feb 12 06:18 SRR22909634.all.RR
-rw-r--r--  1 root root   3419784 Feb 12 06:17 SRR22909634.unique.mm
-rw-r--r--  1 root root  52876009 Feb 12 06:17 SRR22909634.unique.RR
drwxr-xr-x  2 root root     36864 Feb 12 06:26 staging_jxs/

A separate question: do you think I can still use the outputs? They seem to be all there. Is all the computation done? Thanks!

ChristopherWilks commented 7 months ago

Yeah, that's not too surprising: the QC file(s) it's referencing are generated as part of the sums part of the workflow, which gets skipped, so I need to do some more work on that to make it consistent. To your second question: yes, the data files themselves are fine, though they are not named exactly as recount3 expects. You'd at least want to run these two find commands to rename them properly: https://github.com/langmead-lab/recount-unify/blob/e00439ed677e262701fad2e011300c4b5763c545/workflow.bash#L322

That said, can you remind me what your overall goal is in running just the jxns? Are you interested in recount3-ready jxns, Snaptron jxns, or something else?

pabloacera commented 7 months ago

Thanks for the commands! My goal is to generate the Snaptron jxns.

ChristopherWilks commented 7 months ago

Hi @pabloacera

Just a follow-up on your last comment (sorry, I've been quite busy with other things recently): for Snaptron, you should really only need the following files from your output above. You don't need the rest, which is only for recount3:

-rw-r--r--  1 root root  38522177 Feb 12 06:26 junctions.bgz
-rw-r--r--  1 root root    240231 Feb 12 06:26 junctions.bgz.tbi
-rw-r--r--  1 root root 287346688 Feb 12 06:26 junctions.sqlite
-rw-r--r--  1 root root      1130 Feb 12 06:26 jx_stats_per_sample.tsv
drwxr-xr-x  2 root root      4096 Feb 12 06:26 lucene_full_standard/
drwxr-xr-x  2 root root      4096 Feb 12 06:26 lucene_full_ws/
-rw-r--r--  1 root root       101 Feb 12 06:26 lucene_indexed_numeric_types.tsv
-rw-r--r--  1 root root        95 Feb 12 06:26 samples.fields.tsv
-rw-r--r--  1 root root      1798 Feb 12 06:26 samples.tsv
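
As a rough sketch, the files listed above could be gathered into their own directory as follows. The function name `collect_snaptron_files` and the destination path in the usage note are assumptions for illustration, not part of monorail or Snaptron; the file names are taken directly from the listing.

```shell
# Copy only the Snaptron-relevant outputs listed above from a unifier
# output directory into a separate directory.
# collect_snaptron_files is a hypothetical helper, not part of monorail.
collect_snaptron_files() {
    local src="$1"
    local dest="$2"
    local name
    local missing=0
    mkdir -p "$dest" || return 1
    for name in \
        junctions.bgz \
        junctions.bgz.tbi \
        junctions.sqlite \
        jx_stats_per_sample.tsv \
        lucene_full_standard \
        lucene_full_ws \
        lucene_indexed_numeric_types.tsv \
        samples.fields.tsv \
        samples.tsv
    do
        if [ -e "$src/$name" ]; then
            # -R so the two lucene index directories are copied whole
            cp -R "$src/$name" "$dest/" || return 1
        else
            echo "missing: $src/$name" >&2
            missing=1
        fi
    done
    # Non-zero status if any listed file was absent
    return "$missing"
}
```

For the run above, this might be called as `collect_snaptron_files /mnt/data/paceramateos/monorail-external-master/unify_SMC_out ./snaptron_files`, where `./snaptron_files` is an arbitrary destination. The function returns a non-zero status if any of the listed files is absent, so a partial copy is easy to spot.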