langmead-lab / monorail-external

examples to run monorail externally
MIT License

Error with SKIP_SUMS=1 in Unifier #6

Closed by RamseyKamar 2 years ago

RamseyKamar commented 3 years ago

Hi @ChristopherWilks ,

I ran the Unifier with the exact setup from this issue, except that I set the SKIP_SUMS=1 flag. For clarity, I'm using the 1.0.8 Docker image, which is also the one I eventually used to run the full Unifier in the last issue once I got it working. The following error appears only when SKIP_SUMS=1 is set:

+ echo 'FAILURE running gene/exon unify, unexpected # of gene sum files: 12 vs. 0 (expected)'
FAILURE running gene/exon unify, unexpected # of gene sum files: 12 vs. 0 (expected)
+ exit -1

The error comes from this block in workflow.bash:

#6 jx files per study (all + unique ID, MM, RR files)
    num_expected=$(( num_studies * 6 ))
    num_jx_files=$(find junction_counts_per_study -name "*.gz" -size +0c | wc -l)
    if [[ $num_expected -ne $num_jx_files ]]; then
        echo "FAILURE running gene/exon unify, unexpected # of gene sum files: $num_jx_files vs. $num_expected (expected)"
        exit -1
    fi

The check fails because num_studies evaluates to 0. The line that defines it sits inside the if statement that runs the recount sums, and that branch is skipped when SKIP_SUMS=1 is set:

export num_studies=$(cat ${SAMPLE_ID_MANIFEST}.studies | wc -l)
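The "12 vs. 0" in the error message is consistent with bash arithmetic treating an unset variable as 0. A minimal reproduction, independent of Monorail (the value 12 is copied from the error message above; nothing here is taken from workflow.bash beyond the variable names):

```shell
#!/usr/bin/env bash
# An unset variable evaluates to 0 inside bash arithmetic expansion.
unset num_studies                      # simulates the skipped assignment
num_expected=$(( num_studies * 6 ))    # unset num_studies is treated as 0
num_jx_files=12                        # value taken from the error message
if [[ $num_expected -ne $num_jx_files ]]; then
    echo "mismatch: $num_jx_files vs. $num_expected (expected)"
fi
```

This prints a mismatch of 12 vs. 0, the same pair of numbers the Unifier reports.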

Note that ${SAMPLE_ID_MANIFEST}.studies should resolve to ids.tsv.studies, which I can't find anywhere after the pipeline finishes. (I do see it in the run_files when I run the Unifier without the SKIP_SUMS=1 flag.)
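One possible way to make this failure easier to diagnose is sketched below. It is not the actual fix (which is not shown in this thread), and `count_studies` is a hypothetical helper that does not exist in workflow.bash. It assumes the .studies file holds one study per line, as the `wc -l` call above implies, and it turns a silent 0 into an explicit error when the file is missing:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of workflow.bash): count the studies listed in
# "<manifest>.studies", failing loudly if that file is missing or empty.
count_studies() {
    local studies_file="$1.studies"
    if [[ ! -s "$studies_file" ]]; then
        echo "ERROR: $studies_file is missing or empty" >&2
        return 1
    fi
    # Read via redirection so wc prints only the count, not the file name.
    wc -l < "$studies_file" | tr -d ' '
}

# Usage sketch: call this outside the SKIP_SUMS conditional, before the jx check.
# num_studies=$(count_studies "${SAMPLE_ID_MANIFEST}") || exit 1
```

The guard only reports the problem. With SKIP_SUMS=1 the .studies file would still have to be generated outside the skipped branch for the junction file check to pass.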

Here are the contents of the working directory after the pipeline is finished:

(base) kamarra5@<server>: /<path to parent of monorail>/monorail-external-debug-unifier/unifier_output/debug_test_2.1 (rk_get_unifier_working) $ ls
all.sjs.motifs.merged.tsv       DLBCL_relapse.unique.mm   ids.tsv.new_header                   junctions.bgz.tbi        lucene_full_standard              recount-unify.output.jxs.txt  setup_links.run     TRT_study.unique.mm
assign_compilation_ids.py.errs  DLBCL_relapse.unique.RR   input_from_pump                      junctions.sqlite         lucene_full_ws                    sample_metadata.tsv           sorted_samples.tsv  TRT_study.unique.RR
blank_exon_sums                 ids.input                 junction_counts_per_study            jx_sqlite_import         lucene_indexed_numeric_types.tsv  samples.fields.tsv            staging_jxs
DLBCL_relapse.all.mm            ids.input.group_counters  junction_counts_per_study_run_files  jx_stats_per_sample.tsv  lucene.indexer.run                samples.tsv                   TRT_study.all.mm
DLBCL_relapse.all.RR            ids.tsv                   junctions.bgz                        links                    recount-unify.jxs.stats.json      samples.tsv.inferred          TRT_study.all.RR

For completeness, here is my Unifier run command:

export SKIP_SUMS=1 && ./singularity/run_recount_unify.sh <Monorail repo root>/recount-unify_1.0.8.sif hg38 <Monorail repo root> <Monorail repo root>/unifier_output/debug_test_2.1 <Monorail repo root>/to_be_unified/debug_test_2 <Monorail repo root>/manifests/debug_test_2/sample_metadata.tsv 20 nvstest:102

Thanks and regards,

Ramsey

ChristopherWilks commented 2 years ago

this was fixed offline