langmead-lab / monorail-external

examples to run monorail externally
MIT License

Unifier not running #27

Open lagillenwater opened 10 months ago

lagillenwater commented 10 months ago

I've been trying to run unifier but am not getting any output. Please see script and output below. I can send the full output if that would be helpful. Any idea what the problem is here?

Code:

/scratch/alpine/lgillenwater@xsede.org/monorail-external/singularity/run_recount_unify.sh \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/unifier/recount-unify_1.1.0.sif \
    hg38 \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/references \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/unifier \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sample_metadata.tsv \
    20 \
    htp:110

Last lines of the output:

ChristopherWilks commented 10 months ago

Hi @lagillenwater,

just so we're on the same version, could you try re-running with the latest unifier image, v1.1.1: https://quay.io/repository/broadsword/recount-unify?tab=tags

It likely won't solve the problem, but helps to be on the latest for debugging.

After the re-run, could you post the following (here):

1) the first few lines of your sample_metadata.tsv file

2) an ls -l of one of your sample's recount-pump output directories, e.g. ls -l /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output/<sample_id>_att0/*

Thanks, Chris

lagillenwater commented 10 months ago

@ChristopherWilks, thanks for the quick response.

I've tried running with both 1.1.1 and 1.1.0 with the same result.

Here's the sample metadata:
study_id	sample_id
SRP349148	SRR17119296
SRP349148	SRR17119297

Here's ls -l of one of the sample directories:

-rw------- 1 lgillenwater@x lgillenw 6671307 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.Chimeric.out.junction.zst
-rw------- 1 lgillenwater@x lgillenw 44228724 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.Chimeric.out.sam.zst
-rw------- 1 lgillenwater@x lgillenw 27810 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.align.log
-rw------- 1 lgillenwater@x lgillenw 1072 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.align_unmapped.log
-rw------- 1 lgillenwater@x lgillenw 126173796 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.all.bw
-rw------- 1 lgillenwater@x lgillenw 17111110 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.all.exon_bw_count.zst
-rw------- 1 lgillenwater@x lgillenw 384 2023-08-24 12:59 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.all.exon_fc_count.summary
-rw------- 1 lgillenwater@x lgillenw 5814129 2023-08-24 12:59 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.all.exon_fc_count.zst
-rw------- 1 lgillenwater@x lgillenw 390 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.all.gene_fc_count.summary
-rw------- 1 lgillenwater@x lgillenw 3827105 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.all.gene_fc_count.zst
-rw------- 1 lgillenwater@x lgillenw 2255 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount.log
-rw------- 1 lgillenwater@x lgillenw 146 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_auc.tsv
-rw------- 1 lgillenwater@x lgillenw 642932 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_frag.tsv
-rw------- 1 lgillenwater@x lgillenw 101201891 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_jx.tsv.zst
-rw------- 1 lgillenwater@x lgillenw 79732380 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_nonref.csv.zst
-rw------- 1 lgillenwater@x lgillenw 772 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_unmapped.log
-rw------- 1 lgillenwater@x lgillenw 13 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_unmapped_jx.tsv.zst
-rw------- 1 lgillenwater@x lgillenw 780 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.bamcount_unmapped_nonref.csv.zst
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 12:50 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.download.log
-rw------- 1 lgillenwater@x lgillenw 4612 2023-08-24 12:59 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.exon_fc_count_all.log
-rw------- 1 lgillenwater@x lgillenw 4618 2023-08-24 12:59 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.exon_fc_count_unique.log
-rw------- 1 lgillenwater@x lgillenw 576 2023-08-24 13:00 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.extract_jx.log
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 12:50 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.fastq_check.log
-rw------- 1 lgillenwater@x lgillenw 20576 2023-08-24 12:51 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.fastq_check.tsv
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.fastq_removal.done
-rw------- 1 lgillenwater@x lgillenw 4693 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.gene_fc_count_all.log
-rw------- 1 lgillenwater@x lgillenw 4699 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.gene_fc_count_unique.log
-rw------- 1 lgillenwater@x lgillenw 7418 2023-08-24 12:58 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.idxstats
-rw------- 1 lgillenwater@x lgillenw 9406377 2023-08-24 13:00 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.jx_bed.zst
-rw------- 1 lgillenwater@x lgillenw 2629 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.manifest
-rw------- 1 lgillenwater@x lgillenw 673 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.salmon.log
-rw------- 1 lgillenwater@x lgillenw 2472178 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.salmon.tsv.zst
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.salmon_split3.tsv.zst
-rw------- 1 lgillenwater@x lgillenw 2962889 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.sjout.zst
-rw------- 1 lgillenwater@x lgillenw 65 2023-08-24 12:56 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.sort.log
-rw------- 1 lgillenwater@x lgillenw 123037899 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unique.bw
-rw------- 1 lgillenwater@x lgillenw 17083048 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unique.exon_bw_count.zst
-rw------- 1 lgillenwater@x lgillenw 384 2023-08-24 12:59 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unique.exon_fc_count.summary
-rw------- 1 lgillenwater@x lgillenw 5814129 2023-08-24 12:59 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unique.exon_fc_count.zst
-rw------- 1 lgillenwater@x lgillenw 396 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unique.gene_fc_count.summary
-rw------- 1 lgillenwater@x lgillenw 3801587 2023-08-24 12:55 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unique.gene_fc_count.zst
-rw------- 1 lgillenwater@x lgillenw 562 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped.extract_jx.log
-rw------- 1 lgillenwater@x lgillenw 14965165 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped.fastq.zst
-rw------- 1 lgillenwater@x lgillenw 508 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped.idxstats
-rw------- 1 lgillenwater@x lgillenw 94 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped.jx_bed.zst
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped.sort.log
-rw------- 1 lgillenwater@x lgillenw 2231 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped_all.bw
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped_fastq_removal.done
-rw------- 1 lgillenwater@x lgillenw 0 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped_split3.fastq.zst
-rw------- 1 lgillenwater@x lgillenw 2231 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped_unique.bw
-rw------- 1 lgillenwater@x lgillenw 16043 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped~sorted.bam
-rw------- 1 lgillenwater@x lgillenw 440 2023-08-24 13:01 htp/output/SRR17119296_att0/SRR17119296!SRP349148!hg38!local.unmapped~sorted.bam.bai
-rw------- 1 lgillenwater@x lgillenw 60 2023-08-24 13:17 htp/output/SRR17119296_att0/stats.json
-rw------- 1 lgillenwater@x lgillenw 160 2023-08-24 13:17 htp/output/SRR17119296_att0/std.out

ChristopherWilks commented 10 months ago

thanks for the details!

Everything looks fine, except I suggest trying the following:

1) move everything from under /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output/ to /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output/htp/

2) rerun the unifier 1.1.1 with the 5th argument (INPUT_DIR_HOST) as /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output/htp instead of /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output

I suspect I may have assumed in one of the initialization steps that the directory hierarchy would always have the study name as a parent directory of the sample name/ID in the pump output (though I'd need to check that to confirm).
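
The move suggested in step 1 could be sketched roughly as follows. This is only an illustration run against a temporary directory: the variable names `out` and `study` are made up here, the `*_att*` glob assumes every pump output directory carries an `_att<N>` suffix as in the listing above, and `htp` is taken as the study-level directory name from the suggested path. Replace `out` with the real pump output directory before using anything like this.

```shell
set -eu

# Hypothetical stand-in for .../monorail-external/htp/output
out=$(mktemp -d)
mkdir "$out/SRR17119296_att0" "$out/SRR17119297_att0"

# Name of the study-level parent directory (assumed from the suggested path)
study="htp"
mkdir -p "$out/$study"

# Move every per-sample attempt directory under the study directory
for d in "$out"/*_att*; do
  [ -d "$d" ] || continue
  mv "$d" "$out/$study/"
done

moved=$(ls "$out/$study" | wc -l | tr -d ' ')
echo "$moved sample directories now under $out/$study"
```

The unifier's input directory argument would then point at the study-level directory rather than its parent.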

lagillenwater commented 10 months ago

thanks again for replying so quickly!

I just did what you suggested and still got the same output. I tried running in both single_study and multi_study mode.

ChristopherWilks commented 10 months ago

hmm, bummer that didn't work.

ok, if you can send me the exact command line of the pump part of this, that would be helpful. I may need to try to re-create what you're doing on my end (at least for a few samples).

Also, did you run where the pump will download from SRA itself, or did you pre-download manually from SRA and then run locally on the FASTQs (looks like the latter since you're using "local")?

lagillenwater commented 10 months ago

Sure, here's the contents of the script for the job:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --time=03:00:00
#SBATCH --partition=amilan
#SBATCH --output=monorail-%j.out
#SBATCH --mail-type=END
#SBATCH --mail-user=lucas.gillenwater@cuanschutz.edu
#SBATCH --job-name=monorail

# load necessary modules
module load sra-toolkit

{

# saving the sample metadata

FILENAME="/scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sample_metadata.tsv"
touch $FILENAME
echo -e 'study_id\tsample_id' >> $FILENAME

read
while IFS=, read -r SRA
do
echo $SRA

# # add metadata to sample metadata
echo -e "SRP349148\t$SRA" >> $FILENAME

# # fetch and process sra files 
prefetch --max-size 200G -L info -t http -O /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sra $SRA
fasterq-dump --split-files ./sra/$SRA -O /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sra
rm -r /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sra/$SRA

# PUMP 
/scratch/alpine/lgillenwater@xsede.org/monorail-external/singularity/run_recount_pump.sh \
            /scratch/alpine/lgillenwater@xsede.org/monorail-external/recount-rs5_1.0.6.sif\
            $SRA \
            local \
            hg38 \
            20\
            /scratch/alpine/lgillenwater@xsede.org/monorail-external \
            /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sra/${SRA}_1.fastq \
            /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sra/${SRA}_2.fastq \
            SRP349148

done

} < /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/SraTest.csv

mv /scratch/alpine/lgillenwater@xsede.org/monorail-external/output /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/

/scratch/alpine/lgillenwater@xsede.org/monorail-external/singularity/run_recount_unify.sh \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/recount-unify_1.1.1.sif \
    hg38 \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/references \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/unifier \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/output/htp \
    /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/sample_metadata.tsv \
    20 \
    htp:110

ChristopherWilks commented 10 months ago

Hi @lagillenwater,

Thanks for the additional details. I realized that even though we used Xsede related resources to generate all of recount3, we didn't use it to run the unifier (only pump).

My informed guess this time is that the @ symbol in the full paths is messing up the step where the unifier is stopping since that step is using Perl for which the @ symbol is special.

So here's another suggestion:

/scratch/alpine/lgillenwater@xsede.org/monorail-external/singularity/run_recount_unify.sh \
    recount-unify_1.1.1.sif \
    hg38 \
    references \
    htp/unifier \
    htp/output/htp \
    htp/sample_metadata.tsv \
    20 \
    htp:110
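
As a rough illustration of why relative paths are being suggested here, the following sketch checks whether a path contains an `@` and shows that the same location, written relative to the monorail-external directory, does not. The helper `has_at` and the variable names are hypothetical and not part of monorail. Whether the wrapper script later expands relative paths back to absolute ones is not confirmed in this thread, so this only shows the idea.

```shell
set -eu

# Hypothetical helper: succeeds when the given path contains an '@'
has_at() {
  case "$1" in
    *@*) return 0 ;;
    *)   return 1 ;;
  esac
}

base="/scratch/alpine/lgillenwater@xsede.org/monorail-external"
abs_input="$base/htp/output/htp"
# Same location expressed relative to $base (what you would pass after cd "$base")
rel_input="${abs_input#"$base"/}"

for p in "$abs_input" "$rel_input"; do
  if has_at "$p"; then
    echo "contains @: $p"
  else
    echo "no @:       $p"
  fi
done
```
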

lagillenwater commented 10 months ago

Hi @ChristopherWilks,

Thanks for the suggestion. I tried the command as you suggested:

cd /scratch/alpine/lgillenwater@xsede.org/monorail-external

/scratch/alpine/lgillenwater@xsede.org/monorail-external/singularity/run_recount_unify.sh \
                            recount-unify_1.1.1.sif \
                            hg38 \
                            references \
                            htp/unifier \
                            htp/output/htp \
                            sample_metadata.tsv \
                            20\
                            htp:110

And I got the same output. Here are the last lines:

Running single-study mode

/bin/bash -x /recount-unify/scripts/create_directory_hierarchy_for_one_study.sh /container-mounts/working/ids.input /container-mounts/input /container-mounts/working/intermediate_links
/bin/bash -x /recount-unify/scripts/find_done.sh /container-mounts/working/intermediate_links links '*_att'

Any other ideas?

ChristopherWilks commented 10 months ago

hmm, ok, so my suspicion is still that there's something unexpected about the layout of the pump output directory structure, passed through via intermediate_links, which is somehow causing find_done.sh to fail.

Can you ls -ltr the /scratch/alpine/lgillenwater@xsede.org/monorail-external/htp/unifier/intermediate_links directory and send me the output?

I still think there may be an issue with the shared filesystem here. So I suggest copying 2 of the sample directories in the pump output directory onto an entirely different filesystem. That might be more difficult given you're running under a batch scheduler (e.g. SLURM) but I'd suggest you get an interactive session on a node where you can ssh and then manually run the unifier.

I'd then try copying the 2 sample directories either to the local disk of the node or to /dev/shm, if the node has enough memory (probably at least 64 GB). Then try re-running the unifier with just those 2 samples, using the copied directory as the pump input and a local directory on the node as the unifier output directory. You'll also need to pare down the sample_metadata.tsv file to just those 2 samples.
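
A minimal sketch of that two-sample subset, run here against temporary directories so it is safe to try. The names `src` and `dest` are placeholders for the pump output directory and the node-local target (local disk or /dev/shm), the third sample ID is invented purely to show that it gets left out, and the `_att0` suffix assumes each sample has a single first attempt.

```shell
set -eu

src=$(mktemp -d)    # stand-in for the pump output directory
dest=$(mktemp -d)   # stand-in for node-local disk or /dev/shm

# Build a toy metadata file and matching pump output directories
printf 'study_id\tsample_id\n' > "$src/sample_metadata.tsv"
for s in SRR17119296 SRR17119297 SRR17119298; do   # third ID is hypothetical
  mkdir "$src/${s}_att0"
  printf 'SRP349148\t%s\n' "$s" >> "$src/sample_metadata.tsv"
done

# Keep the header plus the first two samples
head -n 3 "$src/sample_metadata.tsv" > "$dest/sample_metadata.tsv"

# Copy only the pump outputs for the samples that remain in the metadata
mkdir "$dest/input"
tail -n +2 "$dest/sample_metadata.tsv" | cut -f 2 | while read -r s; do
  cp -r "$src/${s}_att0" "$dest/input/"
done

n_rows=$(wc -l < "$dest/sample_metadata.tsv" | tr -d ' ')
n_dirs=$(ls "$dest/input" | wc -l | tr -d ' ')
echo "metadata lines: $n_rows, sample directories copied: $n_dirs"
```

The unifier would then be pointed at the copied input directory and the pared-down metadata file.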

lagillenwater commented 10 months ago

I don't see the intermediate_links directory. Below are the contents of setup_intermediate_links.run

set -o pipefail -o nounset -o errexit
ids_file=/container-mounts/working/ids.input
input_dir=/container-mounts/input
link_dir=/container-mounts/working/intermediate_links
perl -ne 'BEGIN { $input_dir="/container-mounts/input"; $link_dir="/container-mounts/working/intermediate_links"; } chomp; if(!$study) { $study=$_; next; } $sample_dir=$_; $study=~/(..)$/; $lo1=$1; $sample_dir=~/^(.*(..))_att\d+$/; $sample=$1; $lo2=$2; `mkdir -p $link_dir/$lo1/$study/$lo2/$sample`; `ln -fs $input_dir/$sample_dir $link_dir/$lo1/$study/$lo2/$sample/$sample_dir`; `touch $link_dir/$lo1/$study/$lo2/$sample/$sample_dir.done`;'
cat /dev/fd/63 /dev/fd/62
ls /container-mounts/input
cut -f 1 /container-mounts/working/ids.input
head -1
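
For readers following along, the directory layout that the Perl one-liner above is meant to build can be reproduced in plain shell for a single sample. This is a sketch under assumptions: it uses a temporary directory instead of the /container-mounts paths, and it takes SRP349148 as the study line, which may not match what ids.input actually contained in this run.

```shell
set -eu

# Hypothetical stand-ins for /container-mounts/input and .../intermediate_links
work=$(mktemp -d)
input_dir="$work/input"
link_dir="$work/intermediate_links"

study="SRP349148"               # assumed first line of ids.input
sample_dir="SRR17119296_att0"   # one pump output directory name
mkdir -p "$input_dir/$sample_dir"

# Last two characters of the study and of the sample ID, as in the Perl regexes
lo1=$(printf '%s\n' "$study" | sed 's/.*\(..\)$/\1/')
sample="${sample_dir%_att*}"
lo2=$(printf '%s\n' "$sample" | sed 's/.*\(..\)$/\1/')

# Same three steps as the one-liner: mkdir, symlink, .done marker
target="$link_dir/$lo1/$study/$lo2/$sample"
mkdir -p "$target"
ln -fs "$input_dir/$sample_dir" "$target/$sample_dir"
touch "$target/$sample_dir.done"

layout_path="${target#"$link_dir"/}/$sample_dir"
echo "$layout_path"
```

If intermediate_links is missing entirely, as reported here, then none of these mkdir calls ran or succeeded, which points at the input to the one-liner rather than at the later find_done.sh step.
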

lagillenwater commented 10 months ago

Also, I tried moving the 2 sample directories to /dev/shm and running interactively. Still got the same result.

lagillenwater commented 10 months ago

@ChristopherWilks Any other ideas why this is not working?

ChristopherWilks commented 9 months ago

@lagillenwater, yes, I'm running out of ideas, but here are a few more things to try (if you're still working on this):

1) take 2 of the samples from your list of ones you ran through pump (e.g. SRR17119296 and SRR17119297), cut down your sample_metadata.tsv file to just those 2 (plus the header) and then copy all of the pump outputs for them out to an entirely different system (preferably one outside of Xsede) but which still has Singularity installed (local HPC system?). Then try to re-run the Unifier there for just those 2 samples.

2) related to 1), if you have a way of publicly sharing larger files, you could tar+zip up those 2 samples' pump outputs, including their immediate parent directories, and post them for me to download. I could then try to run them myself.

lagillenwater commented 9 months ago

@ChristopherWilks Thanks for the suggestions. I tried running the code on a local HPC and got a new error.

~/monorail-external/htp/unifier ~/monorail-external
PROJECT_SHORT_NAME=htp
PROJECT_ID=110
sending incremental file list

sent 73 bytes  received 12 bytes  170.00 bytes/sec
total size is 41  speedup is 0.48
Working dir: /container-mounts/working
/container-mounts/working ~/monorail-external/htp/unifier
Ref dir: /container-mounts/ref
Input dir: /container-mounts/input
Annotated JXs Path: /container-mounts/ref/annotated_junctions.tsv.gz
Disjoint Exons BED Path (w/ header): /container-mounts/ref/exons.w_header.bed.gz

Any ideas what this could mean? Did I download the correct reference files?

lagillenwater commented 9 months ago

@ChristopherWilks

Here is a link to the pump output files. Let me know when you download and I'll delete them. Thanks.

ChristopherWilks commented 9 months ago

Hi @lagillenwater,

I was able to access the zip file on google drive but the sample directories are empty. I'd suggest something like: tar -cvf SRR17119296_att0.tar SRR17119296_att0; gzip SRR17119296_att0.tar
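
One way to avoid sharing an empty archive again is to list the tarball's contents before uploading. The sketch below runs in a temporary directory with a placeholder file standing in for the real pump outputs. The `n_files` variable name is made up here, and the check assumes `tar -t` marks directory entries with a trailing slash, as GNU tar and bsdtar do.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Placeholder sample directory; the real one holds the pump output files
mkdir SRR17119296_att0
echo "placeholder" > SRR17119296_att0/stats.json

# Archive and compress, as suggested above
tar -cf SRR17119296_att0.tar SRR17119296_att0
gzip SRR17119296_att0.tar

# Count regular entries in the archive (lines not ending in '/')
n_files=$(gzip -dc SRR17119296_att0.tar.gz | tar -tf - | grep -vc '/$' || true)
echo "$n_files file(s) in archive"
```

A count of 0 would mean only directory entries were archived, which matches the empty sample directories seen in the shared zip.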