Missing files error in running recount-unify

bingy007 commented 2 years ago

Hi,

I was trying to unify three samples after recount-pump, but the run fails with this error:

Waiting at most 5 seconds for missing files.
MissingOutputException in line 318 of /recount-unify/Snakefile:
Missing files after 5 seconds:
exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.G026.gz
exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.G029.gz
exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.R109.gz
exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.F006.gz
exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.ERCC.gz
exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.SIRV.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.

I was wondering how the exon_sums_per_study/37/SRP020237/ folder gets created, since my project_id is "SRP:237". I checked this folder and it is empty; the other folder, "exon_sums_per_study/37/SRP:237", has all the exon_sums files.

My command:

/bin/bash /depot/recount_pipeline/monorail-external/singularity/run_recount_unify.sh /depot/recount_pipeline/recount-unify_1.0.4.sif hg38 /depot/recount_pipeline/ /depot/recount_pipeline/monorail-external/ /depot/recount_pipeline/monorail-external/output /depot/recount_pipeline/monorail-external/SRP237_metadata.tsv 20 SRP:237

The metadata file (tab-separated):

study_id	sample_id
SRP:237	SRR390726
SRP:237	SRR390727
SRP:237	SRR390728

Thanks for helping! Bingyu

ChristopherWilks commented 2 years ago

Hi @bingy007

I suggest checking your pump run to confirm which study name appears in the file names; I'm guessing it's probably "SRP020237". The pump and unifier must be run with the same study name. In addition, I can't guarantee that having a colon in your study name is going to work. It's best to use only upper-case alphanumeric characters for study and sample names.
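Both checks above can be scripted before launching the unifier. The following is a minimal Python sketch, not part of monorail: the helper names (`is_safe_name`, `unsafe_names`, `study_dir`) are hypothetical, "safe" is assumed to mean upper-case letters and digits only, and the study name is assumed to be the directory that directly contains each output file, as in the paths from the error message.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Assumption: a "safe" name is upper-case letters and digits only,
# following the advice in this thread (not an official monorail rule).
SAFE_NAME = re.compile(r"[A-Z0-9]+")


def is_safe_name(name: str) -> bool:
    """Return True if the name contains only upper-case letters and digits."""
    return SAFE_NAME.fullmatch(name) is not None


def unsafe_names(metadata_text: str) -> List[str]:
    """List the distinct study/sample names in a tab-separated metadata
    table (with one header line) that fail the is_safe_name check."""
    bad: List[str] = []
    for line in metadata_text.strip().splitlines()[1:]:  # skip the header
        for field in line.split("\t"):
            if not is_safe_name(field) and field not in bad:
                bad.append(field)
    return bad


def study_dir(path: str) -> str:
    """Return the name of the directory that directly contains the file.
    Assumption: that directory is named after the study, as in
    exon_sums_per_study/37/SRP020237/<file>."""
    return PurePosixPath(path).parent.name


if __name__ == "__main__":
    metadata = "study_id\tsample_id\nSRP:237\tSRR390726\nSRP:237\tSRR390727\n"
    print(unsafe_names(metadata))
    print(study_dir(
        "exon_sums_per_study/37/SRP020237/SRP.exon_sums.SRP020237.G026.gz"
    ))
```

Run against the metadata from the original post, `unsafe_names` flags "SRP:237" because of the colon, and `study_dir` applied to the missing paths shows the study name the unifier expected, which can then be compared with the name passed on the command line.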

Also, I suggest updating to the latest stable unify container, 1.0.9; 1.0.4 is pretty old.

Separately, if you're using just the test sample (SRR390728), I updated it recently because it stopped working after a more recent update. You can re-run pump and the unifier with these input FASTQs and see if that works:

http://snaptron.cs.jhu.edu/data/temp/SRR390728_1.fastq.gz
http://snaptron.cs.jhu.edu/data/temp/SRR390728_2.fastq.gz

bingy007 commented 2 years ago

Thanks for your reply, @ChristopherWilks! I tried adjusting the file names, but the study_id is supposed to have the form XXX:###, and including a ":" in the file name causes problems as well (the file name gets truncated at the ":"). Let me update the unify container and try again.

langmead-lab / monorail-external

Missing files error in running recount-unify #13