sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Rerun from mapping step using filtered.tagged.unmapped.bam? #324

Closed. Anacristina0914 closed this issue 2 years ago.

Anacristina0914 commented 2 years ago

Hello,

I am a new zUMIs user, trying to analyze scRNA-seq data from Smart-Seq3. Due to an OUT_OF_MEMORY error on the HPC cluster I am using, my run failed after the filtering step. Though this error has been sorted out, I would like (if possible) to re-run the pipeline from the step where it left off. I noticed the YAML file contains a section to tell zUMIs which step to start from, but I couldn't find any field to specify which filtered.tagged.unmapped.bam file to use when starting from the mapping step, which makes me think that zUMIs will simply skip the filtering and jump to the mapping step instead. Is it possible to use the output from the previous step, or would I need to rerun everything from the beginning? Thank you in advance for your answer.

Best,

Ana C.

cziegenhain commented 2 years ago

Hey,

You can set which_stage: Mapping to skip the Filtering step and resume from the mapping step. The file name of the unmapped bam will be automatically inferred by zUMIs based on the project name and output folder set in the yaml file.
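For anyone scripting the resume, here is a minimal Python sketch that switches the stage key in an existing zUMIs YAML file to Mapping while leaving everything else (project name, output folder) untouched, so zUMIs can infer the same file names as in the crashed run. The helper names are hypothetical. The key is matched case-insensitively because the exact spelling in your own YAML file may differ from `which_stage` as written above; the sketch only edits a top-level line and does not parse YAML.

```python
import re
from pathlib import Path

# Top-level stage key, matched case-insensitively (e.g. which_stage / which_Stage).
# Assumption: the key starts at column 0 of the zUMIs YAML file.
_STAGE_RE = re.compile(r"^(which_stage:).*$", re.IGNORECASE | re.MULTILINE)


def set_stage(yaml_text: str, stage: str = "Mapping") -> str:
    """Return yaml_text with the stage key set to `stage` (hypothetical helper)."""
    # Keep the key exactly as spelled in the file and replace only its value.
    new_text, n = _STAGE_RE.subn(lambda m: f"{m.group(1)} {stage}", yaml_text)
    if n == 0:
        # Refuse to guess the key name; add it to the YAML by hand instead.
        raise ValueError("no top-level which_stage key found in the YAML text")
    return new_text


def set_stage_in_file(path: str, stage: str = "Mapping") -> None:
    """Rewrite the YAML file at `path` in place with the new stage."""
    p = Path(path)
    p.write_text(set_stage(p.read_text(), stage))
```

Usage would be something like `set_stage_in_file("my_run.yaml")` (file name hypothetical), followed by resubmitting zUMIs with the same YAML so the project name and output folder match the original run.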

Best, C

Anacristina0914 commented 2 years ago

Hello,

Thank you for your super fast response! Great, I just wanted to make sure the content of the output folder wouldn't be deleted if I set it to the same one as my previous run.

One more question: since the mapping step had already started when the run crashed, I have a lot of temp files with the format 'filtered.Aligned.GeneTagged.sorted.bam.tmp.00XX.bam'. Since zUMIs appends content to these files, is it advisable to delete them before rerunning, or would new files be created by default?

Best,

Ana C.

cziegenhain commented 2 years ago

No problem. It should overwrite any existing tmp files like this, but you can also delete them beforehand to be sure; it should not really matter.

Anacristina0914 commented 2 years ago

Perfect! Thank you very much!

Anacristina0914 commented 2 years ago

Hello, again!

I thought I would post some additional comments (please feel free to correct me if I'm wrong) in case they are helpful for anyone attempting to run zUMIs starting from the mapping step.

I re-ran zUMIs and decided not to delete any files. The first time, I got an error:

'EXITING because of FATAL INPUT ERROR: --readFilesType SAM requires specifying SE or PE reads SOLUTION: specify --readFilesType SAM SE for single-end reads or --readFilesType SAM PE for paired-end reads'

This was fixed by simply adding a line stating 'read_layout: PE' at the end of the yaml file. I submitted the job a second time; after running for about 39 h it crashed, this time with the following error:
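The error above comes from STAR's `--readFilesType SAM` option, which requires SE or PE, and the workaround reported here is a top-level `read_layout` line in the YAML. A minimal Python sketch that adds that line, or updates it if it is already present, follows. The helper name is hypothetical, and whether PE or SE is correct depends on your own sequencing layout.

```python
import re

# Assumption: read_layout is a top-level key starting at column 0, as in the fix above.
_LAYOUT_RE = re.compile(r"^read_layout:.*$", re.MULTILINE)


def ensure_read_layout(yaml_text: str, layout: str = "PE") -> str:
    """Return yaml_text with 'read_layout: <layout>' present exactly as requested."""
    if layout not in ("SE", "PE"):
        raise ValueError("layout must be 'SE' or 'PE'")
    line = f"read_layout: {layout}"
    if _LAYOUT_RE.search(yaml_text):
        # Key already present: overwrite its value in place.
        return _LAYOUT_RE.sub(line, yaml_text)
    # Key missing: append it at the end, making sure it starts on a new line.
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + line + "\n"
```

Appending at the end mirrors what was done by hand in this thread; updating an existing line avoids ending up with two conflicting `read_layout` entries after repeated resubmissions.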

[1] "Coordinate sorting intermediate bam file..."
[E::hts_open_format] Failed to open file /zUMIs_output/.filtered.Aligned.GeneTagged.sorted.bam.tmp.00XX.bam
samtools sort: failed to create temporary file "//zUMIs_output/.filtered.Aligned.GeneTagged.sorted.bam.tmp.00XX.bam": File exists
. . .

This makes me believe I should have deleted the tmp files previously generated by zUMIs before re-running the job. I have now deleted the files (.filtered.Aligned.GeneTagged.sorted.bam.tmp.00XX.bam) and resubmitted the job; hopefully it will work this time!
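Since samtools sort refused to reuse the leftover chunks ("File exists"), clearing them before resubmitting seems to be the safer route. The Python sketch below lists, and optionally deletes, files matching the temporary-file name from the error message, with `*` standing in for the chunk number shown as 00XX. The directory argument is whichever folder holds those files in your run (the zUMIs_output folder in the message above); the function names are hypothetical, and deletion defaults to a dry run so the list can be checked first.

```python
import fnmatch
from pathlib import Path

# Name pattern taken from the samtools error above; '*' covers the chunk number.
TMP_PATTERN = "*filtered.Aligned.GeneTagged.sorted.bam.tmp.*.bam"


def find_sort_tmp_files(directory):
    """Return leftover samtools-sort temp chunks in `directory`, sorted by name.

    fnmatch (unlike shell globbing) lets '*' match a leading '.', so hidden
    files such as '.filtered.Aligned...' are included.
    """
    d = Path(directory)
    return sorted(
        p for p in d.iterdir()
        if p.is_file() and fnmatch.fnmatchcase(p.name, TMP_PATTERN)
    )


def remove_sort_tmp_files(directory, dry_run=True):
    """Delete (or, with dry_run=True, just report) the leftover temp chunks."""
    files = find_sort_tmp_files(directory)
    for p in files:
        if dry_run:
            print(f"would delete {p}")
        else:
            p.unlink()
    return files
```

Only call it with `dry_run=False` after checking the printed list. The filtered.tagged.unmapped.bam file that the Mapping stage needs does not match this pattern, so it is left alone.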