sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

samtools failed to open "NA" #237

Closed by zakieh-tayyebi 3 years ago

zakieh-tayyebi commented 3 years ago

Hi! I am getting a lot of samtools errors in the 'Mapping' stage (please see the run-time log), indicating that the BAM files are NA and do not exist (I have provided correct absolute paths to everything):

[E::hts_open_format] Failed to open file NA
samtools view: failed to open "NA" for reading: No such file or directory

Then, in the counting step, the 'filtered.Aligned.GeneTagged.sorted.bam' file cannot be opened (please see the run-time log).

I am using zUMIs 2.9.4f and its Conda environment (samtools 1.9, STAR 2.7.3a, R 3.6.3, pigz 2.3.4, Python 3.6.10, pysam 0.15.4, velocyto 0.17.17), on CentOS Linux 7 (Core), using 32 cores each with 10GB memory.

I have attached the run-time stdout/stderr file (zUMIs.Log), the config file (config.run.yaml), and the script used to run zUMIs (zUMIs.sh).

zUMIs.Log

config.run.yaml.txt

zUMIs.sh.txt

cziegenhain commented 3 years ago

Hi,

Sorry to hear that you are having problems with zUMIs! I have seen the NA message before; it seems to come from some imprecision in the chunking of the input, which relies on estimating the read number from the file size. In any case, I have never had any downstream issues from it so far! I can see that the gene tagging (featureCounts) actually does run after STAR finishes. Could you confirm that you have a non-empty Aligned.bam file? According to the log, there were 875927253 reads; does that fit with your input?

My suspicion is that the error actually happens during the samtools sort operation. Usually using less memory at that step is safer! zUMIs should be plenty fast with just mem_limit: 100 instead.

Best, Christoph
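For anyone who wants to run the "non-empty Aligned.bam" check from a script rather than by eye, here is a minimal sketch. It is not part of zUMIs; the function name `bam_looks_valid` is made up for illustration. It relies only on the fact that BAM files are BGZF (gzip-compatible) and begin with the magic bytes `BAM\1` once decompressed. It does not verify that the file is complete or sorted.

```python
import gzip
import os


def bam_looks_valid(path):
    """Return True if `path` exists, is non-empty, and starts with the BAM magic bytes.

    Hypothetical helper for illustration; it only inspects the first few bytes.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    try:
        # BGZF is a series of gzip members, so the gzip module can read it.
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip-compressed, or truncated before the first block ends.
        return False
```

A literal path of `NA`, as in the samtools error above, fails the first check as long as no file of that name exists in the working directory.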

zakieh-tayyebi commented 3 years ago

Thank you for the prompt response! I will decrease the memory limit.
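As a rough illustration of why the memory limit matters for sorting: `samtools sort` takes its `-m` value per thread, so the total is roughly threads times `-m`. The sketch below splits a total budget across threads. How zUMIs itself turns `mem_limit` into samtools arguments is not stated in this thread, so the function names and the assumption that the limit is in gigabytes are illustrative only.

```python
def per_thread_sort_memory_mb(total_limit_gb, threads):
    """Split a total memory budget (GB) evenly across threads, in whole MB."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return (total_limit_gb * 1024) // threads


def samtools_sort_args(total_limit_gb, threads):
    """Build an illustrative samtools sort argument list (not the zUMIs command)."""
    per_thread = per_thread_sort_memory_mb(total_limit_gb, threads)
    # -@ sets the thread count; -m is the approximate memory per thread.
    return ["samtools", "sort", "-@", str(threads), "-m", f"{per_thread}M"]


# With a 100 GB budget and 32 threads, each thread gets 3200 MB.
print(samtools_sort_args(100, 32))
```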

This is what the 'filtered.tagged.Aligned.out.bam' file looks like:

[Screenshot: Screen Shot 2021-01-04 at 13 47 17]

Just in case, here's what the entire output looks like:

[Screenshot: Screen Shot 2021-01-04 at 13 40 37]

cziegenhain commented 3 years ago

Yes that looks fine to me, let's see what happens in the rerun!

zakieh-tayyebi commented 3 years ago

Thank you! This time it worked. My best guess is that, since I'm submitting the job to a computing cluster, the absolute paths (which may or may not include the cluster name) were different in R than in bash.
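If the path guess is right, one way to check it is to compare the canonical form of the path as each tool sees it. This is a minimal sketch with hypothetical helper names; it only resolves symlinks, `~`, and `.`/`..` components, so it would not detect two separate mounts of the same storage.

```python
import os


def canonical_path(path):
    """Expand `~` and resolve symlinks and relative components."""
    return os.path.realpath(os.path.expanduser(path))


def same_location(path_a, path_b):
    """Return True if both paths resolve to the same canonical location."""
    return canonical_path(path_a) == canonical_path(path_b)
```

Running this on the output directory as printed by the bash script and as reported by R (for example via `normalizePath()`) would show whether the two refer to the same place.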