sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Expected bam outputs after running zUMIs? #275

Closed: kwglam closed this issue 3 years ago

kwglam commented 3 years ago

Hi,

I would like to know what the expected BAM outputs are after a successful run of zUMIs. After running zUMIs, I have 4 BAM files ending with ".filtered.tagged.Aligned.out.bam", ".filtered.Aligned.GeneTagged.UBcorrected.sorted.bam", ".filtered.tagged.Aligned.toTranscriptome.out.bam", and ".filtered.tagged.unmapped.bam". I believe the run was successful, and I also have all the other plots in the zUMIs_output folder (please see the attached yaml and nohup files).

The reason I ask is that I got a "fail to read the header" error for my ".filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" file when I used ss3iso.py to run the downstream analysis of my Smart-seq3 data. I am wondering whether this is the correct BAM file to use as input, or whether there is something wrong with it.

Thanks in advance for any suggestions and advice!!

210806_Smart-seq3_zUMIs.txt nohup.txt

cziegenhain commented 3 years ago

Hi,

Sounds right! *.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam is the final file you'd mostly use: a filtered, STAR-mapped, coordinate-sorted BAM file with BC and corrected UB tags plus featureCounts gene assignments (exon or exon+intron).

You should check that the file is of reasonable size (e.g. not just a few kB), and you can also check whether it is intact with samtools quickcheck.
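The two checks above can be sketched in Python. This is only a rough illustration, not part of zUMIs: the function names check_bam and samtools_quickcheck and the 1 kB threshold are made up for the example. The pure-Python part only looks at the BAM magic bytes and the BGZF end-of-file marker, so it is a much shallower test than samtools quickcheck; the second helper simply calls samtools quickcheck when samtools is on the PATH.

```python
import gzip
import os
import shutil
import subprocess
import zlib

# 28-byte empty BGZF block that terminates a complete BAM file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200" "1b000300" "00000000" "00000000"
)


def check_bam(path, min_bytes=1024):
    """Return a list of problems; an empty list means these basic checks passed.

    min_bytes is an arbitrary illustrative threshold, not a zUMIs setting.
    """
    problems = []
    size = os.path.getsize(path)
    if size < min_bytes:
        problems.append(f"file is only {size} bytes")

    # A complete BAM ends with the BGZF end-of-file marker.
    with open(path, "rb") as fh:
        if size >= len(BGZF_EOF):
            fh.seek(-len(BGZF_EOF), os.SEEK_END)
        tail = fh.read()
    if tail != BGZF_EOF:
        problems.append("missing BGZF end-of-file marker (truncated?)")

    # BGZF is gzip-compatible; the decompressed stream starts with b"BAM\x01".
    try:
        with gzip.open(path, "rb") as gz:
            magic = gz.read(4)
    except (OSError, EOFError, zlib.error):
        magic = b""
    if magic != b"BAM\x01":
        problems.append("decompressed data does not start with the BAM magic bytes")
    return problems


def samtools_quickcheck(path):
    """Run `samtools quickcheck` if available; return None when samtools is not found."""
    if shutil.which("samtools") is None:
        return None
    return subprocess.run(["samtools", "quickcheck", path]).returncode == 0
```

An empty list from check_bam only means the file looks structurally plausible; it does not validate the header contents or the individual records.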

Best, Christoph

kwglam commented 3 years ago

Hi Christoph,

Thanks very much for the quick response. I have checked the BAM file with samtools quickcheck and it reported nothing. The size of the BAM file is 29G, so I believe the file itself is fine. I have also used this BAM file to run stitcher.py, and it gave me a SAM file with all the stitched RNA molecules. So it seems that only ss3iso.py raises errors.

Error messages from ss3iso.py:

[bam_sort_core] merging from 136 files and 8 in-memory blocks...
[main_samview] fail to read the header from "/home/xxx/projects/Smart-seq3/ss3iso_210806/hsa/zUMIs-2.9.4_210806/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam".
[main_samview] fail to read the header from "/home/xxx/projects/Smart-seq3/ss3iso_210806/hsa/zUMIs-2.9.4_210806/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam".
[main_samview] fail to read the header from "-".
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210806/hsa/zUMIs-2.9.4_210806/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210806/hsa/zUMIs-2.9.4_210806/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
Preprocessing on input BAM ...
Collect informative reads per gene...
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210806/hsa/zUMIs-2.9.4_210806/expression_ensembl/ex_210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210806/hsa/zUMIs-2.9.4_210806/expression_ensembl/ex_210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
...for genes on chr1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/pyModule/informative_reads.py", line 479, in _get_reads
    report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells)
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/pyModule/informative_reads.py", line 84, in get_aligned_reads
    samfile = pysam.AlignmentFile(self.in_bam_uniq, "rc")
  File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 947, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file does not contain alignment data
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in main()
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/ss3_isoform.py", line 99, in main
    fetch_gene_reads(in_bam_uniq, in_bam_multi, conf_data, op.species, out_path)
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/pyModule/informative_reads.py", line 550, in fetch_gene_reads
    report_genes = pool.map(func, genes, chunksize=1)
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: file does not contain alignment data

I know you are not the author of the ss3iso.py script, but any insights or suggestions are appreciated! Thanks!

Cheers, Gabriel

cziegenhain commented 3 years ago

Agreed, it must be an issue in the ss3iso.py script. I'm personally not familiar with it, but from a quick glance at the code, the part where your error log breaks is in the few lines of "preprocessing" that it attempts to do: https://github.com/sandberg-lab/Smart-seq3/blob/master/ss3iso/ss3_isoform.py#L73

Since your input BAM file is already coordinate sorted, that step seems superfluous? Maybe you want to move this discussion to the ss3iso GitHub, but from reading the code it seems you should be getting a "preprocess" folder with three files: UBfix.coordinateSorted.bam, UBfix.coordinateSorted_unique.bam and UBfix.coordinateSorted_multi.bam.
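To see which of those three files actually exist, something like the following could be run against the "preprocess" folder. It is only a sketch: the function name preprocess_status is made up, and matching by file-name suffix is an assumption, in case ss3iso prepends a sample prefix to the names listed above.

```python
from pathlib import Path

# File-name endings taken from the three files mentioned above.
# Matching by suffix (rather than exact name) is an assumption.
EXPECTED_SUFFIXES = (
    "UBfix.coordinateSorted.bam",
    "UBfix.coordinateSorted_unique.bam",
    "UBfix.coordinateSorted_multi.bam",
)


def preprocess_status(preprocess_dir, suffixes=EXPECTED_SUFFIXES):
    """Map each expected suffix to 'missing', 'empty', or 'present'."""
    folder = Path(preprocess_dir)
    files = [p for p in folder.iterdir() if p.is_file()]
    status = {}
    for suffix in suffixes:
        matches = [p for p in files if p.name.endswith(suffix)]
        if not matches:
            status[suffix] = "missing"
        elif any(p.stat().st_size == 0 for p in matches):
            status[suffix] = "empty"  # a zero-byte file cannot hold a BAM header
        else:
            status[suffix] = "present"
    return status
```

Calling preprocess_status("preprocess") returns a small dictionary that can be pasted into a bug report; any "missing" or "empty" entry suggests the preprocessing step stopped before writing its outputs.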

Best, Christoph

kwglam commented 3 years ago

I have actually started a thread in the ss3iso GitHub. While I wait for the author's response, I am also looking for solutions from other experts.

The 'preprocess' folder contains only one BAM file: it has the same name as my input BAM but is empty. I guess the program terminated without generating any of the 3 BAM files after failing to read my input BAM.

Thanks, Gabriel

cziegenhain commented 3 years ago

I understand! Feel free to reopen if you need any further assistance with zUMIs. Best, Christoph

lamyankin commented 1 year ago

Hi kwglam, I ran into the same error messages as you. I saw that you solved the problem and closed the issue in the ss3iso GitHub. Could you tell me how you solved it? I am looking forward to your reply. Lam