sandberg-lab / Smart-seq3

Code and analysis pipeline for Smart-seq3 (Hagemann-Jensen et al. 2020).
GNU General Public License v3.0

Error while running ss3_isoform.py #5

Open kwglam opened 3 years ago

kwglam commented 3 years ago

Hi Angela,

I tried to do the isoform reconstruction by running your ss3_isoform.py script. However, the program halted with the following error messages. Would you please kindly advise what the potential problem is? Thanks!!

Preprocessing on input BAM ...
[bam_sort_core] merging from 104 files and 8 in-memory blocks...
[main_samview] fail to read the header from "/home/xxx/projects/Smart-seq3/ss3iso_210629/hsa/ss3iso_210629/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam".
[main_samview] fail to read the header from "/home/xxx/projects/Smart-seq3/ss3iso_210629/hsa/ss3iso_210629/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam".
[main_samview] fail to read the header from "-".
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210629/hsa/ss3iso_210629/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210629/hsa/ss3iso_210629/preprocess/210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
Collect informative reads per gene...
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210629/hsa/ss3iso_210629/expression_ensembl/ex_210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
samtools index: "/home/xxx/projects/Smart-seq3/ss3iso_210629/hsa/ss3iso_210629/expression_ensembl/ex_210624_Smartseq3.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is in a format that cannot be usefully indexed
...for genes on 1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/pyModule/informative_reads.py", line 479, in _get_reads
    report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells)
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/pyModule/informative_reads.py", line 84, in get_aligned_reads
    samfile = pysam.AlignmentFile(self.in_bam_uniq, "rc")
  File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 947, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file does not contain alignment data
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
    main()
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/ss3_isoform.py", line 99, in main
    fetch_gene_reads(in_bam_uniq, in_bam_multi, conf_data, op.species, out_path)
  File "/home/xxx/projects/Smart-seq3/ss3iso/Smart-seq3/ss3iso/pyModule/informative_reads.py", line 550, in fetch_gene_reads
    report_genes = pool.map(func, genes, chunksize=1)
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/xxx/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: file does not contain alignment data

PingChen-Angela commented 3 years ago

Hi, it looks like something is wrong with the BAM files.

kwglam commented 3 years ago

Hi Angela, thanks for the comment. After running zUMIs, four BAM files are generated. The file ending with "filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is the only one that comes with a .bai file. Is it the correct BAM file for running ss3_isoform.py? I have successfully used this file to run stitcher.py, generating a SAM file with stitched RNA molecules. Do you know of any way to check what is wrong with the BAM file? Thanks!
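One way to check a BAM file without any pipeline tooling is sketched below, using only the Python standard library. It relies on three facts from the BAM format: the file is gzip-compatible (BGZF), the decompressed stream starts with the magic bytes `BAM\x01`, and a complete file ends with a fixed 28-byte end-of-file block. The helper name `inspect_bam` and the report keys are hypothetical, and this is a rough sanity check, not a replacement for samtools or pysam. A reference count (`n_ref`) of 0 means the header defines no reference sequences.

```python
import gzip
import struct

# Fixed 28-byte BGZF end-of-file block that a complete BAM file ends with.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 42430200 1b00 0300 00000000 00000000"
)


def inspect_bam(path):
    """Collect basic sanity facts about a BAM file (hypothetical helper)."""
    report = {"readable": False, "magic_ok": False, "n_ref": None, "has_eof": False}

    # A missing EOF block usually means the file was truncated while being written.
    with open(path, "rb") as raw:
        raw.seek(0, 2)
        if raw.tell() >= len(BGZF_EOF):
            raw.seek(-len(BGZF_EOF), 2)
            report["has_eof"] = raw.read() == BGZF_EOF

    try:
        with gzip.open(path, "rb") as fh:
            magic = fh.read(4)
            report["readable"] = True
            report["magic_ok"] = magic == b"BAM\x01"
            if report["magic_ok"]:
                (l_text,) = struct.unpack("<i", fh.read(4))
                fh.read(l_text)  # skip the plain-text SAM header
                (report["n_ref"],) = struct.unpack("<i", fh.read(4))
    except (OSError, EOFError, struct.error):
        # Not gzip-compatible, or cut off before the header ended.
        pass
    return report
```

Calling `inspect_bam("some.bam")` (the path is a placeholder) returns a dictionary. A file that is not readable, has the wrong magic bytes, or lacks the EOF block is a likely candidate for the "fail to read the header" message above.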

PingChen-Angela commented 3 years ago

@kwglam Hi, is this issue solved?

kwglam commented 3 years ago

Yes, this issue has been solved. Thanks!

HaniJieunKim commented 3 years ago

Hi @PingChen-Angela! Thanks for maintaining such a useful package!

Just following on from this thread regarding the inputs of ss3_isoform.py: I have run zUMIs and would now like to run the isoform matching.

Would filtered.tagged.Aligned.out.bam from running zUMIs be the correct input for -i [path/to/inputBAM]? I noticed in the above thread that the following BAM may be required: filtered.Aligned.GeneTagged.UBcorrected.sorted.bam, which I think is the BAM output from running zUMIs with the velocyto option.

Thanks in advance for clarifying.

Best regards, Hani

cziegenhain commented 3 years ago

Hi Hani,

The *.filtered.tagged.Aligned.out.bam lacks gene assignment and UMI error correction, which are both needed for isoform inference. The velocyto output from zUMIs has nothing to do with this and is labelled *.tagged.forVelocyto.bam. Hence, please use the *.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam file.

Best, Christoph
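The file-name rule from this answer can be written as a small filter over a zUMIs output directory listing. This is only a sketch: the helper name `pick_isoform_input` is hypothetical, and it checks file names only, not whether the reads actually carry gene tags and corrected UMIs.

```python
import os

# Suffix of the zUMIs output named in this thread as the right input.
REQUIRED_SUFFIX = ".filtered.Aligned.GeneTagged.UBcorrected.sorted.bam"


def pick_isoform_input(paths):
    """Return the paths whose file name ends with the required suffix."""
    return [p for p in paths if os.path.basename(p).endswith(REQUIRED_SUFFIX)]
```

Passing it the list of BAM files that zUMIs produced should leave exactly one candidate for `-i`. An empty result suggests the gene-tagging or UMI-correction step did not finish.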

HaniJieunKim commented 3 years ago

I see, thanks Christoph for the clarification!

xucaoling commented 2 years ago

Hi cziegenhain,

When I use .filtered.Aligned.GeneTagged.UBcorrected.sorted.bam as input for ss3_isoform.py, I get an error:

Preprocessing on input BAM ...
[bam_sort_core] merging from 88 files and 8 in-memory blocks...
Collect informative reads per gene...
...for genes on chr1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 479, in _get_reads
    report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells)
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 84, in get_aligned_reads
    samfile = pysam.AlignmentFile(self.in_bam_uniq, "rc")
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rc') - is it SAM/BAM format? Consider opening with check_sq=False
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py", line 109, in <module>
    main()
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py", line 99, in main
    fetch_gene_reads(in_bam_uniq, in_bam_multi, conf_data, op.species, out_path)
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 550, in fetch_gene_reads
    report_genes = pool.map(func, genes, chunksize=1)
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: file has no sequences defined (mode='rc') - is it SAM/BAM format? Consider opening with check_sq=False

My command is:

$ python /home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py -i smartseq3_mouse_fibroblast.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam -e smartseq3_mouse_fibroblast -o ss3 -p 8 -s mm10 -P -R -c ss3_isoform.conf

So, what's wrong?

Best, Anna

xucaoling commented 2 years ago

> Yes, this issue has been solved. Thanks!

Hi kwglam, how did you solve it?

Shinichiro03 commented 2 years ago

Hi xucaoling,

I have the same issue. Did you solve it?

Best, Shin

lamyankin commented 2 years ago

> Hi xucaoling,
>
> I also have the same issue. Do you solve the issue?
>
> Best, Shin

Hi Shinichiro03, have you solved the issue?

kwglam commented 2 years ago

@lamyankin, @xucaoling, and @Shinichiro03, I forget exactly what the problems were because I have not used it for quite a long time. My recollection is that you have to stick with an old version of bedtools (v2.26 or older) and change umi_file_prefix = 'UBfix.sort.bam' to umi_file_prefix = 'UBcorrected.sorted.bam' on line 67 of the ss3_isoform.py script. Hope it works.
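The bedtools part of this recollection can be checked before running the pipeline. The sketch below parses the output of `bedtools --version` (which prints a string such as `bedtools v2.26.0`) and compares it against the v2.26 ceiling mentioned above. The function names are hypothetical, and the ceiling itself is only what the comment recalls, not a documented requirement of ss3_isoform.py.

```python
import re
import shutil
import subprocess


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from text such as 'bedtools v2.26.0'."""
    match = re.search(r"v(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def bedtools_is_old_enough(version, newest_ok=(2, 26)):
    """True if the version is in the 2.26 series or older (assumed ceiling)."""
    return version is not None and version[:2] <= newest_ok


if __name__ == "__main__":
    if shutil.which("bedtools") is None:
        print("bedtools not found on PATH")
    else:
        out = subprocess.run(
            ["bedtools", "--version"], capture_output=True, text=True
        ).stdout
        version = parse_bedtools_version(out)
        verdict = "ok" if bedtools_is_old_enough(version) else "newer than v2.26"
        print(version, verdict)
```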

lokeshbio commented 1 year ago

After fixing the umi_file_prefix = 'UBcorrected.sorted.bam' problem, I get the following error! Does this look familiar? I couldn't quite figure out what the problem is!

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/project/ss3iso/pyModule/informative_reads.py", line 468, in _get_reads
    gobj.get_exon_coordinates(gene)
  File "/project/ss3iso/pyModule/informative_reads.py", line 64, in get_exon_coordinates
    gene_id = fds[-1].split(';')[0].split('=')[1]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/project/ss3iso/ss3_isoform.py", line 112, in <module>
    main()
  File "/project/ss3iso/ss3_isoform.py", line 102, in main
    fetch_gene_reads(in_bam_uniq, in_bam_multi, conf_data, op.species, out_path)
  File "/project/ss3iso/pyModule/informative_reads.py", line 550, in fetch_gene_reads
    report_genes = pool.map(func, genes, chunksize=1)
  File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range
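
One reading of this traceback: in `fds[-1].split(';')[0].split('=')[1]`, only the final `[1]` can raise IndexError, which happens when the first attribute of the last field contains no `=`. That is what a GTF-style attribute (`gene_id "X"`) looks like to code expecting GFF3-style `key=value`. Whether this is the actual cause here is an assumption, as is the idea that `fds` holds the tab-separated fields of an annotation line. The sketch below reproduces the failure condition and shows a tolerant parse; `first_attribute_value` is a hypothetical helper, not part of ss3iso.

```python
def first_attribute_value(line):
    """Return the value of the first attribute in the last tab-separated field.

    Handles GFF3 style (ID=gene:X;...) and GTF style (gene_id "X"; ...).
    Returns None when neither pattern matches, instead of raising IndexError.
    """
    fds = line.rstrip("\n").split("\t")
    first = fds[-1].split(";")[0]
    if "=" in first:
        # GFF3 style: key=value
        return first.split("=", 1)[1]
    # GTF style: key "value"
    parts = first.strip().split(None, 1)
    if len(parts) == 2:
        return parts[1].strip('"')
    return None
```

Running this over the annotation file named in ss3_isoform.conf and printing the lines that return None, or that only parse through the GTF branch, would show whether the file format is the problem.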