parklab / bamsnap

MIT License
110 stars 24 forks source link

Works with your test data but not with mine #35

Open jdenavascues opened 8 months ago

jdenavascues commented 8 months ago

Hi, Your software seems pretty straightforward to use, but somehow I cannot make it work. I have installed it and I can make it work on the test data from your repository. For instance, I can clone your repository, do cd bamsnap and from there run the command:

bamsnap -bam ./tests/data/test_SV1_softclipped_1.bam \
    -title "Clipped read" \
    -pos chr1:37775740 \
    -out ./out/test_SV1-7.png \
    -bamplot coverage read \
    -margin 100 \
    -no_target_line \
    -show_soft_clipped \
    -read_color_by interchrom \
    -save_image_only

And I get this: test_SV1-7

However, when I try to do this with my own BAM file (a Drosophila RNAseq data replicate):

bamsnap -bam reads_indexed_and_sorted.bam \
    -title "my_analysis" \
    -pos 3L:747400-755492 \
    -draw bamplot \
    -bamplot read \
    -out ./my_plot.png \
    -read_group strand

I get the following errors:

Process proc 1:
Traceback (most recent call last):
  File "/Users/username/opt/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap self.run()
  File "/.../.../multiprocessing/process.py", line 99, in run self._target(*self._args, **self._kwargs)
  File "/.../.../site-packages/bamsnap/bamsnap.py", line 233, in run_process_drawplot_bamlist refseq = rseq.get_refseq(pos1)
  File "/.../.../site-packages/bamsnap/bamsnap.py", line 541, in get_refseq refseq = self.get_refseq_from_ucsc(pos1)
  File "/.../.../site-packages/bamsnap/bamsnap.py", line 576, in get_refseq_from_ucsc if line[0] != '<':
IndexError: string index out of range
2024-02-02 16:20:03,200 : [INFO] Total running time: 0.4 sec

I am not sure what to make of this, as the example above did not need a reference sequence, but just in case, I tried again providing the ref file, adding to the previous command -ref Drosophila_melanogaster.BDGP6.46.dna.toplevel.fa. The FASTA file is from Ensemble (and the BAM files were produce by mapping the fastq files to the same version of the genome, and then indexing and sorting the BAM files with this reference) But now I get this error:

Process proc 1:
Traceback (most recent call last):
  File "/.../.../multiprocessing/process.py", line 297, in _bootstrap self.run()
  File "/.../.../multiprocessing/process.py", line 99, in run self._target(*self._args, **self._kwargs)
  File "/.../.../site-packages/bamsnap/bamsnap.py", line 235, in run_process_drawplot_bamlist
    imagefname = bsplot.drawplot_bamlist(pos1, image_w, bamlist, xscale, refseq)
  File "/.../.../site-packages/bamsnap/bamsnap.py", line 473, in drawplot_bamlist
    ia_sub = self.get_bamplot_image(bam, pos1, image_w, xscale, refseq)
  File "/.../.../site-packages/bamsnap/bamsnap.py", line 357, in get_bamplot_image rset.calculate_readmap(is_strand_group=True)
  File "/.../.../site-packages/bamsnap/drawreadset.py", line 305, in calculate_readmap r = DrawRead(a, self.refseq)
  File "/.../.../site-packages/bamsnap/drawread.py", line 66, in __init__ self.set_read_variant()
  File "/.../.../site-packages/bamsnap/drawread.py", line 126, in set_read_variant refbase = self.refseq[gp].upper()
KeyError: 448823
2024-02-02 16:33:43,856 : [INFO] Total running time: 0.1 sec

I am not sure how to interpret this error message. There seems to be a problem with the reference sequence but I do not know what can that be. I have tried with the FASTA file of the individual 3L chromosome arm, but that does not help.

Is there any reason why my commands would not work? Thanks so much in advance.