velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

Velocyto encounters "ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format?" #195

Open gundalav opened 5 years ago

gundalav commented 5 years ago

I have the following BAM file, repeat masker GTF and genome GTF. They are downloadable here:

However, when I tried running velocyto with this command:


export LC_ALL=C.UTF-8
export LANG=C.UTF-8
INBAM=data/input.sorted.bam
REPEAT_GTF=data/mm10.repeatmasker.v2
GENOME_GTF=data/mm10.gtf
OUTPATH=data/velocyto_output
velocyto run  --verbose -o $OUTPUT_PATH -m $REPEAT_GTF $INBAM $GENOME_GTF

I got the following error:

2019-05-21 16:00:20,810 - WARNING - Several input files but --onefilepercell is False. Each bam file will be interpreted as containing a SET of cells!!!
2019-05-21 16:00:20,810 - WARNING - When using mutliple files you may want to use --sampleid option to specify the name of the output file
2019-05-21 16:00:20,810 - INFO - No SAMPLEID specified, the sample will be called multi_input_mm10_input_and_others_MTHV9 (last 5 digits are a random-id to avoid overwriting some other file by mistake)
2019-05-21 16:00:20,810 - DEBUG - Using logic: Default
2019-05-21 16:00:20,811 - DEBUG - Cell barcodes will be determined while reading the .bam file
2019-05-21 16:00:20,817 - DEBUG - Peeking into data/mm10.repeatmasker.v2
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/py36/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto', 'console_scripts', 'velocyto')()
  File "/home/ubuntu/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/Tools/velocyto.py/velocyto/commands/run.py", line 116, in run
    samtools_memory=samtools_memory, dump=dump, loom_numeric_dtype=dtype, verbose=verbose, additional_ca=additional_ca)
  File "/home/ubuntu/Tools/velocyto.py/velocyto/commands/_run.py", line 159, in _run
    exincounter.peek(bamfile[0])
  File "/home/ubuntu/Tools/velocyto.py/velocyto/counter.py", line 135, in peek
    fin = pysam.AlignmentFile(bamfile)  # type: pysam.AlignmentFile
  File "pysam/libcalignmentfile.pyx", line 736, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 985, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False

How can I resolve this problem?

BioInfoUCI commented 4 years ago

Hello,

I have the same issue.

Any updates on how to fix this issue?

Thanks!

hmbaghdassarian commented 4 years ago

I ran into other issues later, so I don't know whether this is the correct answer, but it did get me past this specific error. The error is raised in the counter.py script. Try changing the following line (which appears three times in that script):

fin = pysam.AlignmentFile(bamfile)

to:

fin = pysam.AlignmentFile(bamfile, check_sq=False)
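
Before patching counter.py, it may be worth checking which file is actually being opened as a BAM. The log above says "Peeking into data/mm10.repeatmasker.v2", which suggests the repeat masker file, not input.sorted.bam, reached pysam.AlignmentFile. One possible cause (an assumption, not confirmed in this thread): the command sets OUTPATH but passes $OUTPUT_PATH, so if that variable is unset, -o may have consumed -m as its value and shifted the repeat masker path into the BAM positions. The sketch below is a stdlib-only check, with a hypothetical helper name looks_like_bam that is not part of velocyto or pysam. It relies only on the fact that a BAM file is gzip-compatible (BGZF) and its decompressed payload starts with the bytes "BAM\1".

```python
import gzip
import sys
import zlib

# First four bytes of a decompressed BAM stream, per the SAM/BAM format.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF-compressed and starts with the BAM magic.

    Hypothetical helper for illustration only. It does not validate the
    header or the reference sequences, and it returns False for plain-text
    SAM or for CRAM files.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError, zlib.error):
        # Not gzip data, unreadable, missing, or truncated.
        return False


if __name__ == "__main__":
    # Report on every path given on the command line.
    for candidate in sys.argv[1:]:
        verdict = "looks like BAM" if looks_like_bam(candidate) else "NOT a BAM"
        print(f"{candidate}: {verdict}")
```

Saved under any name you like and run with the same paths you give to velocyto run, this should flag a GTF that ended up where a BAM was expected. If the file turns out not to be a BAM, check_sq=False would only hide the symptom.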