katerinakazantseva / strainy

Graph-based assembly phasing
Other
66 stars 5 forks source link

Phased bam error #53

Closed jsgounot closed 1 year ago

jsgounot commented 1 year ago

Hi,

I tried running your soft with a toy dataset composed of simulated reads from 5 e.coli strains. I assembled the reads with flye and followed your instructions on github. I hit this error when I run the soft:

~/soft/stRainy/strainy.py phase -o ./resdir -b assembly_graph.bam -g assembly_graph.gfa -m nano -t 1
[2023-02-23 02:54:43] [Root] INFO:  Creating fasta file from the provided gfa file assembly_graph.gfa
[2023-02-23 02:54:43] [Root] INFO:  Done!
[2023-02-23 02:54:43] [Root] INFO:  Starting phasing
[2023-02-23 02:54:43] [Root] INFO:  CMD: phase -o ./resdir -b assembly_graph.bam -g assembly_graph.gfa -m nano -t 1
[2023-02-23 02:54:43] [Thread 13322] INFO:

    ==== Processing uniting edge_1 ====
[2023-02-23 02:54:43] [Thread 13322] INFO:  ### Reading SNPs...
Failed to read from standard input: unknown file type
[2023-02-23 02:54:43] [Thread 13322] INFO:

    ==== Processing uniting edge_77 ====
[2023-02-23 02:54:43] [Thread 13322] INFO:  ### Reading SNPs...
Failed to read from standard input: unknown file type
[2023-02-23 02:54:43] [Thread 13322] INFO:

    ==== Processing uniting edge_153 ====
[2023-02-23 02:54:43] [Thread 13322] INFO:  ### Reading SNPs...
Failed to read from standard input: unknown file type
[2023-02-23 02:54:43] [Thread 13322] INFO:

    ==== Processing uniting edge_229 ====
[2023-02-23 02:54:43] [Thread 13322] INFO:  ### Reading SNPs...
Failed to read from standard input: unknown file type
[2023-02-23 02:54:43] [Root] ERROR:  Worker thread exception! Command 'bcftools mpileup -r edge_1 assembly_graph.bam --no-reference -I --no-version --annotate FORMAT/AD 2>/dev/null | bcftools query -f  "%CHROM %POS [ %AD %DP]
" >./resdir/vcf/vcf_edge_1.txt' returned non-zero exit status 255.
Exception in thread Thread-3 (_handle_results):
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/ubuntu/soft/stRainy/strainy/phase.py", line 26, in _thread_fun
    cluster(i, shared_flye_consensus)
  File "/home/ubuntu/soft/stRainy/strainy/clustering/cluster.py", line 65, in cluster
    SNP_pos = read_snp(StRainyArgs().snp, edge, StRainyArgs().bam, AF)
  File "/home/ubuntu/soft/stRainy/strainy/clustering/build_data.py", line 18, in read_snp
    subprocess.check_output(snpos, shell=True, capture_output=False)
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'bcftools mpileup -r edge_1 assembly_graph.bam --no-reference -I --no-version --annotate FORMAT/AD 2>/dev/null | bcftools query -f  "%CHROM %POS [ %AD %DP]
" >./resdir/vcf/vcf_edge_1.txt' returned non-zero exit status 255.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/multiprocessing/pool.py", line 595, in _handle_results
    cache[job]._set(i, obj)
  File "/home/ubuntu/miniconda3/envs/strainy/lib/python3.10/multiprocessing/pool.py", line 828, in _set
    self._error_callback(self._value)
  File "/home/ubuntu/soft/stRainy/strainy/phase.py", line 47, in <lambda>
    pool.starmap_async(_thread_fun, init_args, error_callback=lambda e: _error_callback(pool, e))
  File "/home/ubuntu/soft/stRainy/strainy/phase.py", line 33, in _error_callback
    raise e
subprocess.CalledProcessError: Command 'bcftools mpileup -r edge_1 assembly_graph.bam --no-reference -I --no-version --annotate FORMAT/AD 2>/dev/null | bcftools query -f  "%CHROM %POS [ %AD %DP]
" >./resdir/vcf/vcf_edge_1.txt' returned non-zero exit status 255.
[2023-02-23 02:54:43] [Root] INFO:  Total number of key hits and misses for consensus computation:
[2023-02-23 02:54:43] [Root] INFO:   H:0, M:0
[2023-02-23 02:54:43] [Root] INFO:  Creating phased bam
Traceback (most recent call last):
  File "/home/ubuntu/soft/stRainy/strainy.py", line 97, in <module>
    main()
  File "/home/ubuntu/soft/stRainy/strainy.py", line 89, in main
    sys.exit(phase_main(args))
  File "/home/ubuntu/soft/stRainy/strainy/phase.py", line 111, in phase_main
    color_bam(StRainyArgs().edges)
  File "/home/ubuntu/soft/stRainy/strainy/phase.py", line 57, in color_bam
    color(e)
  File "/home/ubuntu/soft/stRainy/strainy/color_bam.py", line 47, in color
    write_bam(edge, I, AF)
  File "/home/ubuntu/soft/stRainy/strainy/color_bam.py", line 16, in write_bam
    infile = pysam.AlignmentFile(StRainyArgs().bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 1000, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

Regards, jsgounot

atabeerk commented 1 year ago

Hi jsgounot,

Thanks for using stRainy! Seems like a bcftools issue. I need more information to be able to help.

Ataberk

jsgounot commented 1 year ago

Hi,

Are you able to run the test_set that comes with the repo? Yes.

Which version of bcftools are you using? bcftools 1.16 Using htslib 1.16

If it is possible, could you share the input dataset with us at ataberk [at] umd.edu? I'll try to send it today.

Thanks! JS

atabeerk commented 1 year ago

Hello again,

There seems to be an issue with the .bam file. Assuming the .fastq file you sent contains the reads you want to align to the fasta file, I used the following command to generate a new .bam file (replace map-ont with map-hifi for HiFi reads):

minimap2 -ax map-ont assembly_graph.fasta merged.fastq.gz | samtools sort -@4 -t 8 > new_assembly_graph.bam

Then, using new_assembly_graph.bam as the input, error is gone. Let me know if this works for you too.

Ataberk

jsgounot commented 1 year ago

Hi,

does this mean that stRainy is only available for hifi reads? Mapping ONT reads with Hifi preset does not seem right.

Regards, JS

atabeerk commented 1 year ago

stRainy works with both HiFi and ONT reads. You can use either map-ont or map-hifi with minimap2 to generate a .bam file.

Ataberk

jsgounot commented 1 year ago

Hi, sorry I misread your last comment. This seems to work now, thanks!