igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

TypeError: a bytes-like object is required, not 'str' #79

Closed user-tq closed 1 year ago

user-tq commented 1 year ago

I ran:

python igv-reports/igv_reports/report.py  mafs/variants.maf   /mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta   --ideogram /mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/other/cytoBandIdeo.txt.gz  --flanking 1000 --info-columns Chromosome Start_Position End_Position Variant_Classification Variant_Type Reference_Allele    HGVSc HGVSp HGVSp_Short RefSeq AF  t_depth t_ref_count t_alt_count n_depth n_ref_count n_alt_count gnomAD_AF    --tracks bams/patient101.tumor.bam   /mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/other/refGene.txt.gz     --output example_maf.html
Traceback (most recent call last):
  File "igv-reports/igv_reports/report.py", line 350, in <module>
    main()
  File "igv-reports/igv_reports/report.py", line 346, in main
    create_report(args)
  File "igv-reports/igv_reports/report.py", line 84, in create_report
    reader = utils.getreader(config, None, args)
  File "/home/tanqiang/mambaforge/envs/igvreports/lib/python3.7/site-packages/igv_reports/utils.py", line 13, in getreader
    return bam.BamReader(filetype, path, args)
  File "/home/tanqiang/mambaforge/envs/igvreports/lib/python3.7/site-packages/igv_reports/bam.py", line 13, in __init__
    seqnames = parse_seqnames(header)
  File "/home/tanqiang/mambaforge/envs/igvreports/lib/python3.7/site-packages/igv_reports/bam.py", line 53, in parse_seqnames
    lines = header.split('\n')
TypeError: a bytes-like object is required, not 'str'

https://github.com/pysam-developers/pysam/issues/292#issue-157873086

This seems to be a pysam problem. With the example BAM, the header is returned as str and everything works as usual. With my own BAM, it is returned as a bytes object, so the split('\n') call fails. Changing the line in igv_reports/bam.py to lines = str(header).split('\n') forces the conversion and avoids the error, but I'm not sure whether it will create new problems.
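As a hedged side note on the forced conversion: in Python 3, calling str() on a bytes object returns its repr (beginning with b'), not the decoded text, so the newlines stay escaped and split('\n') no longer separates header lines. The sketch below shows an alternative that decodes explicitly. The helper name header_to_text and the sample header are hypothetical and are not part of igv-reports or pysam.

```python
def header_to_text(header):
    """Return the header as str, whether it arrives as str or bytes.

    Hypothetical helper for illustration; not part of igv-reports.
    """
    if isinstance(header, bytes):
        # errors="replace" substitutes U+FFFD for undecodable bytes
        # instead of raising UnicodeDecodeError.
        return header.decode("utf-8", errors="replace")
    return header


# Made-up two-line SAM header, as bytes.
raw = b"@HD\tVN:1.6\n@SQ\tSN:1\tLN:249250621\n"

# str() on bytes gives the repr, so there is no real newline to split on.
forced = str(raw).split("\n")

# Decoding first yields the actual header lines.
decoded = header_to_text(raw).split("\n")

print(len(forced), len(decoded))
```

With this approach the later parsing still sees real header lines, and any undecodable bytes in free-text fields are replaced rather than crashing the reader.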

I created a minimal BAM that reproduces the problem. Perhaps there is a certain pattern in its header?

samtools view -b  bams/patient101.tumor.bam  1 > problem.part.bam

problem.part.bam.gz

jrobinso commented 1 year ago

Any idea why this does not affect everyone else? It's a heavily used program, and this is the first report of this problem.

The link you reference implies this was fixed in pysam some time ago. Perhaps we should update the required pysam version. What version are you using?

Your proposed fix is probably OK, but I'm curious as to why you are seeing the error in the first place. Could you provide an igv-reports command line to reproduce the issue, including a variants file (VCF or otherwise)?

jrobinso commented 1 year ago

The pysam bug you reference is reported against pysam release 0.9.0 (2016). igv-reports requires at least version 0.19.1 (2022). So before proceeding further, please check your pysam version.

user-tq commented 1 year ago

> The pysam bug you reference is reported against pysam release 0.9.0 (2016). igv-reports requires at least version 0.19.1 (2022). So before proceeding further, please check your pysam version.

I installed this software with Conda, following the tutorial, and I can confirm that my pysam version meets the requirement:

pip freeze | grep pysam 
pysam==0.21.0

I think I found the problem: my FASTA file is stored under a path that contains Chinese characters, which caused encoding issues. As you can see, the path shows up as

/mnt/tool/ref_D<90>/iGenomes/references/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta

when it is actually

/mnt/tool/ref_资源/iGenomes/references/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta

I don't quite understand the underlying reason, but I think the best solution is to avoid file and directory names that contain non-English characters. Thank you for your reply.
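One possible explanation, offered only as a guess from the two paths shown above: the garbled D<90> is exactly what results from keeping only the low byte of each character's Unicode code point in 资源. The byte 0x90 is not valid on its own in UTF-8, which would be consistent with the text being handed back as bytes rather than str. Which tool wrote the path this way is not known from this thread. The sketch uses plain Python only.

```python
# The directory name from the real path.
name = "资源"

# Keep only the low 8 bits of each code point (U+8D44 -> 0x44, U+6E90 -> 0x90).
truncated = bytes(ord(ch) & 0xFF for ch in name)
print(truncated)  # the 'D' plus byte 0x90 seen in the garbled path

# A lone 0x90 byte is a UTF-8 continuation byte, so strict decoding fails.
try:
    truncated.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False

print(decodes_cleanly)
```

If this guess is right, renaming the directory to ASCII-only characters avoids the problem because every character then fits in a single valid byte.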