ylab-hi / ScanExitron

A computational workflow for exitron splicing identification
MIT License

ValueError: too many values to unpack (expected 4) at 'Calculate PSO and PSI' step. #2

Closed: tejas-j closed this issue 2 years ago

tejas-j commented 3 years ago

Hi,

I have been trying to run the tool on a BAM file that was aligned with STAR to b37 and has duplicates marked. When I run the tool, I get the following error. Is there a way to debug it?

Thank you

root@7457481cccc0:/exitron# python /usr/local/bin/ScanExitron/ScanExitron.py -i EA929033.subset.bam -r hg19 
Checking for 'regtools': found /usr/local/bin/regtools

Checking for 'bedtools': found /opt/conda/envs/py36/bin/bedtools

Checking for 'samtools': found /opt/conda/envs/py36/bin/samtools

regtools junctions extract -i 5 -I 10000000 EA929033.subset.bam -o EA929033.subset.bed
Calling junctions start
Calling junctions finished!
regtools junctions annotate /exitron/EA929033.subset.bed /resources/hg19.fa /resources/gencode.v19.annotation.gtf -o EA929033.subset.janno
EA929033.subset.janno generated!
Reading EA929033.subset.janno
bedtools intersect -s -wo -a 11993.junction.bed -b /resources/gencode.hg19.CDS.bed > 11993.overlap.bed
Junctions intersect with CDS
Junctions intersect with CDS finished!
Reading BAM file: EA929033.subset.bam
samtools bedcov EA929033.subset.position.bed EA929033.subset.bam -Q 0
Calculate PSO and PSI.
Traceback (most recent call last):
  File "/usr/local/bin/ScanExitron/ScanExitron.py", line 364, in <module>
    main()
  File "/usr/local/bin/ScanExitron/ScanExitron.py", line 358, in main
    percent_spliced_out(bam_file=args.input, src_exitron_file=src_exitron_file, position_bed_file=position_bed_file, mapq=args.mapq)
  File "/usr/local/bin/ScanExitron/ScanExitron.py", line 277, in percent_spliced_out
    chrm, _, pos, depth = line.rstrip().split()
ValueError: too many values to unpack (expected 4)

EDIT: After some debugging, it looks like I have isolated the issue. The command samtools bedcov EA929033.subset.position.bed EA929033.subset.bam -Q 0 is failing because the position.bed file lists chromosomes as 'chr1', 'chr2', ... while the BAM file lists them as '1', '2', ...
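A mismatch like this can be confirmed before running the pipeline by comparing the chromosome names in the BED file with the @SQ names in the BAM header (the text printed by samtools view -H). The following is only a minimal sketch of that check. The function names and the sample header are made up for illustration and are not part of ScanExitron.

```python
def contig_names_from_sam_header(header_text):
    """Collect reference names from the SN: tags of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def contig_names_from_bed(bed_lines):
    """Collect the first column (chromosome) of each BED record."""
    names = set()
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split(None, 1)[0])
    return names


def missing_contigs(bed_lines, header_text):
    """Return BED chromosome names that do not appear in the BAM header."""
    bed_names = contig_names_from_bed(bed_lines)
    bam_names = contig_names_from_sam_header(header_text)
    return sorted(bed_names - bam_names)


# Hypothetical inputs: a b37-style header and an hg19-style BED.
header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n"
bed = ["chr1\t100\t200", "chr2\t300\t400"]
print(missing_contigs(bed, header))  # ['chr1', 'chr2']
```

An empty list means every BED chromosome is present in the BAM header. Any names that are returned point to the kind of naming mismatch described above.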

Is there some part of the code that I can modify so that the chromosomes listed in the BAM file match the BED file?
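One possible workaround, sketched here rather than taken from ScanExitron itself, is to rewrite the chromosome column of the BED file into the BAM's naming style before it is passed to samtools bedcov. The helper names below are hypothetical. The mapping covers only chr1-chr22, chrX, chrY and chrM, and leaves every other name unchanged because unplaced contigs follow a different naming scheme in b37. Renaming alone may not be enough, since the annotation files shown in the log are also hg19-based.

```python
# Assumed mapping from hg19-style names to b37-style names (primary contigs only).
HG19_TO_B37 = {f"chr{n}": str(n) for n in range(1, 23)}
HG19_TO_B37.update({"chrX": "X", "chrY": "Y", "chrM": "MT"})


def to_b37_name(chrom):
    """Map an hg19-style chromosome name to b37 style; unknown names pass through."""
    return HG19_TO_B37.get(chrom, chrom)


def rename_bed_line(line):
    """Rewrite the first tab-separated column of one BED line.

    Blank lines and comment lines are returned unchanged.
    """
    stripped = line.rstrip("\n")
    if not stripped or stripped.startswith("#"):
        return line
    fields = stripped.split("\t")
    fields[0] = to_b37_name(fields[0])
    return "\t".join(fields) + "\n"


def rename_bed_file(src_path, dst_path):
    """Copy a BED file, converting chromosome names line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_bed_line(line))


print(rename_bed_line("chr1\t100\t200\n"), end="")  # 1	100	200
```

Note that hg19 chrM and b37 MT are different mitochondrial sequences, so coordinates on that contig should not be assumed to be interchangeable even after renaming.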

Thank you

dolittle007 commented 3 years ago

Thank you for your suggestions. The current version is designed for hg19/hg38 BAM files and the corresponding GENCODE annotation. To support b37/b38 BAM input, the annotation files would need to change as well as the code. Can you realign your reads to the hg19/hg38 reference genome and then rerun exitron identification? If you don't have the raw reads, you can extract them from the BAM using Picard and realign with HISAT2, which is much faster than STAR.

I will modify the code and the annotations to make them compatible with b37 coordinates in the future.