tgen / CovGen

Creates a target specific exome_full192.coverage.txt file required by MutSig
MIT License

getALTsForTargetsSeqsForMutSig.py error #4

Closed: peterpdu closed this issue 6 years ago

peterpdu commented 6 years ago

Hi,

I'm trying to use CovGen to generate the coverage table for my WES data. I'm using a tab-delimited .bed file, but I'm getting the following error:

***** WARNING: File covered.mm10.3col_GRCm38.92/covered.mm10.3col_GRCm38.92_step1.bed has inconsistent naming convention for record:
chr1    3215914 3217074

Traceback (most recent call last):
  File "/broad/tide/methods/frameworks/CovGen/getALTsForTargetsSeqsForMutSig.py", line 352, in <module>
    lseqs, lseqs_coord = parseBedFile(bedf, dict_fasta, naturalSortBedLoci)
  File "/broad/tide/methods/frameworks/CovGen/getALTsForTargetsSeqsForMutSig.py", line 178, in parseBedFile
    logger.debug(id + " : " + str(len(dict_fasta[id])))
UnboundLocalError: local variable 'id' referenced before assignment

The listed entry generating the WARNING is the first entry in my .bed file, so it seems like getALTsForTargetsSeqsForMutSig.py is having trouble parsing my .bed file. The rest of the workflow does not run because the step4 .vcf files are not generated. I'm currently trying to debug this, but I thought it might be worth it to ask if you could help me with this issue.
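For anyone reading along, here is a minimal sketch of how this kind of UnboundLocalError typically arises in Python. It is not CovGen's actual code: `parse_records`, `known_ids`, and `seq_id` are hypothetical names, and the assumption is simply that a variable is assigned only when a record matches, then used unconditionally.

```python
def parse_records(records, known_ids):
    """Hypothetical stand-in for a BED parsing loop; not CovGen's actual code."""
    for chrom, start, end in records:
        if chrom in known_ids:
            seq_id = chrom  # assigned only when the name matches a known ID
        # If the very first record fails the check above, seq_id has never
        # been assigned, so the next line raises UnboundLocalError.
        print(seq_id, start, end)


def demo():
    """Run the loop with a 'chr1' record against IDs that lack the prefix."""
    try:
        parse_records([("chr1", 3215914, 3217074)], known_ids={"1"})
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "no error"
```

Under that assumption, a mismatch on the first record would explain why the warning and the traceback point at the same entry.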

Thanks!

awchrist commented 6 years ago

Hello,

I believe this is caused by the use of "chr" at the beginning of each chromosome name. It also looks like this is a capture for Mus musculus. If so, more troubleshooting may be required, as Mus musculus was not included in our use cases.
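One way to check for this up front is to compare the naming convention of the .bed file against the reference FASTA headers. The sketch below is a standalone helper, not part of CovGen; the function names are made up for illustration, and it assumes plain-text BED lines and FASTA lines are already read into lists.

```python
def uses_chr_prefix(names):
    """Return True if every name starts with 'chr', False if none do,
    and None if the list is empty or the convention is mixed."""
    flags = {name.startswith("chr") for name in names}
    if len(flags) != 1:
        return None
    return flags.pop()


def bed_chroms(bed_lines):
    """Collect chromosome names from the first column of BED lines."""
    chroms = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chroms.append(line.split()[0])
    return chroms


def fasta_ids(fasta_lines):
    """Collect sequence IDs (first token after '>') from FASTA header lines."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids
```

If `uses_chr_prefix(bed_chroms(...))` and `uses_chr_prefix(fasta_ids(...))` disagree, the two files use different chromosome naming and lookups by name are likely to fail.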

peterpdu commented 6 years ago

Thanks for the heads up about CovGen not being tested for mouse data.

I've made some headway in debugging. For some reason, "chr" gives me issues when I run it on the Linux server, but not when I run the Python script by itself locally on my Mac. The program runs into trouble calling process_seq_for_alts because my FASTA is the UCSC version with lowercase bases. I converted lseqs in parseBedFile to upper case, and the Python script now runs to completion. I'll try running the complete program tomorrow and update you.
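The upper-case conversion amounts to something like the sketch below. UCSC FASTA files use lowercase letters for soft-masked (repeat) regions, so code that only recognizes "ACGT" can trip on them. The function name is hypothetical, and this assumes lseqs is a list of sequence strings, which is a guess about CovGen's internals rather than something confirmed here.

```python
def uppercase_sequences(seqs):
    """Return a new list with every sequence string converted to upper case.

    Assumes `seqs` is a list of plain strings; soft-masking information
    (lowercase bases) is discarded by this conversion.
    """
    return [seq.upper() for seq in seqs]
```

For example, `uppercase_sequences(["acgTN", "GGcc"])` gives `["ACGTN", "GGCC"]`.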

peterpdu commented 6 years ago

The problem seems to have been that my reference FASTA file was "chr" labeled while the .bed files generated by CovGen are not. I added a sed command to add "chr" to the step2b.bed file and now the program runs fine.
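For reference, the same prefixing can be sketched in Python instead of sed. This is not the exact command used above; `add_chr_prefix` is a hypothetical helper that assumes the BED lines are already read into a list and that only the prefix differs between the two naming schemes (it does not handle cases such as "MT" versus "chrM").

```python
def add_chr_prefix(bed_lines, prefix="chr"):
    """Prepend `prefix` to the chromosome column of each BED record.

    Blank lines, comment/header lines, and records that already carry the
    prefix are passed through unchanged.
    """
    out = []
    for line in bed_lines:
        is_header = line.startswith(("#", "track", "browser"))
        if not line.strip() or is_header or line.startswith(prefix):
            out.append(line)
        else:
            out.append(prefix + line)
    return out
```

Reading the step2b.bed file with `readlines()`, passing the result through this function, and writing it back with `writelines()` would have the same effect, under the assumptions stated above.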