ksiewert / BetaScan

Genome-wide scan for balancing selection using beta statistic

B2 error: TypeError: a float is required #11

Closed Huangglian closed 3 years ago

Huangglian commented 3 years ago

Hi ksiewert,

I ran into 'TypeError: a float is required' when calculating B2:

Traceback (most recent call last):
  File "BetaScan.py", line 613, in <module>
    main()
  File "BetaScan.py", line 604, in main
    output.write(str(loc)+"\t"+str(round(B,6))+"\t"+str(round(T,6))+"\n")
TypeError: a float is required

I then tried adding the following right before line 604 in the code:

print loc
print B
print type(loc)
print type(B)

and got this output:

<type 'int'>
<type 'numpy.float64'>
463810
0
<type 'int'>
<type 'int'>
464483
0
<type 'int'>
<type 'int'>
464542
0
<type 'int'>
<type 'int'>
465000
None
<type 'int'>
<type 'NoneType'>
Traceback (most recent call last):
  File "BetaScan2.py", line 618, in <module>
    main()
  File "BetaScan2.py", line 607, in main
    output.write(str(loc)+"\t"+str(round(B,6))+"\n") #Remove thetas
TypeError: a float is required

I also found that some B2 values are far too high in the later part of the output:

340886	0.406094
341043	-0.703357
341255	-1.737664
341366	-1.158691
341514	3997.273195
342070	9092.164337
342131	9775.129148

I would appreciate any suggestions.

Thanks for your help,

G.L

ksiewert commented 3 years ago

Hi G.L.

Thanks for your message. I'm wondering if you could send me the portion of the input file corresponding to the window around the SNP at 342131, along with the full command you used to run BetaScan.

Could you also replace line 607 with the following:

try:
    output.write(str(loc)+"\t"+str(round(B,6))+"\n")
except:
    print loc,B

and let me know what it says?

Best, Katie

Huangglian commented 3 years ago

Hi Katie,

Thank you so much for your reply. Here is my full command for BetaScan:

python2 BetaScan.py -B2 -DivTime 2 -i 27 -o output_27

I followed your advice and replaced line 607 with:

try:
    output.write(str(loc)+"\t"+str(round(B,6))+"\n")
except:
    print loc,B

I got the following error (and the output file appears unchanged):

Traceback (most recent call last):
  File "BetaScan22.py", line 617, in <module>
    main()
  File "BetaScan22.py", line 602, in main
    output.write(str(loc)+"\t"+str(round(B,6))+"\n") #Remove thetas
TypeError: a float is required

Contig27.txt

I also tried extracting 500 SNPs around 342131 and running BetaScan on just those. It did not report an error, and the results contained no abnormally high values. So I am sending you the full input file.

Thanks for your help, again.

Sincerely, G.L

ksiewert commented 3 years ago

Hi G.L.

The TypeError occurs because your input file is not sorted. The SNP locations need to be unique (for example, you have two SNPs at location 2164884), and the file needs to be sorted by location. If your SNP locations start over because you have a new contig or chromosome, then BetaScan should be run separately on each segment, or you should renumber the SNP locations so that they are continuous.
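
A minimal sketch of how one might check an input file for these two problems before running BetaScan. It is not part of BetaScan: the helper names `find_position_problems` and `read_positions` are hypothetical, and the sketch assumes the SNP location is the first whitespace-separated column of each line. The example positions are made up.

```python
def find_position_problems(positions):
    """Return (index, previous, current, kind) for every position that
    repeats, or is smaller than, the position on the line before it."""
    problems = []
    for i in range(1, len(positions)):
        prev, cur = positions[i - 1], positions[i]
        if cur == prev:
            problems.append((i, prev, cur, "duplicate"))
        elif cur < prev:
            # Also what a restart at a new contig or chromosome looks like.
            problems.append((i, prev, cur, "out of order"))
    return problems


def read_positions(path):
    """Read the first column of each non-empty line as an integer.
    Assumption: the location is the first whitespace-separated field."""
    with open(path) as handle:
        return [int(line.split()[0]) for line in handle if line.strip()]


# Made-up positions: one duplicate, then a restart such as a new contig.
example = [100, 250, 250, 900, 40, 75]
for index, prev, cur, kind in find_position_problems(example):
    print("row %d: %d after %d (%s)" % (index + 1, cur, prev, kind))
```

An empty result means the positions are strictly increasing. Each "out of order" row marks a place where the file could be split into separate segments or the positions renumbered.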

For the extremely high Beta scores, I think what is going wrong is that there are no singletons (SNPs at frequency 1) in your data. In general, there seem to be far fewer low-frequency SNPs than there should be: low-frequency SNPs should be much more common than intermediate-frequency SNPs. Were low-frequency SNPs removed because they couldn't be called confidently? If so, then I'm afraid this SNP data is probably not of high enough quality to run BetaScan (or, probably, any other high-powered scan for balancing selection).
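
One way to look at this is to tally how many SNPs fall at each derived-allele count. The sketch below is an illustration only, not BetaScan code: `derived_count_spectrum` is a hypothetical helper, and it assumes each row is (position, derived allele count, sample size), with a count equal to the sample size meaning a substitution. The rows are made up.

```python
from collections import Counter


def derived_count_spectrum(rows):
    """Tally how many SNPs have each derived-allele count.

    rows: iterable of (position, derived_count, sample_size) tuples
    (an assumed layout). Sites where derived_count equals sample_size are
    skipped, on the assumption that they are substitutions, not SNPs.
    """
    spectrum = Counter()
    for _position, derived_count, sample_size in rows:
        if 0 < derived_count < sample_size:
            spectrum[derived_count] += 1
    return spectrum


# Made-up rows for illustration only.
rows = [(10, 5, 20), (42, 9, 20), (77, 5, 20), (90, 20, 20), (130, 11, 20)]
spectrum = derived_count_spectrum(rows)
print("singletons:", spectrum[1])
for count in sorted(spectrum):
    print(count, spectrum[count])
```

If the tally at count 1 is zero and the low counts are rarer than the intermediate ones, the data show the pattern described above.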

It looks like what's happening with the SNPs that have a very high Beta score is that their windows contain no low-frequency SNPs and no substitutions. This is extremely unlikely to happen by chance (and is almost surely a technical artifact), so it is causing the Beta score to be artificially high.
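
A rough sketch of how such windows could be flagged ahead of time. Everything here is an assumption made for illustration, not BetaScan's own logic: the helper name `windows_missing_signal`, the row layout (position, derived allele count, sample size), the 1000 bp half-width, the cutoff of 2 for "low frequency", and the convention that a count equal to the sample size marks a substitution.

```python
from bisect import bisect_left, bisect_right


def windows_missing_signal(rows, half_width=1000, low_count=2):
    """Return positions of core SNPs whose window contains neither a
    low-frequency SNP nor a substitution.

    rows: list of (position, derived_count, sample_size), sorted by position.
    half_width and low_count are illustrative defaults, not BetaScan settings.
    """
    positions = [row[0] for row in rows]
    flagged = []
    for position, derived_count, sample_size in rows:
        if derived_count == sample_size:
            continue  # assumed substitution, so not a core SNP
        # The window includes the core SNP itself.
        start = bisect_left(positions, position - half_width)
        stop = bisect_right(positions, position + half_width)
        neighbours = rows[start:stop]
        has_low = any(0 < c <= low_count for _p, c, _n in neighbours)
        has_sub = any(c == n for _p, c, n in neighbours)
        if not has_low and not has_sub:
            flagged.append(position)
    return flagged


# Made-up rows: the first three SNPs sit in a stretch with only
# intermediate-frequency variants and no substitutions.
rows = [(100, 8, 20), (400, 10, 20), (900, 9, 20),
        (5000, 1, 20), (5300, 20, 20), (5600, 7, 20)]
print(windows_missing_signal(rows))
```

Positions returned by this check are the ones where an unusually high score would be worth treating with suspicion.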

Let me know if you have any more questions.

Best, Katie

Huangglian commented 3 years ago

Dear Katie,

Thank you so much for your timely help!

Best wishes,

G.L