Closed jeanmidevacc closed 3 years ago
@jeanmidevacc I've looked into it and it looks like you can divide the values that are > 100 by 100.
For example, if you see confidence = 9657.65, the actual confidence in a range 0-100 is 96.5765.
This is obviously an issue in the dataset and I'm adding this fix to the next release that's coming up this week.
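The workaround described above could be sketched in pandas roughly as follows. This is only an illustration under stated assumptions, not the fix that ships in the release: the helper name `rescale_confidence` and the `max_valid` parameter are hypothetical, and the sample values are made up apart from the 9657.65 example.

```python
import pandas as pd


def rescale_confidence(values: pd.Series, max_valid: float = 100.0) -> pd.Series:
    """Divide values above max_valid by 100 and leave all other values unchanged.

    Hypothetical helper: assumes every out-of-range value is off by a factor of 100.
    """
    # Series.where keeps entries where the condition is True and takes the
    # replacement (values / 100) everywhere else. Missing values stay missing.
    return values.where(values <= max_valid, values / 100.0)


# Illustrative data only; 9657.65 is the example from this thread.
sample = pd.Series([9657.65, 42.0, 100.0, None], name="ai_service_2_confidence")
fixed = rescale_confidence(sample)
print(fixed)
```

On the real file, the same helper could be applied to the 'ai_service_2_confidence' column after loading keywords.tsv000 as tab-separated data.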
Thank you for catching it and for describing the issue the way you did!
Great, thanks for the update (and for handling the issue so quickly).
Describe the bug
Hello,
I was looking at the data from the lite dataset this morning and I noticed something weird in the column 'ai_service_2_confidence' from the keywords.tsv000 file. When I computed some stats on the ai_service columns, the column 'ai_service_2_confidence' seemed to have extreme values exceeding 100, which for me is the expected max (if I take ai_service_1_confidence as a reference, for example).
To Reproduce
Here is the code to build the stats
Steps to reproduce the behavior: have a Python environment (3.6.13) with pandas 1.1.5 installed
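The original code is not reproduced in this thread. A minimal sketch of how such stats could be built with pandas is shown below; the function name `summarize_confidence`, the `n_above_max` column, and the sample rows are assumptions made for illustration, and only the two column names and the file name come from the report.

```python
import pandas as pd

# Column names as quoted in this issue.
CONFIDENCE_COLUMNS = ["ai_service_1_confidence", "ai_service_2_confidence"]


def summarize_confidence(df: pd.DataFrame, max_valid: float = 100.0) -> pd.DataFrame:
    """Return describe() stats per confidence column plus a count of values above max_valid."""
    stats = df[CONFIDENCE_COLUMNS].describe().T
    # Count how many entries in each column exceed the expected maximum.
    stats["n_above_max"] = (df[CONFIDENCE_COLUMNS] > max_valid).sum()
    return stats


# Illustrative rows only. On the real data this would be something like:
# df = pd.read_csv("keywords.tsv000", sep="\t")
df = pd.DataFrame(
    {
        "ai_service_1_confidence": [55.0, 99.1, 12.3],
        "ai_service_2_confidence": [9657.65, 42.0, 88.8],
    }
)
summary = summarize_confidence(df)
print(summary)
```

A maximum far above 100, or a non-zero `n_above_max`, for 'ai_service_2_confidence' would correspond to the behavior described in this report.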
Expected behavior
I expect the values in the column 'ai_service_2_confidence' of the keywords.tsv000 file to be between 0 and 100 or, if that is not the case, a more precise description of the 'ai_service_2_confidence' values in the documentation (such as the range).
Additional context
I have a list of the keywords that seem to be impacted by these extreme values: unsplash_extreme_value.zip
I hope this helps with your investigation 🕵️‍♂️ (and I hope it is not just me missing something).
PS: your dataset is great, by the way (I really hope to have access to the full version soon).