Closed ashsharma96 closed 1 year ago
Thanks for the report. I will take a look at the first opportunity. Can you update your post above to include answers to the questions we ask, like the version? Those are important for tracking the issue.
I ran your example over 20 times on the latest version, under Linux, and I can't reproduce your issue. If you can provide more details, that would be great. Otherwise, we can't debug what we can't reproduce.
I am doing this
import vaex

# Expected value counts of eng_type_new1 after de-duplication
correct = {
    "One_Brow": 2341,
    "One_Trans_AtRisk": 2385,
    "One_Trans_Lost": 219,
    "One_Trans_Potential": 228,
    "One_Trans_high_AtRisk": 159,
    "One_Trans_high_Lost": 1,
    "One_Trans_high_Potential": 29,
    "Rep_Brow": 1,
    "Rep_Trans_AtRisk": 76,
    "Rep_Trans_Lost": 67,
    "Rep_Trans_breakaway": 116,
    "Rep_Trans_high_AtRisk": 2,
    "Rep_Trans_high_breakaway": 12,
    "Rep_Trans_high_loyal": 521,
    "Rep_Trans_loyal": 296
}

for i in range(10):

    def remove_duplicates(df, grouping_cols: list):
        # Keep only the first row (lowest index) of each group
        df["index"] = vaex.vrange(0, df.shape[0])
        df_group = df.groupby(grouping_cols, agg=vaex.agg.min("index"))
        df = df.join(df_group[["index_min"]], left_on="index", right_on="index_min")
        df = df[df.index_min.notna()]
        df = df.drop(["index", "index_min"])
        df = df.extract()
        return df

    def calculateNewEngegementClassification(unique):
        eng_count = unique['eng_type_new1'].value_counts().to_dict()
        return eng_count

    df_new = vaex.open('./2301-value-counts-data/error_data.hdf5')
    dfInter = remove_duplicates(df_new, ['tm_cid'])
    res = calculateNewEngegementClassification(dfInter)
    print('is it correct:', res == correct)
I tried it both in Jupyter (restarting the kernel between tries) and in normal Python scripts.
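Since the script above only prints True or False, it can help to see which categories differ when a run goes wrong. A minimal sketch using only the standard library; `diff_counts` is a hypothetical helper, and the `observed` dictionary is made-up illustration data, not output from an actual run:

```python
def diff_counts(expected: dict, observed: dict) -> dict:
    """Return {key: (expected_count, observed_count)} for every key that differs.

    A key missing from one side is reported with None on that side.
    """
    diffs = {}
    for key in sorted(set(expected) | set(observed)):
        exp, obs = expected.get(key), observed.get(key)
        if exp != obs:
            diffs[key] = (exp, obs)
    return diffs


# Subset of the `correct` dictionary from the script above.
expected = {"One_Brow": 2341, "Rep_Brow": 1, "Rep_Trans_loyal": 296}
# Hypothetical bad result, invented purely to exercise the helper.
observed = {"One_Brow": 2341, "Rep_Trans_loyal": 295}

print(diff_counts(expected, observed))
# {'Rep_Brow': (1, None), 'Rep_Trans_loyal': (296, 295)}
```

Inside the loop, printing `diff_counts(correct, res)` instead of the boolean would show which categories are affected in a failing run.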
@JovanVeljanoski Here is the vaex version I'm using: {'vaex': '4.9.1', 'vaex-core': '4.9.1', 'vaex-viz': '0.5.1', 'vaex-hdf5': '0.12.1', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.1', 'vaex-jupyter': '0.7.0', 'vaex-ml': '0.17.0'}
That is quite a bit behind. Many issues have been fixed since then. Please update to the latest version. Also, please answer the questions in the issue template; otherwise we can't help.
@JovanVeljanoski Thank you for the quick reply. Sure, I'll keep this in mind next time. Are there any other details you need from my side?
Yes, everything that we ask in the template.
Yeah, in 4.12 we fixed an issue in value_counts, see https://github.com/vaexio/vaex/blob/master/CHANGELOG.md#vaex-core-4120
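To check whether an installation predates that fix, the `vaex-core` entry of the version dictionary can be compared against 4.12.0. A minimal sketch: `parse_version` and `has_value_counts_fix` are hypothetical helper names, the dictionary is a subset of the one posted above rather than a live lookup, and the parser assumes plain dotted numeric versions:

```python
def parse_version(text: str) -> tuple:
    """Turn '4.9.1' into (4, 9, 1). Assumes purely numeric dotted parts."""
    return tuple(int(part) for part in text.split("."))


def has_value_counts_fix(versions: dict, minimum: str = "4.12.0") -> bool:
    """True if the reported vaex-core version is at least `minimum`."""
    return parse_version(versions["vaex-core"]) >= parse_version(minimum)


# Subset of the version dictionary reported in this thread.
reported = {"vaex": "4.9.1", "vaex-core": "4.9.1"}
print(has_value_counts_fix(reported))  # False
```

The versions are compared as integer tuples rather than strings, because as plain text "4.9.1" sorts after "4.12.0".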
Hey @JovanVeljanoski, hope you are doing well. While working in vaex, I found that value_counts does not give correct results even though the vaex dataframe has data in it. Sometimes the wrong results appear on the first try, and sometimes only after running the same code or function 3-4 times. Sometimes it gives correct results on the first try and appears to work fine, but when I restart the Jupyter kernel and try again, it gives incorrect results. Here is the code I am trying:
Correct Results:
Incorrect Results:
*Note: If the error doesn't appear on the first or second try, please try at least 5-6 times, because at first I didn't notice it either. The data is attached: error_data.zip. @JovanVeljanoski @maartenbreddels Can you please check whether there is an issue in your value_counts, because pandas is working fine? Regards,