vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

Joining two small dataframes makes my kernel go dead. #2185

Open ashsharma96 opened 2 years ago

ashsharma96 commented 2 years ago

Hey @JovanVeljanoski, hope you are doing well. I'm currently working on a script that joins two vaex dataframes, and the join kills my kernel. The two datasets are spend1 and visit1; their descriptions are shown below:

[image]

spend1.csv, visit1.csv

The two CSVs are attached above.

The code I'm using is this:

spend1 = spend1.join(visit1, on="tm_cid", how="right", lsuffix="_", allow_duplication=True)

The kernel dies on this line.

Do you know the reason behind this? Both dataframes have very few records.

Regards, Atal Sharma

JovanVeljanoski commented 2 years ago

No idea. Try using how="left" and switch the dataframes around to see if that works.

Otherwise this can serve as a nice unit-test @maartenbreddels
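The suggestion to swap the dataframes and use how="left" rests on generic join semantics: a right join of A with B keeps the same rows as a left join of B with A. The sketch below illustrates this in plain Python over lists of dicts. It is not vaex's implementation, and the `spend` and `visits` columns and their values are made up for illustration; only the `tm_cid` key comes from the thread.

```python
def left_join(left, right, key):
    """Generic left join over lists of dicts.

    Every left row is kept. A left row with several matches on the right
    is repeated once per match (the "duplication" case).
    """
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)

    out = []
    for row in left:
        # A left row without a match is still emitted once, unmerged.
        for match in by_key.get(row[key], [None]):
            merged = dict(row)
            if match is not None:
                merged.update({k: v for k, v in match.items() if k != key})
            out.append(merged)
    return out


def right_join(left, right, key):
    """A right join is a left join with the operands swapped."""
    return left_join(right, left, key)


# Hypothetical toy data; only the key name "tm_cid" appears in the thread.
spend = [
    {"tm_cid": 1, "spend": 10},
    {"tm_cid": 1, "spend": 20},
    {"tm_cid": 2, "spend": 5},
]
visit = [
    {"tm_cid": 1, "visits": 3},
    {"tm_cid": 3, "visits": 7},
]

rows = right_join(spend, visit, "tm_cid")
```

In this toy case the `tm_cid` value that exists only in `spend` does not appear in the result, so neither a left nor a right join on its own keeps every key from both tables.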

ashsharma96 commented 2 years ago

@JovanVeljanoski Thanks for responding.

how='left'

I cannot use left because it gives me only one joined row, which I don't want. I need all the records from both dataframes. I also tried reversing it like this:

spend1 = visit1.join(spend1, on="tm_cid", how="left", lsuffix="_", allow_duplication=True)

This also kills the kernel. It did not give any error when I first wrote this code. When I ran it again 2 months later, it started failing. Please let me know if you find anything on this one.

Regards, Atal Sharma

JovanVeljanoski commented 2 years ago

Hey @ashsharma96

I ran your original example, and I cannot reproduce any errors:

[screenshot: Screen Shot 2022-08-29 at 08 09 53]

Your second example works as well on my end.

Can you provide more information? When starting an issue we ask for some info that you did not provide.
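As a rough sketch of how such information could be collected, the snippet below gathers the Python version, the platform string, and the installed versions of a few packages using only the standard library. Which details the issue template actually asks for is an assumption here, and the default package names (`vaex`, `vaex-core`) are only a guess at what is relevant; adjust the tuple as needed.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("vaex", "vaex-core")):
    """Return a dict describing the interpreter, OS, and package versions.

    The default package names are an assumption about what is relevant
    for this issue, not something the thread specifies.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Reading metadata avoids importing the package itself.
            report[name] = "not installed"
    return report


for field, value in environment_report().items():
    print(f"{field}: {value}")
```

Because the versions are read from package metadata, the report can be produced even when importing the library itself is what crashes.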

ashsharma96 commented 2 years ago

Hey @JovanVeljanoski, thanks for responding. I don't know how it works on your end. Maybe converting the data to CSV made it work. But it now fails very frequently in many groupby operations that did not give any error before the new vaex version. Here's a video which shows what's happening on my side:

https://user-images.githubusercontent.com/17443937/187846527-3d1d4746-709f-404d-a878-5940e460eacf.mp4

Can you please look into this one and tell me where I'm going wrong.

Regards, Atal Sharma

JovanVeljanoski commented 2 years ago

Hey, is this issue about join or something else? If the issue is not about join, please close this and open another thread for whatever issue you are experiencing.

Please make a reproducible example of what is going wrong, something we can run on our end, and if necessary fix.

Sorry, but we simply cannot afford the time to reverse engineer what is going on in videos, screenshots, etc. Code can easily be understood, but other media not so much.
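One way to turn a "kernel died" symptom into something copy-pasteable is to run the failing statement in a separate Python process with the standard `faulthandler` enabled, so that a hard crash leaves an exit code and a low-level traceback instead of silently killing the notebook. The following is a minimal sketch under assumptions: the helper name `run_isolated` is hypothetical, and the `REPRO` script assumes the two attached CSV files sit in the working directory and are loaded with `vaex.from_csv`, which the thread does not confirm.

```python
import subprocess
import sys


def run_isolated(code, timeout=120.0):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns the CompletedProcess, so the exit code, stdout, and stderr
    (including any faulthandler traceback) can be pasted into an issue.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Assumed reproduction script: file names are taken from the attachments,
# the loading call is a guess at how the dataframes were created.
REPRO = """
import vaex
spend1 = vaex.from_csv("spend1.csv")
visit1 = vaex.from_csv("visit1.csv")
joined = spend1.join(visit1, on="tm_cid", how="right",
                     lsuffix="_", allow_duplication=True)
print(len(joined))
"""

# Usage (not executed here):
# result = run_isolated(REPRO)
# print(result.returncode, result.stderr)
```

A nonzero `returncode` together with the contents of `stderr` gives maintainers a text-only report of where the process went down.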

ashsharma96 commented 2 years ago

@JovanVeljanoski I'm not asking you to review my code or find an error in it. I'm just showing you where the vaex join kills the kernel, because it happens only on my instance and I can't provide you more data. Yes, join kills the kernel even for small dataframes. I've encountered this error in more than 6 or 7 joins across my different notebooks. These notebooks worked fine when I developed them, but after I updated the vaex library I started getting these errors. It happens only in some joins, not all of them. OK, I'll try to reproduce this error. But please try to look at the video once. It's just a one-minute video, or you could even watch only the last 30 seconds. It shows where and how the kernel dies in vaex on a simple line of code.

Regards, Atal Sharma

JovanVeljanoski commented 2 years ago

When you open a new issue, there is a template asking you some questions, which you ignored. Please go back and answer those; it helps us narrow down the problem.

Sorry, I can't afford the time to look at videos. If you provide a reproducible example (something that we can copy-paste, get the same error and attack it), we would be happy to try and fix things as needed.

Otherwise you can also make contact via Vaex.io for dedicated help.