weld-project / weld

High-performance runtime for data analytics applications
https://www.weld.rs
BSD 3-Clause "New" or "Revised" License
2.99k stars, 258 forks

KeyError: '|S36' while doing Pivot in weld #462

Open shashwatwork opened 5 years ago

shashwatwork commented 5 years ago

Hi, I'm trying to run a pivot on a dataframe in plain pandas as well as in weld, to compare performance.

In pandas it works fine, but in weld I get the error KeyError: '|S36', even though that value does not appear anywhere in the dataframe.

Please help me address this issue.

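In pandas: [image: pivot_normal] https://user-images.githubusercontent.com/22785727/61270099-3ff29c80-a7be-11e9-902a-1674519fc041.PNG

In weld: [image: pivot_weld] https://user-images.githubusercontent.com/22785727/61270119-4d0f8b80-a7be-11e9-9581-ce5a6d3f98c7.PNG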

sppalkia commented 5 years ago

This looks like an issue in the data converter from Pandas to Weld. S36 is a dtype, not a value: it denotes a fixed-size 36-byte string, which is most likely why the key in the error does not appear in your data. You may be able to fix this by setting the dtype of that column to str (a variable-length string, for which there is a converter) instead.

-- Shoumik
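
To illustrate why the error can name a dtype rather than a value from the data, here is a minimal sketch of a lookup table keyed by dtype strings. It is a hypothetical model only: the table contents and the names HYPOTHETICAL_CONVERTERS, pick_converter and describe_lookup are assumptions made for illustration, not Weld's actual converter code.

```python
# Hypothetical illustration only: this is NOT Weld's actual converter code.
# It models a converter registry keyed by NumPy-style dtype strings.
HYPOTHETICAL_CONVERTERS = {
    "<i8": "i64 vector",
    "<f8": "f64 vector",
    "|O": "variable-length string vector",
}


def pick_converter(dtype_str):
    """Look up a converter by dtype string; unknown dtypes raise KeyError."""
    return HYPOTHETICAL_CONVERTERS[dtype_str]


def describe_lookup(dtype_str):
    """Return the converter description, or the error text if none is registered."""
    try:
        return pick_converter(dtype_str)
    except KeyError as err:
        # str() of a KeyError is the repr of the missing key.
        return "KeyError: {}".format(err)


print(describe_lookup("|O"))
print(describe_lookup("|S36"))
```

In this model, the missing key is the column's dtype string, so the KeyError says nothing about the values stored in the column.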

shashwatwork commented 5 years ago

Hi, I converted all columns to str and tried the pivot again, but I got the same error, KeyError: '|S36'.

Snippet:

pivot_data['person_id'] = pivot_data.person_id.apply(str)
pivot_data['code'] = pivot_data.code.apply(str)
pivot_data['value'] = pivot_data.value.apply(str)

I might be doing something wrong. Please let me know how to move forward and overcome this issue.

Thanks.
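
One detail worth checking with the snippet above, assuming Python 3 and assuming the column values are bytes objects (the thread confirms neither): calling str() on a bytes value yields its printed representation, including the b'...' wrapper, rather than the decoded text. The sketch below shows the difference on a made-up 36-character identifier. The helper to_text is hypothetical, and whether decoding resolves the Weld error is not established in this thread.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text; fall back to str() for any other value."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Made-up sample value, 36 bytes long (not taken from the issue's data).
raw = b"3f2b8c1e-0000-4000-8000-123456789abc"

print(len(raw))      # length in bytes
print(str(raw))      # Python 3: the repr, wrapped in b'...'
print(to_text(raw))  # the decoded text itself
```

With pandas, such a helper could be applied per column in the same way as in the snippet, for example pivot_data.code.apply(to_text).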