Open arpan-das-astrophysics opened 2 years ago
As we don't support a complex type in cuDF, unfortunately that's the best approach I can think of as well. I understand that things are further complicated because what you want is a List[complex] data type.
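For readers following along, the split described above might look roughly like the sketch below. It uses NumPy only; the sample `rows` data and the variable names are made up for illustration, and handing the resulting lists to cuDF is left out.

```python
import numpy as np

# Hypothetical sample: each row holds a variable-length list of complex values.
rows = [
    np.array([1 + 2j, 3 - 4j], dtype=np.complex64),
    np.array([0.5j], dtype=np.complex64),
]

# Split every row into two float lists, one per component.
real_lists = [r.real.tolist() for r in rows]
imag_lists = [r.imag.tolist() for r in rows]

# Recombine the first row to confirm nothing is lost.
restored = np.array(real_lists[0]) + 1j * np.array(imag_lists[0])
```

The two nested lists could then be stored as two separate list-of-float columns. The per-row Python loop is probably where the time goes on a large dataset.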
Can you share an example of the kind of operations you wish to perform? Perhaps we can suggest adequate workarounds.
I imagine this is yet another use-case for something like Awkward Arrays on the GPU. FYI @gmarkall, and also I hope you don't mind the cc @jpivarski :)
I have this data format where the "DATA" column values are lists of imaginary numbers. I was trying to store them in a similar way in a cuDF dataframe. I think I found another workaround, which is to convert the complex128 values to strings so that cuDF reads them as a list of strings and not complex numbers, and when I read them back for some operation I convert them back to complex numbers.
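A minimal sketch of that string round trip, assuming plain NumPy and Python's built-in `complex` parser (the sample `values` array is hypothetical, and the cuDF storage step is omitted):

```python
import numpy as np

# Hypothetical sample of complex128 values for one cell.
values = np.array([1.5 + 2j, -3 + 0.25j], dtype=np.complex128)

# Encode: convert to Python complex first, then to text such as "(1.5+2j)".
as_strings = [str(z) for z in values.tolist()]

# Decode: the built-in complex() accepts that parenthesised form.
restored = np.array([complex(s) for s in as_strings], dtype=np.complex128)
```

This round-trips exactly, but each value becomes a variable-length string, so it generally costs more memory and parsing time than a numeric layout.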
Hello @arpan-das-astrophysics, you mentioned storing imaginary numbers as floats in #12104 and as strings here in #11983. Would you please share a bit more about the processing steps you would like to apply to the List[complex] data?
Hi Gregory, thank you for looking into this. Initially I was using List[complex], however that is still not a memory-efficient conversion. The best way I found is to cast the complex array into adjacent (real, imag) floats, which in principle shouldn't take any additional memory. I used NumPy's .view(np.float32) to cast the array into adjacent floats and then tolist() to store it in the dataframe. However, we read this column multiple times for several operations, and it is not optimal to cast and recast every time. So it would be great if we could store the whole complex array directly, without any conversion.
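The cast and recast described above can be sketched as follows. This assumes a `complex64` array, which pairs with `float32`; the sample array `z` is hypothetical and only NumPy calls are used.

```python
import numpy as np

# Hypothetical sample: one cell's worth of complex64 values.
z = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)

# Reinterpret the same buffer as interleaved (real, imag) float32 pairs.
# view() does not copy; it only changes how the bytes are read.
interleaved = z.view(np.float32)

# tolist() does copy, producing the plain list stored in the dataframe cell.
row = interleaved.tolist()

# Reading back: rebuild a float32 array and view it as complex64 again.
restored = np.asarray(row, dtype=np.float32).view(np.complex64)
```

The float width has to match the complex width: a `complex128` array would need `np.float64` here, since viewing it as `float32` would reinterpret the bytes rather than convert the values.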
Hi @GregoryKimball any update on this?
As an example, I have to run a big for loop to extract columns from a dataframe where some columns are multidimensional arrays and some even contain complex numbers. As you can see, extracting those columns makes the for loop significantly slower, and this defeats the whole purpose of using a cuDF dataframe:
Hello, I am trying to store some imaginary numbers as a cuDF dataframe column. Each column cell is a list of imaginary numbers. I am wondering what would be the most efficient way to do it, as cuDF dataframes don't support imaginary numbers. Separating the real and imaginary parts is what I am doing now, but this is a huge dataset and it is taking a lot of time.