vatlab / sos-julia

SoS extension for Julia
BSD 3-Clause "New" or "Revised" License

Support for multi-dimension array #4

Open HenryLeongStat opened 6 years ago

HenryLeongStat commented 6 years ago

As the title says. See https://github.com/vatlab/sos-r/issues/1 for reference.

mathieuboudreau commented 4 years ago

Hi @BoPeng @HenryLeongStat ,

Any progress on this? We're having issues with SoS Julia right now where we can't even move a 2D array from Julia to Python, we can only move single-valued numbers. We can provide an example notebook if that is helpful.

mathieuboudreau commented 4 years ago

Even a 1D array is hanging for us (e.g. one of shape (1000,)).

BoPeng commented 4 years ago

A sample notebook would certainly help. My problem is that I have zero knowledge of Julia, so I can fix it if it is a bug, but would have to ask for help, perhaps from @mathieuboudreau, if more Julia-side programming is needed.

mathieuboudreau commented 4 years ago

I'm not experienced in Julia either, mostly MATLAB, haha. But we are trying to do a simple thing with Julia + SoS: basically processing someone else's script in Julia and then plotting the result in Python using Plotly. I'll get back to you with an example notebook - Julia has a few quirks.

agahkarakuzu commented 4 years ago

Quick note:

When cast as Array{Int64,2} (a 1×5 row matrix, i.e. [1 2 3 4 5]), data transfer between Julia-->Python hangs, no matter how long the array is.

When cast as Array{Int64,1} (a column vector, i.e. [1,2,3,4,5]) it works.

mathieuboudreau commented 4 years ago

^Quirks that I was talking about haha.

mathieuboudreau commented 4 years ago

Here is a notebook with a few examples, running on MyBinder: https://mybinder.org/v2/gh/mathieuboudreau/PhaseUnwrapping_book/julia_debug?filepath=jupyter-sos-bugs-example.ipynb

Looks like SoS can deal with Julia 1D arrays if they are columns but not rows (which might also explain why 2D arrays don't work either.)
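
For reference, a small Python sketch of how the two Julia literals differ in shape once they arrive on the Python side. It assumes the natural mapping of a Julia Vector to a flat list and a Julia Matrix to a list of rows; the `shape` helper is hypothetical and is not part of SoS.

```python
def shape(value):
    """Return the dimensions of a nested list, like NumPy's .shape (hypothetical helper)."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        # Descend into the first element; stop on an empty list.
        value = value[0] if value else None
    return tuple(dims)


# Julia: [1,2,3,4,5] is a 5-element Array{Int64,1} (a column vector).
column_vector = [1, 2, 3, 4, 5]

# Julia: [1 2 3 4 5] is a 1x5 Array{Int64,2} (a matrix with a single row).
row_matrix = [[1, 2, 3, 4, 5]]

print(shape(column_vector))  # (5,)
print(shape(row_matrix))     # (1, 5)
```

So a "row" written with spaces is already a 2D array in Julia, which is consistent with rows and 2D arrays failing together.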

BoPeng commented 4 years ago

OK, it was a bug, so I was able to fix it and add more tests. Since I have learned a bit more about Julia (by watching a YouTube video), let me see if I can add more tests for more data types.

BoPeng commented 4 years ago

I have released sos-julia 0.18.3 with the fix. I did find some types that cannot be transferred from Julia to SoS. For example, mpg, obtained from a pyarrow dataframe with

%get mtcars --from R
mpg = mtcars["mpg"]

is of type 32-element Arrow.Primitive{Float64}, which I have no idea how to detect and send.

Anyway, let us fix sos-julia step by step and worry about these later. Please feel free to submit new tickets for types that do not work.

agahkarakuzu commented 4 years ago

Thank you so much @BoPeng, @zelenkastiot, can you give it a try with sos-julia version 0.18.3 and see how it works for you? Then we can discuss in the next meeting.

mathieuboudreau commented 4 years ago

Hi @BoPeng,

Unfortunately, your recent version has not resolved our issue yet. You can test it here: https://mybinder.org/v2/gh/mathieuboudreau/PhaseUnwrapping_book/julia_debug?filepath=jupyter-sos-bugs-example.ipynb

I dug into your code a bit, and found some issues

Even with all the fixes mentioned above, I still get an error, and that's where I'm currently stuck. The error, which I don't know how to debug, says: Failed to evaluate '"SOS_JULIA_REQUIRE:dataframes"': invalid syntax (<string>, line 1) Unrecognized return value of type type for action %put

Hope some of this info helps.

mathieuboudreau commented 4 years ago

I dug in a bit deeper, and it appears that error occurs during this call:

https://github.com/vatlab/sos-julia/blob/03b6eadcc5d33fd9bfb14098d353d7c387eaa2d8/src/sos_julia/kernel.py#L388

where expr is SOS_JULIA_REQUIRE:dataframes for the example Julia variable I gave above (i.e. a = [1 2]). Not sure how to resolve this one, since I'm not sure what the logic of this functionality is supposed to be.
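
A minimal Python sketch of the likely failure mode. It assumes the kernel passes the string returned from Julia to Python's `eval`; that is a guess based on the "invalid syntax (<string>, line 1)" wording and is not confirmed from kernel.py. The helper names and the sentinel handling are hypothetical.

```python
import ast

# Prefix copied from the error message; how sos-julia really handles it is not known here.
SENTINEL_PREFIX = "SOS_JULIA_REQUIRE:"


def is_valid_python_expression(text):
    """Return True if `text` parses as a Python expression (what eval needs)."""
    try:
        ast.parse(text, mode="eval")
        return True
    except SyntaxError:
        return False


def interpret_reply(reply):
    """Hypothetical dispatcher: check for the sentinel before evaluating anything."""
    if reply.startswith(SENTINEL_PREFIX):
        # Signal that a Julia package is needed instead of trying to evaluate the string.
        return ("require", reply[len(SENTINEL_PREFIX):])
    # literal_eval only accepts plain literals, which is enough for this sketch.
    return ("value", ast.literal_eval(reply))


# A name followed by a colon is not a Python expression, hence "invalid syntax".
print(is_valid_python_expression("SOS_JULIA_REQUIRE:dataframes"))  # False
print(interpret_reply("SOS_JULIA_REQUIRE:dataframes"))             # ('require', 'dataframes')
print(interpret_reply("[1, 2]"))                                   # ('value', [1, 2])
```

If something like this is what happens, the sentinel string is reaching the evaluation step instead of being intercepted first.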

BoPeng commented 4 years ago

What OS are you using? Linux? The import part was VERY SLOW in Julia 0.6.3. If I did not install DataFrame etc. in advance and let sos-julia install it, the Jupyter kernel on Travis would actually time out ... in the end we had to use the loop that you have seen and pre-install the packages on Travis. We might be able to remove all of this if it has been improved in Julia 1.x.

mathieuboudreau commented 4 years ago

@BoPeng testing locally on my MacBook, but Linux when using MyBinder (in a Docker container).

I totally get that installing it ahead of time makes sense; all I was pointing out was that you're going to miss those edge cases during your tests since you're preinstalling those packages. Maybe you could instead move them to the setup.py of the sos-julia library, to guarantee that users do so? If that's even possible? Nonetheless, the bug caused by the missing import Pkg was missed here, and should likely be fixed regardless.

BoPeng commented 4 years ago

Yes, I agree, and import Pkg has been added.

BoPeng commented 4 years ago

@mathieuboudreau I have uploaded sos-julia 0.18.5 because the missing using Pkg should be the reason for the infinite loop on your end. It also fixes the dataframe issue on CentOS.

The dataframe issue is a bit complicated. Whereas the Python feather-format package has been writing in the Feather V2 (ARROW) format, Feather.jl can only read the V1 version (#20). The situation should be resolved "in a few weeks" according to the Feather.jl developer. Because newer versions of feather_format.write_dataframe allow an option version=1, I forced the use of version=1, which temporarily fixes the problem.

mathieuboudreau commented 4 years ago

Ok great, thanks Bo! I tested your update on your master branch this morning and it resolved my issue for 2D arrays - thank you! Multidimensional arrays of 3 or more dimensions still don't work, but that's not a problem for us (just mentioning it since that was the original topic of this issue, which I think should be kept open?).

I'll test it with the new PyPI version and let you know if that one works for me too!

BoPeng commented 4 years ago

Yeah, feather is for 2D dataframes and does not work for higher-dimensional arrays. It could work without feather, but then some coding in Julia would be needed to create numpy-equivalent expressions, which I am not capable of right now.
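
One possible direction, sketched in plain Python and not something sos-julia currently does: if the Julia side sent the flattened data together with the array dimensions, the Python side could rebuild the N-dimensional structure. Julia stores arrays in column-major order, so the first index varies fastest. The function name and the idea of sending a flat list plus dims are assumptions for illustration only.

```python
from math import prod


def reshape_column_major(flat, dims):
    """Rebuild a nested list from column-major data (hypothetical helper).

    result[i0][i1]...[ik] == flat[i0 + d0*(i1 + d1*(i2 + ...))],
    which matches how Julia lays out an Array in memory.
    """
    dims = tuple(dims)
    if prod(dims) != len(flat):
        raise ValueError("dims do not match the number of elements")

    # Column-major strides: 1, d0, d0*d1, ...
    strides = []
    step = 1
    for d in dims:
        strides.append(step)
        step *= d

    def build(axis, offset):
        if axis == len(dims):
            return flat[offset]
        return [build(axis + 1, offset + i * strides[axis])
                for i in range(dims[axis])]

    return build(0, 0)


# Julia: reshape(1:6, 2, 3) is the matrix [1 3 5; 2 4 6].
print(reshape_column_major([1, 2, 3, 4, 5, 6], (2, 3)))
# [[1, 3, 5], [2, 4, 6]]

# Julia: reshape(1:8, 2, 2, 2), a 3D array.
print(reshape_column_major(list(range(1, 9)), (2, 2, 2)))
# [[[1, 5], [3, 7]], [[2, 6], [4, 8]]]
```

With NumPy available, the same result would come from `numpy.reshape(flat, dims, order="F")`; the pure-Python version above only spells out the indexing.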

mathieuboudreau commented 4 years ago

No worries! Now that I've gone beyond simply using SoS to actually looking under the hood and modifying the code a little bit, it's less intimidating than I previously thought it would be haha, so maybe that's something I'll try to explore if I have the time.