Open HenryLeongStat opened 6 years ago
Hi @BoPeng @HenryLeongStat ,
Any progress on this? We're having issues with SoS Julia right now where we can't even move a 2D array from Julia to Python, we can only move single-valued numbers. We can provide an example notebook if that is helpful.
Even a 1D array is hanging for us (e.g. shape (1000,)).
A sample notebook would certainly help. My problem is that I have zero knowledge of Julia, so I can fix it if it is a bug, but would have to ask for help, perhaps from @mathieuboudreau, if more Julia-side programming is needed.
I'm not experienced in Julia either, mostly MATLAB. haha. But we are trying to do a simple thing with Julia + SoS, basically processing someone else's script in Julia and then plotting it in Python using Plotly. I'll get back to you with an example notebook - Julia has a few quirks.
Quick note:
When cast as Array{Int64,2} (a 1×5 matrix, i.e. [1 2 3 4 5]), data transfer from Julia to Python hangs, no matter how long the array is.
When cast as Array{Int64,1} (a 5-element vector, i.e. [1,2,3,4,5]), it works.
^Quirks that I was talking about haha.
Here is a notebook with a few examples, running on MyBinder: https://mybinder.org/v2/gh/mathieuboudreau/PhaseUnwrapping_book/julia_debug?filepath=jupyter-sos-bugs-example.ipynb
Looks like SoS can deal with Julia 1D arrays if they are columns but not rows (which might also explain why 2D arrays don't work either.)
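To make the distinction concrete on the Python side, here is a minimal sketch using nested lists as a stand-in for the two Julia values. The helper name `shape` is hypothetical and this is only an analogy, not how sos-julia represents the data internally: Julia's [1,2,3,4,5] corresponds to a flat list, while [1 2 3 4 5] corresponds to a list containing a single row.

```python
def shape(value):
    """Return the dimensions of a (rectangular) nested list as a tuple."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        # Descend into the first element; stop on an empty list.
        value = value[0] if value else None
    return tuple(dims)


# Analogue of Julia's [1,2,3,4,5]: a 1D vector, Array{Int64,1}.
vector = [1, 2, 3, 4, 5]

# Analogue of Julia's [1 2 3 4 5]: a 1x5 matrix, Array{Int64,2}.
row_matrix = [[1, 2, 3, 4, 5]]

print(shape(vector))      # one dimension
print(shape(row_matrix))  # two dimensions
```

The two values hold the same numbers but differ in dimensionality, which is why a transfer path that handles one may not handle the other.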
OK, it was a bug, so I was able to fix it and added more tests. Since I have learned a bit more about Julia (by watching a YouTube video), let me see if I can add more tests for more data types.
I have released sos-julia 0.18.3 with the fix. I did find some types that cannot be transferred from Julia to SoS. For example, mpg, obtained from a pyarrow dataframe with
%get mtcars --from R
mpg = mtcars["mpg"]
is of type 32-element Arrow.Primitive{Float64}, which I have no idea how to detect and send.
Anyway, let us fix sos-julia step by step and worry about these later. Please feel free to submit new tickets for types that do not work.
Thank you so much @BoPeng. @zelenkastiot, can you give it a try with sos-julia version 0.18.3 and see how it works for you? Then we can discuss in the next meeting.
Hi @BoPeng,
Unfortunately, your recent version has not resolved our issue yet. You can test it here: https://mybinder.org/v2/gh/mathieuboudreau/PhaseUnwrapping_book/julia_debug?filepath=jupyter-sos-bugs-example.ipynb
I dug into your code a bit and found some issues:
The hang in the [1 2] example (as opposed to e.g. [1, 2]) is because this case needs to use DataFrames.
The hang itself happens in the while True condition. I think the reason it loops infinitely is that DataFrames isn't loaded successfully.
import Pkg is needed before calling Pkg.add(), so this needs to be added to each of the catches.
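One way to avoid an endless wait when a package never loads is to bound the retry loop. The following is a minimal Python sketch of that idea only; `load_with_retries`, `try_load`, and `install` are hypothetical names, and this is not the actual sos-julia code.

```python
def load_with_retries(try_load, install, max_attempts=3):
    """Try to load a dependency, installing it between attempts.

    try_load: callable returning True once the dependency is usable
    install:  callable that attempts to install the dependency
    Returns the attempt number that succeeded, or raises RuntimeError
    instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        if try_load():
            return attempt
        install()
    raise RuntimeError(
        f"dependency still not loadable after {max_attempts} attempts"
    )


# Usage with stand-in callables: the first load fails, the install
# "fixes" it, and the second load succeeds.
state = {"installed": False}
attempts = load_with_retries(
    try_load=lambda: state["installed"],
    install=lambda: state.update(installed=True),
)
print(attempts)
```

With a bounded loop, a failed install surfaces as an error message rather than as a kernel that appears to hang.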
Even with all the fixes mentioned above, I still get an error, and that's where I'm currently stuck. I just get an error saying
Failed to evaluate '"SOS_JULIA_REQUIRE:dataframes"': invalid syntax (<string>, line 1) Unrecognized return value of type type for action %put
which I don't know how to debug.
Hope some of this info helps.
I dug in a bit deeper, and it appears that error occurs during this call:
where expr is SOS_JULIA_REQUIRE:dataframes for the example Julia variable I gave above (i.e. a = [1 2]). Not sure how to resolve this one, since I'm not sure what the logic of this functionality is supposed to be.
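The "invalid syntax (<string>, line 1)" wording is what Python's eval reports when it is handed text that is not a valid expression. As a hedged sketch, assuming the Python side evaluates the returned text with eval and that the text arrives as the bare sentinel quoted above, the snippet below reproduces the failure and shows one possible way to intercept the sentinel first. `REQUIRE_PREFIX` and `parse_kernel_reply` are hypothetical names, not part of sos-julia.

```python
REQUIRE_PREFIX = "SOS_JULIA_REQUIRE:"  # sentinel text taken from the error message


def parse_kernel_reply(expr):
    """Classify a reply string before evaluating it.

    Returns ("require", package_name) for the sentinel, otherwise
    ("value", evaluated_object).
    """
    if expr.startswith(REQUIRE_PREFIX):
        return ("require", expr[len(REQUIRE_PREFIX):])
    # Only reached for ordinary replies; eval is used here purely to
    # mirror the assumed behaviour and should only see trusted text.
    return ("value", eval(expr))


# The bare sentinel is not a Python expression, so eval rejects it.
try:
    eval("SOS_JULIA_REQUIRE:dataframes")
    sentinel_is_valid_python = True
except SyntaxError:
    sentinel_is_valid_python = False

print(sentinel_is_valid_python)
print(parse_kernel_reply("SOS_JULIA_REQUIRE:dataframes"))
print(parse_kernel_reply("[1, 2]"))
```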
What OS are you using? Linux? The import part was VERY SLOW in Julia 0.6.3. If I did not install DataFrames etc. in advance and let sos-julia install it, the Jupyter kernel on Travis would actually time out ... in the end we had to use the loop that you have seen and pre-install the packages on Travis. We might be able to remove all these if this has been improved in Julia 1.x.
@BoPeng testing locally on my MacBook, but Linux when using MyBinder (in a Docker container).
I totally get that installing it ahead of time makes sense. All I was pointing out was that you're going to miss those edge cases during your tests since you're preinstalling those packages. Maybe you could instead move them to the setup.py of the sos-julia library, to guarantee that users do so? If that's even possible? Nonetheless, the bug caused by the missing import Pkg was missed here, and should likely be fixed regardless.
Yes, I agree, and import Pkg has been added.
@mathieuboudreau I have uploaded sos-julia 0.18.5 because the missing using Pkg should be the reason for the infinite loop on your end. It also fixes the dataframe issue on CentOS.
The dataframe issue is a bit complicated. Whereas the Python feather-format package has been writing in Feather V2 (Arrow) format, Feather.jl can only read the V1 version (#20). The situation should be resolved "in a few weeks" according to the Feather.jl developer. Because newer versions of feather_format.write_dataframe allow an option version=1, I forced the use of version=1, which temporarily fixes the problem.
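For readers who want to try the same workaround by hand, here is a minimal sketch. It swaps in pyarrow.feather.write_feather, which also accepts a version argument, rather than the call named above; the helper name `write_feather_v1` and the constant `FEATHER_VERSION` are hypothetical, and pandas and pyarrow are assumed to be installed.

```python
FEATHER_VERSION = 1  # legacy Feather V1 layout, readable by older readers


def write_feather_v1(df, path):
    """Write a pandas DataFrame to `path` in the Feather V1 format."""
    # Imported lazily so that defining the helper does not require pyarrow.
    import pyarrow.feather as feather

    feather.write_feather(df, path, version=FEATHER_VERSION)
```

A call would look like write_feather_v1(df, "mtcars.feather"), where the file name is only an illustration.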
Ok great, thanks Bo! I tested your update on your master branch this morning and it resolved my issue for 2D arrays - thank you! Multidimensional arrays of 3 or more dimensions still don't work, but that's not a problem for us (just mentioning it since that was the original topic of this issue, which I think should be kept open?).
I'll test it with the new pypi version and let you know if that one works too for me!
Yeah, feather is for 2D dataframes and does not work for higher-dimensional arrays. It could work without feather, but then some coding in Julia would be needed to create numpy-equivalent expressions, which I am not capable of right now.
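To illustrate what a "numpy-equivalent expression" could look like, here is a rough sketch written in Python for readability; the real generator would have to live on the Julia side. It assumes the elements are sent flattened in column-major order (Julia's memory layout) together with the array's dimensions. The function name `numpy_expr` is hypothetical.

```python
def numpy_expr(flat_values, shape):
    """Build a Python expression string that rebuilds an N-D array.

    flat_values: elements in column-major (Fortran) order
    shape:       dimensions of the original array
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(flat_values):
        raise ValueError("number of values does not match the shape")
    values = ", ".join(repr(v) for v in flat_values)
    # order='F' tells NumPy the values are laid out column-major.
    return f"numpy.array([{values}]).reshape({tuple(shape)!r}, order='F')"


# A 2x2x2 array flattened in column-major order.
expr = numpy_expr([1, 2, 3, 4, 5, 6, 7, 8], (2, 2, 2))
print(expr)
```

The receiving Python kernel could then evaluate the string with numpy imported, which avoids the 2D limitation of a dataframe file format at the cost of sending the data as text.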
No worries! Now that I've gone beyond simply using SoS to actually looking under the hood and modifying the code a little bit, it's less intimidating than I previously thought it would be haha, so maybe that's something I'll try to explore if I have the time.
As title... References to https://github.com/vatlab/sos-r/issues/1