Closed rezaeir closed 4 years ago
Your particular example works for me locally and on our live server sosworkflows.com
There might be something simple missing.
Could you please check against the following lists?
@rezaeir I saw a notification on gitter, let us work on your problem one by one.
I noticed that sos-r is lagging behind on conda-forge; I just updated it.
@BoPeng I deleted my gitter message because I thought here is the right place to ask it!!! I just ran your requested commands and here is the result:
conda list | grep feather:
feather-format    0.4.1    pyh9f0ad1d_0     conda-forge
r-feather         0.3.3    r36h6115d3f_0    r
conda list | grep sos:
jupyterlab-sos    0.5.3      py_0      conda-forge
sos               0.21.11    py_0      conda-forge
sos-bash          0.20.0     py_0      conda-forge
sos-notebook      0.21.7     py37_1    conda-forge
sos-papermill     0.1.6      py_0      conda-forge
sos-python        0.18.4     py_0      conda-forge
sos-r             0.19.3     py_0      conda-forge
It is a bit strange because you have everything needed. sos-notebook and sos-r are not the latest but are newer than what is used on sosworkflows.com.
So this is likely a windows-specific problem and I will have to find a windows machine to test it. I have something due today but will get back to this as soon as I can.
@BoPeng As I told you in the first message, I also thought that the problem was with windows. So, I installed everything both with conda and without conda in WSL2 and the problem wasn't solved!!
ok, I can reproduce the problem on windows.. checking what is going on.
@BoPeng I ran other examples from sos notebook docs and variables with basic python types like list or string are imported to R with no problem. The problem is just for importing pandas dataframes into R. getting R built-in mtcars dataframe into SoS also works with no problem.
When I run %put df --to R in WSL2, df doesn't come into R and the Jupyter log gives me the following error:
Notebook JSON is invalid: Additional properties are not allowed ('execution_count', 'status' were unexpected)
Failed validating 'additionalProperties' in error:
On instance['cells'][5]['outputs'][0]:
{'ename': 'ExecuteError',
'evalue': '[0]: \n'
'----------------------------------------------------------...',
'execution_count': 13,
'output_type': 'error',
'status': 'error',
'traceback': ['\x1b[91m[0]: \n'
'-----------------------------------------------------...']}
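For anyone puzzled by the log above: in the nbformat v4 schema an output of type "error" may only carry output_type, ename, evalue and traceback, which is why execution_count and status are flagged. The sketch below is only an illustration of that rule using a plain dict; the function name strip_extra_error_keys and the sample notebook are made up for this example. It would only quiet the validation message and does not touch the dataframe transfer itself.

```python
import json

# Keys permitted on an output of type "error" in the nbformat v4 schema.
ALLOWED_ERROR_KEYS = {"output_type", "ename", "evalue", "traceback"}


def strip_extra_error_keys(notebook):
    """Remove unexpected keys from error outputs in place.

    `notebook` is a notebook loaded as a plain dict (e.g. via json.load).
    Returns the number of keys that were removed.
    """
    removed = 0
    for cell in notebook.get("cells", []):
        for output in cell.get("outputs", []):
            if output.get("output_type") != "error":
                continue
            # Copy the key list first because the dict is modified while looping.
            for key in list(output):
                if key not in ALLOWED_ERROR_KEYS:
                    del output[key]
                    removed += 1
    return removed


if __name__ == "__main__":
    # Hypothetical minimal notebook mirroring the shape shown in the log.
    sample = {
        "cells": [
            {
                "cell_type": "code",
                "outputs": [
                    {
                        "output_type": "error",
                        "ename": "ExecuteError",
                        "evalue": "[0]: ...",
                        "traceback": ["[0]: ..."],
                        "execution_count": 13,
                        "status": "error",
                    }
                ],
            }
        ]
    }
    print("keys removed:", strip_extra_error_keys(sample))
    print(json.dumps(sample, indent=1))
```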
I also tried a numpy array and there is no problem. The only data structure which doesn't work is pandas dataframe!!
There is also no problem with importing pandas series in R.
This is caused by incompatibility between the version of feather library used in Python and R for this particular installation.
The code boils down to
import pandas as pd
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
'num_wings': [2, 0, 0, 0],
'num_specimen_seen': [10, 2, 1, 8]},
index=['falcon', 'dog', 'spider', 'fish'])
df = df.reset_index()
df.to_feather('test.feather')
in Python and
library('feather')
read_feather('test.feather')
in R, which says "not a feather file".
I have not figured out which side is to blame though.
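One way to narrow this down is to look at the first bytes of test.feather. Files in the original Feather layout start with the bytes FEA1, while files in the Arrow IPC layout (also called Feather V2) start with ARROW1. A possible explanation, not confirmed at this point in the thread, is that the Python side wrote the newer layout while the R feather package expected the original one. The helper name sniff_feather_version below is made up for this sketch, and it uses only the standard library.

```python
from pathlib import Path

# Magic bytes found at the start of each on-disk layout.
FEATHER_V1_MAGIC = b"FEA1"    # original Feather format
ARROW_IPC_MAGIC = b"ARROW1"   # Arrow IPC file format, also called Feather V2


def sniff_feather_version(path):
    """Return 'feather-v1', 'arrow-ipc', or 'unknown' based on the file header."""
    with open(path, "rb") as handle:
        header = handle.read(len(ARROW_IPC_MAGIC))
    if header.startswith(ARROW_IPC_MAGIC):
        return "arrow-ipc"
    if header.startswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    return "unknown"


if __name__ == "__main__":
    # test.feather is the file written by the Python snippet above.
    target = Path("test.feather")
    if target.exists():
        print(target, "->", sniff_feather_version(target))
    else:
        print(target, "not found; run the pandas snippet first")
```

If the file reports arrow-ipc, a reader that only understands the original layout would be expected to reject it, which would match the "not a feather file" message.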
I copied the feather file to mac and could not read it from mac using pandas.read_feather, so there is something suspicious with the pandas.to_feather function, which uses feather.write_dataframe.
I downgraded both pandas in python and feather in r to exactly the same versions as on sosworkflows.com, but it still says "not a feather file", while the same package versions in the website's jlab output the dataframe without an error.
I believe I found the root of the problem. I searched for similar problems with feather outside SoS and found this issue, in which one of the comments from 9 days ago says:
Could you please try with arrow::read_feather()?
and then I tried using the arrow package in r to read the feather file and it works fine. I also tested it on another dataframe and it worked fine.
another comment from the same issue says that:
If I now understand, the R feather package is superceeded by arrow.
I guess you should change the source code from r-feather to r-arrow to fix this issue!!
I found another person having the same problem with feather package in r and getting the same error in this StackOverflow question.
based on the comment on this question:
Development on feather in R moved to arrow, so it's probably a versioning thing. But you might consider using parquet, which is part of arrow as well and which is well-supported by a broader range of languages
Thanks @rezaeir . That is a lot of detective work. I will try to use arrow in sos-r and make a new release as soon as tests pass.
I heard of parquet, but looked elsewhere when I saw the Hadoop part of "Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem". Perhaps I should re-visit.
Thank you @BoPeng, I'm looking forward to testing the new update when it is ready.
Unfortunately it appears that the arrow package has its own problem. Not sure how it works on your side (windows), I am getting
Whereas on sosworkflows.com using feather the types are forced to double but at least the matrix looks all right.
So it appears to me that arrow introduces an integer64 datatype that is not understood by R.
I just checked your code in windows and I get the same result when using data.matrix(df), which is not good!!!
There should be some way to fix this! I'll search to see if others had the same problem, because apparently feather is being replaced by arrow, so there should be some solution for this. If I can't find one, I'll open an issue for arrow and see what its developers think!!
It is weird!! because this StackOverflow question says that:
And as those involved focus now on arrow which appears to have 64-bit integer support, you most likely will just be asked to move to arrow
I couldn't find any solution by searching, so I opened an issue in here and will report the result if they give me any solution!
one of the arrow package team members responded to my issue and their reasoning seems legitimate!! Please see the issue whenever you have time.
ok, sos is sort of the "buffer" between languages and should absorb these kinds of errors as much as possible. Given that R barely handles integer64 (I never heard of it as a somewhat experienced R user), I suggest that we do the minmax thing on the dataframe we get, which basically means we should enhance to handle the type conversion, namely calling as.integer after checking the type and range of each column.
@rezaeir could you provide a few lines of code for this?
I tried to write as little as possible with a concise syntax. Sorry if it is not good enough because I'm not an experienced programmer. I checked the final result in windows and it did work. I hope it solves the problem
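For readers following along, the range check discussed above can be illustrated in Python. This is a sketch of the logic only, not the R code contributed in this thread or what ended up in sos-r. It assumes each column is a list of Python ints with None for missing values, and the names fits_r_integer and plan_conversions are made up. R integers are 32-bit, and the value -2^31 is reserved for NA, so the usable range is -2147483647 to 2147483647.

```python
# Usable range of R's 32-bit integer type; -2**31 itself encodes NA_integer_.
R_INT_MIN = -2147483647
R_INT_MAX = 2147483647


def fits_r_integer(values):
    """Return True if every non-missing value fits in an R integer.

    `values` is assumed to contain Python ints, with None for missing entries.
    """
    present = [v for v in values if v is not None]
    return all(R_INT_MIN <= v <= R_INT_MAX for v in present)


def plan_conversions(columns):
    """Map each integer column name to the R conversion that would be applied.

    Columns that fit get 'as.integer'; the rest fall back to 'as.double'.
    """
    return {
        name: "as.integer" if fits_r_integer(values) else "as.double"
        for name, values in columns.items()
    }


if __name__ == "__main__":
    # Same columns as the dataframe used earlier in this thread,
    # plus a hypothetical column that is too large for a 32-bit integer.
    columns = {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
        "too_big": [1, 2**40],
    }
    for name, conversion in plan_conversions(columns).items():
        print(name, "->", conversion)
```

Falling back to double for out-of-range columns is one possible choice; it matches the "forced to double" behaviour mentioned earlier, at the cost of exactness for very large values.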
Linux based tests still fail due to incompatibility of pyarrow (#23), but it is likely a problem with the travis CI environment. I have made a new release since tests pass on windows and mac.
Thank you, I'm looking forward to testing the new release in my environment.
I've recently started to use sos to combine R and python code. However, the example in the website doesn't seem to work for me.
If I want to get variables from R it works and there is no problem:
The output is [1, 2], as expected. However, if I try to do the reverse, getting the variable from SoS and putting it in R, it won't work and gives me the following output.
Code:
In this case the output should be the dataframe, but instead it is like this:
and when I run ls() in R to see variables, there is no variable.
I installed SoS using conda in windows and the problem was there. Then, I thought maybe the problem is windows and installed it in WSL2-ubuntu with conda but it didn't fix it. Then I deleted conda from the workflow and installed everything from scratch in a python environment, installed some of the required packages with pip and some with apt, but nothing changed and I still get the same output.