metaspace2020 / Lithops-METASPACE

Lithops-based Serverless implementation of the METASPACE spatial metabolomics annotation pipeline

buffer_index out of range exception when running the pipeline on knative #83

Closed JosepSampe closed 4 years ago

JosepSampe commented 4 years ago

I'm trying to run experiment-1-typical.ipynb on a Knative cluster, and one of the functions always fails with the following exception:

[INFO] __main__: PyWren v1.7.2 - Starting Knative execution
[INFO] handler: Execution-ID: 0f4f5a/20/M000/00047
[INFO] JobRunner: Started
[INFO] JobRunner: Going to execute 'generate_formulas()'
---------------------- FUNCTION LOG ----------------------
Generating formulas for adduct +Cd
----------------------- EXCEPTION !-----------------------
Traceback (most recent call last):
  File "/pywren/pywren_ibm_cloud/function/jobrunner.py", line 270, in run
    result = function(**data)
  File "/home/cloudbutton/metaspace/pywren-annotation-pipeline/annotation_pipeline/molecular_db.py", line 32, in generate_formulas
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/cloudbutton/metaspace/pywren-annotation-pipeline/annotation_pipeline/molecular_db.py", line 29, in _get_mols
  File "/tmp/pywren.modules/0f4f5a/20/M000/00047/annotation_pipeline/utils.py", line 195, in read_cloud_object_with_retry
    raise last_exception
  File "/tmp/pywren.modules/0f4f5a/20/M000/00047/annotation_pipeline/utils.py", line 186, in read_cloud_object_with_retry
    data = stream_reader(data_stream)
  File "/tmp/pywren.modules/0f4f5a/20/M000/00047/annotation_pipeline/utils.py", line 155, in deserialise
    return pa.deserialize(data)
  File "pyarrow/serialization.pxi", line 490, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 449, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 422, in pyarrow.lib.read_serialized
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
OSError: buffer_index out of range.
----------------------------------------------------------
[INFO] JobRunner: Finished
[INFO] handler: Storing execution stats - Size: 4.3KiB
[INFO] handler: Finished

Could this be caused by an old version of the pyarrow package in the runtime? Is the Knative Dockerfile up to date and working with the recent additions?

omerb01 commented 4 years ago

@JosepSampe The Knative Dockerfile was tested for the previous PR. This looks like a deserialisation issue; my guess is that it is related to a memory limit on your testbed. Can you check that and report back if that is the case?

JosepSampe commented 4 years ago

It seems the exception comes from a pyarrow version mismatch between the local machine and the runtime.

For some reason, pyarrow 0.17.1 does not work on the machine where I'm running the notebook, so I had to update to the latest pyarrow version (1.0.1), both locally and in the runtime, to make it work.
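
For anyone hitting the same error, a quick way to confirm this kind of mismatch is to compare the installed pyarrow version on both sides before running the pipeline. The snippet below is only a minimal sketch: `installed_version` and `versions_match` are hypothetical helper names, not part of the pipeline or of PyWren, and it assumes Python 3.8+ for `importlib.metadata` (on a 3.7 runtime the `importlib_metadata` backport would be needed instead).

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(package="pyarrow"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(local, remote):
    """Return True only when both versions are known and identical."""
    return local is not None and local == remote


# Local side: report what the notebook machine has installed.
local_version = installed_version("pyarrow")
print("local pyarrow:", local_version)

# Runtime side (assumption): run installed_version inside a function on the
# cluster and pass its result to versions_match together with local_version.
```

If the two versions differ, objects serialised on one side may not deserialise on the other, which would be consistent with the `buffer_index out of range` error in the log above.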