scramjetorg / scramjet

Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.
https://www.scramjet.org
MIT License

'Sequence unpack failed' issue #125

Closed tomkeee closed 1 year ago

tomkeee commented 2 years ago

I am trying to run a sequence that includes the files below. Unfortunately, whenever I try to start my sequence (si seq start <instance id>), I get an error: { message: 'Sequence unpack failed', exitcode: 10 }

main.py

from scramjet.streams import Stream
import time
import asyncio

result = 0 

def count(x):
    result += 1

async def run(context,input):
    start = time.time()
    with open(input) as file_in:
        x = (Stream
            .read_from(file_in)
            .map(count)
        )
    execution_time = time.time() - start
    return f"The number of interpreted lines: {result}\nExecution time {execution_time}"

package.json

{
    "name": "@scramjet/big-input-file",
    "version": "0.22.0",
    "lang": "python",
    "main": "./main.py",
    "author": "XYZ",
    "license": "GPL-3.0",
    "engines": {
        "python3": "3.9.0"
    },
    "scripts": {
        "build:refapps": "yarn build:refapps:only",
        "build:refapps:only": "mkdir -p dist/__pypackages__/ && cp *.py dist/ && pip3 install -t dist/__pypackages__/ -r requirements.txt",
        "postbuild:refapps": "yarn prepack && yarn packseq",
        "packseq": "PACKAGES_DIR=python node ../../scripts/packsequence.js",
        "prepack": "PACKAGES_DIR=python node ../../scripts/publish.js",
        "clean": "rm -rf ./dist"
    }
}

requirements.txt

scramjet-framework-py
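A side note on main.py above, independent of the unpack error: count assigns to result without declaring it global, which plain Python rejects at call time. The sketch below uses only the standard library and illustrative function names (count_broken and count_fixed are not part of the original sequence) to show the difference.

```python
result = 0  # module-level counter, as in main.py


def count_broken(x):
    # Assigning to `result` makes the name local to this function,
    # so reading it before the assignment raises UnboundLocalError.
    result += 1


def count_fixed(x):
    global result  # rebind the module-level name instead
    result += 1


try:
    count_broken("line")
    broken_error = None
except UnboundLocalError as err:
    broken_error = type(err).__name__

for line in ["a", "b", "c"]:
    count_fixed(line)

print(broken_error, result)
```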

daro1337 commented 2 years ago

Hi @tomkeee, thanks for reporting. How did you pack and send your sequence to our platform? If you used our CLI, which version (si -v)? Did you use si sequence deploy, or si seq pack <package> & si seq send <package>?

tomkeee commented 2 years ago

Hi @daro1337, I used si seq pack <package> & si seq send <package>. The CLI version is 0.28.3.

daro1337 commented 2 years ago

@tomkeee Did you try to pack and send the sequence again? I have successfully run the sequence from the code snippet above, but there were some code errors:

2022-09-16T12:51:51.416Z DEBUG Host Request [
  'date: 2022-09-16T12:51:46.814Z, method: POST, url: /api/v1/sequence/83840e85-c1be-42ce-8906-c19523c368f7/start, status: 200'
]
    return future.result()
  File "//runner.py", line 54, in main
    await self.run_instance(config, args)
  File "//runner.py", line 215, in run_instance
    result = await result
  File "/package/main.py", line 12, in run
    with open(input) as file_in:
TypeError: expected str, bytes or os.PathLike object, not Stream
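The TypeError in the traceback can be reproduced without the platform: open() only accepts a path-like value (or a file descriptor), while the traceback shows that the runner hands run() a Stream object as input. A minimal sketch, where FakeStream is a hypothetical stand-in class and not the real scramjet Stream:

```python
class FakeStream:
    """Hypothetical stand-in for the stream object passed as `input`."""


def try_open(source):
    # open() expects str, bytes, os.PathLike or an int file descriptor;
    # any other object is rejected with TypeError before a file is touched.
    try:
        with open(source) as handle:
            return handle.read()
    except TypeError as err:
        return f"TypeError: {err}"


message = try_open(FakeStream())
print(message)
```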

Can you inspect your package.tar.gz archive?

$ tar -ztvf issue125.tar.gz 
-rw-rw-r-- daro/1000       411 2022-09-16 14:45 main.py
-rw-rw-r-- daro/1000       672 2022-09-16 14:46 package.json
-rw-rw-r-- daro/1000        22 2022-09-16 14:46 requirements.txt

issue125.tar.gz
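If the tar command is not at hand, the same inspection can be sketched with Python's standard tarfile module. The archive built here is a throwaway with placeholder contents, created only so the snippet runs anywhere; list_archive is an illustrative helper name.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_archive(path):
    """Return the member names of a .tar.gz archive, similar to `tar -ztf`."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


# Build a small example archive; file contents are placeholders,
# not the real sequence files.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "example.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in ("main.py", "package.json", "requirements.txt"):
            data = b"placeholder\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    names = list_archive(archive_path)

print(names)
```

Pointing list_archive at the real package should show main.py and package.json at the top level, as in the listing above.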

tomkeee commented 2 years ago

@daro1337 I have no issues with sending the sequence (it actually works). The problem is that I am not able to run the sequence. It fails at si seq start <instance-id>. I tried to use it on my local machine (code below) and it worked as it should:

from scramjet.streams import Stream
import time
import asyncio

result = 0 

def count(x):
    global result
    result += 1

async def run(context,input):
    start = time.time()
    with open(input) as file_in:

        data = file_in.read()
        for i in data:
            count(i)
        global result
        print(f"result from file.read(): {result}")
        result = 0

        x = (Stream
            .read_from(file_in)
            .map(count)
        )
    execution_time = time.time() - start
    return f"The number of characters: {result}\nExecution time {execution_time}"

res = asyncio.run(run({}, "new.csv"))
print(res)

The result is:

result from file.read(): 327
The number of characters: 0
Execution time 0.00022649765014648438

I am trying to figure out what goes wrong in starting the sequence at the platform
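The second count of 0 in the local run has at least one cause that plain Python can show: after file_in.read() the file position is at the end, so anything reading from file_in afterwards gets no data. A second possible cause, stated here as an assumption about the Stream rather than a confirmed fact, is that a pipeline which is built but never consumed does not run its callbacks. The sketch uses io.StringIO and the built-in map as stand-ins for the file and the stream.

```python
import io

file_in = io.StringIO("a,b\n1,2\n")  # stands in for open("new.csv")

data = file_in.read()      # consumes the whole file
leftover = list(file_in)   # iterating again yields nothing: position is at EOF

seen = []
lazy = map(seen.append, ["x", "y"])  # built-in map is lazy; used as a stand-in
before = len(seen)         # no callback has run yet
list(lazy)                 # consuming the iterator runs the callback
after = len(seen)

print(len(data), leftover, before, after)
```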

tomkeee commented 2 years ago

@daro1337 The same issue occurs even when I try to start a sequence with the very basic code provided in your GitHub repo (shown below):

hello.py

from scramjet.streams import Stream

def run(context, input):
    return Stream.read_from(input).map(lambda s: f"Hello {s}!")

package.json

{
    "name": "@scramjet/python-hello-python",
    "version": "0.22.0",
    "lang": "python",
    "main": "./hello.py",
    "author": "Jan Warchoł <open-source@scramjet.org>",
    "license": "GPL-3.0",
    "repository": {
        "type": "git",
        "url": "https://github.com/scramjetorg/transform-hub.git"
    },
    "engines": {
        "python3": "3.9.0"
    },
    "scripts": {
        "build:refapps": "yarn build:refapps:only",
        "build:refapps:only": "mkdir -p dist/__pypackages__/ && cp *.py dist/ && pip3 install -t dist/__pypackages__/ -r requirements.txt",
        "postbuild:refapps": "yarn prepack && yarn packseq",
        "packseq": "PACKAGES_DIR=python node ../../scripts/packsequence.js",
        "prepack": "PACKAGES_DIR=python node ../../scripts/publish.js",
        "clean": "rm -rf ./dist"
    }
}

requirements.txt

scramjet-framework-py

a-tylenda commented 2 years ago

Hi @tomkeee, I reproduced your case based on our refapp hello. Could you please follow the steps below and let us know if you still get the same error? These steps are exactly what I did and the app worked just fine.

  1. Clone the reference-apps repo:

    git clone git@github.com:scramjetorg/reference-apps.git
  2. Go to hello dir:

    cd python/hello
  3. Install requirements:

    pip3 install -t __pypackages__/ -r requirements.txt

    After this step you should see __pypackages__ in the app directory with its dependencies.

  4. Now leave hello dir:

    cd ../
  5. Make a .tar.gz package:

    si seq pack hello
  6. Send hello.tar.gz to the hub (make sure your CLI config is set for the beta panel, the token is added, env is set to production, etc.):

    si seq send hello.tar.gz

    The output you should see in the console:

    $ si seq send hello.tar.gz

{"_id":"46b9fd3d-4762-4ce6-8b28-c579a9d595f7","host":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","client":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","log":{}}},"sequenceURL":"sequence/46b9fd3d-4762-4ce6-8b28-c579a9d595f7"}

  7. Start the sequence:

    si seq start -

    The output you should see in the console:

    $ si seq start -

{"host":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","client":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","log":{}}},"_id":"1145e550-6c5b-430d-b62b-f00bffc5f9f9","instanceURL":"instance/1145e550-6c5b-430d-b62b-f00bffc5f9f9"}

  8. Get the instance's output stream:

    si inst output -

    The request stays open; the instance awaits some input. To deliver input data:

  9. Open another terminal, copy the instance's ID from the output above (step 7), and use it in the command:

    si inst input 1145e550-6c5b-430d-b62b-f00bffc5f9f9

    Hit enter.
  10. Type in some data, for example your nick "tomkeee", and hit enter. You should see "Hello tomkeee" in the first terminal.
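Copying the instance ID by hand is easy to get wrong, so here is a small sketch of pulling the _id field out of the JSON that si seq start prints. The response string is a shortened, made-up example in the same shape as the output above, and extract_id is an illustrative helper, not part of the CLI.

```python
import json

# Shortened, made-up response in the same shape as the `si seq start` output.
raw_output = (
    '{"_id": "00000000-0000-0000-0000-000000000000", '
    '"instanceURL": "instance/00000000-0000-0000-0000-000000000000"}'
)


def extract_id(cli_output):
    """Return the `_id` field from a JSON response printed by the CLI."""
    return json.loads(cli_output)["_id"]


instance_id = extract_id(raw_output)
print(f"si inst input {instance_id}")
```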

Have fun! 🤞🏼 And please let us know if it worked this time, thank you!

Tatarinho commented 2 years ago

@tomkeee I checked this simple hello sample on our platform, and from my perspective it's working. Can you describe exactly how you are running it?

In your sample, don't pack your input file with the sequence. You can send the file directly to the instance's input with our CLI: si inst input <instance_id> <input_file>. As far as I know, we have a limit on the size of the sequence itself. Second, you don't have to open the file with Python's open and then hook it into our framework; you can call x = Stream.read_from(input) directly. We expect the run() function to return a Stream, not a str, so if you want to see a str on the output, you can use the same structure as in the hello sample: return Stream.read_from(input).map(lambda s: f"Hello {s}!").

Please note that map applies the given function (here, the lambda) to every chunk of the input and puts the function's return value on the output, so the hello sample emits str values. Your count function does not return anything, so it won't work with map. If you don't want to modify the stream data, you can use the .each method.
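The difference between the two steps can be sketched in plain Python. The generators map_chunks and each_chunk below are hypothetical stand-ins written for illustration, based only on the description above; they are not the framework's implementation, and the real .map and .each may differ in detail.

```python
def map_chunks(chunks, func):
    """Stand-in for a map step: emits whatever func returns."""
    for chunk in chunks:
        yield func(chunk)


def each_chunk(chunks, func):
    """Stand-in for an each step: calls func, then emits the chunk unchanged."""
    for chunk in chunks:
        func(chunk)
        yield chunk


counter = {"lines": 0}


def count(_chunk):
    counter["lines"] += 1  # side effect only; implicitly returns None


lines = ["a\n", "b\n", "c\n"]

mapped = list(map_chunks(lines, count))          # output is replaced by None values
counter["lines"] = 0
passed_through = list(each_chunk(lines, count))  # output keeps the original chunks

print(mapped)
print(passed_through, counter["lines"])
```

With the map-style step the downstream consumer only sees None for every chunk, which matches the point that a counting callback with no return value does not fit map.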

tomkeee commented 2 years ago

Hi @a-tylenda,

I tried it today and it worked successfully :tada: (I can see that there were some fixes on the platform, so maybe I just had bad timing and was starting the sequences during the maintenance :thinking: ).

MichalCz commented 1 year ago

Hi @tomkeee - that's great to hear. I'll pin this thread for reference. :)