neuro-inc / flow-template

Apolo Platform Flow Template
Apache License 2.0

Extend Makefile to run most commands locally #99

Closed atemate closed 3 years ago

atemate commented 5 years ago

While working through the tutorial https://neu.ro/docs/how_to_train_your_model, I ran into a problem with my setup:

[ay@archlinux nlp-from-scratch]$ make training 
neuro run \
    --name training-nlp-from-scratch \
    --preset cpu-small \
    --volume storage:nlp-from-scratch/data:/project/data:ro \
    --volume storage:nlp-from-scratch/rnn:/project/rnn:ro \
    --volume storage:nlp-from-scratch/results:/project/results:rw \
    --env PLATFORMAPI_SERVICE_HOST="." \
    image:neuromation-nlp-from-scratch \
    "python -u /project/rnn/char_rnn_classification_tutorial.py"
Job ID: job-24f6b127-5680-4234-a6a7-c387eb0f3640 Status: pending
Name: training-nlp-from-scratch
Http URL: https://training-nlp-from-scratch--artemyushkovskiy.jobs-staging.neu.ro
Shortcuts:
  neuro status training-nlp-from-scratch  # check job status
  neuro logs training-nlp-from-scratch    # monitor job stdout
  neuro top training-nlp-from-scratch     # display real-time job telemetry
  neuro kill training-nlp-from-scratch    # kill job
Status: pending Initializing
Status: pending ContainerCreating
Status: failed Error (Server listening on 0.0.0.0 port 22.  Server listening on :: port 22.  [] Slusarski Traceback (most recent call last):   File "/project/rnn/char_rnn_classification_tutorial.py", line 122, in <module>     print(category_lines['Italian'][:5]) KeyError: 'Italian' )
Terminal is attached to the remote job, so you receive the job's output.
Use 'Ctrl-C' to detach (it will NOT terminate the job), or restart the job
with `--detach` option.

Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.
[]
Slusarski
Traceback (most recent call last):
  File "/project/rnn/char_rnn_classification_tutorial.py", line 122, in <module>
    print(category_lines['Italian'][:5])
KeyError: 'Italian'

So at this point I'd like to test my project setup (the paths stored in Makefile variables) locally. Something like make training-local would be useful for debugging the project without copying anything to Storage, waiting for the job to be scheduled, etc.
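The `[]` in the log above looks like an empty file listing printed shortly before the `KeyError`, which would point to a data path that matches nothing inside the job. A minimal sketch of a fail-fast check that could run both locally and remotely, assuming the script collects its inputs with a glob pattern; `DATA_DIR`, `find_data_files`, and the `names/*.txt` pattern are hypothetical names used for illustration, not taken from the tutorial or the template:

```python
import glob
import os


def find_data_files(data_dir, pattern="names/*.txt"):
    """Return sorted files matching pattern under data_dir, or raise if none match."""
    full_pattern = os.path.join(data_dir, pattern)
    files = sorted(glob.glob(full_pattern))
    if not files:
        # Fail here with the resolved pattern instead of later with a KeyError.
        raise FileNotFoundError(
            f"no files match {full_pattern!r} (cwd={os.getcwd()!r})"
        )
    return files


if __name__ == "__main__":
    # DATA_DIR is a hypothetical variable; /project/data is the mount point from the log.
    data_dir = os.environ.get("DATA_DIR", "/project/data")
    try:
        print(find_data_files(data_dir))
    except FileNotFoundError as err:
        print(f"data check failed: {err}")
```

Reporting the resolved pattern and the working directory makes it easier to tell a wrong relative path from data that was never uploaded.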

mariyadavydova commented 5 years ago

This is a dubious feature. The idea of the Makefile is to help people work with the platform (and to encourage them to do so).

In your particular case, the problem should be solved by remote debugging rather than by running the code locally. Also, the local and remote setups can easily diverge, for example if you forget to upload the data to the platform storage. Thus, local execution won't help you troubleshoot the remote problem.

atemate commented 5 years ago

When something does not work, I usually localize the problem by reducing the number of places in the environment where the error may occur. Once I got the exception above, I (of course) checked my directories on Storage (everything's fine), so my next step is to run the same setup locally, just to check whether the problem is in my code or in the platform setup. As a regular user of the platform, I experience a high level of frustration when, in order to run my code locally, I have to read the Makefile, resolve environment variables manually, and start Jupyter manually as well. I really need the template to have targets for running my code locally.
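Resolving the Makefile variables by hand largely amounts to mapping each `--volume` argument from the `neuro run` command above to a local directory. A rough sketch of that mapping, assuming the local checkout mirrors the storage layout under the project name; `Volume`, `parse_volume`, and `local_path` are hypothetical helpers, not part of the template or the neuro CLI:

```python
import os
from typing import NamedTuple


class Volume(NamedTuple):
    storage_path: str    # path on platform storage, e.g. nlp-from-scratch/data
    container_path: str  # mount point inside the job, e.g. /project/data
    mode: str            # "ro" or "rw"


def parse_volume(spec):
    """Split 'storage:<path>:<mount>:<mode>' into its parts."""
    scheme, storage_path, container_path, mode = spec.split(":")
    if scheme != "storage" or mode not in ("ro", "rw"):
        raise ValueError(f"unexpected volume spec: {spec!r}")
    return Volume(storage_path, container_path, mode)


def local_path(volume, project_root, project_name):
    """Map a storage path to a directory in a local checkout (assumed layout)."""
    relative = os.path.relpath(volume.storage_path, project_name)
    return os.path.normpath(os.path.join(project_root, relative))


if __name__ == "__main__":
    # Volume specs copied from the `make training` output above.
    specs = [
        "storage:nlp-from-scratch/data:/project/data:ro",
        "storage:nlp-from-scratch/rnn:/project/rnn:ro",
        "storage:nlp-from-scratch/results:/project/results:rw",
    ]
    for spec in specs:
        vol = parse_volume(spec)
        print(vol.container_path, "->", local_path(vol, ".", "nlp-from-scratch"), vol.mode)
```

A local target could then point the script at these directories instead of the `/project/...` mount points, although system and hardware dependencies would still differ from the remote job.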

YevheniiSemendiak commented 3 years ago

Outdated. This feature request depends on each particular case, which includes:

  1. data location
  2. system dependencies
  3. hardware dependencies
  4. other jobs running on the platform (TensorFlow / MLFlow / etc.)

Closing for now.