Closed: daegloe closed this issue 7 years ago
The second-to-last line is asking for keyboard input, but since nobody is there to provide it, the process dies on missing input from the user.
Is there a way to echo "y" into this process or run it with --yes or --force, etc.?
Interesting. Can you try downgrading your nltk package to an earlier version that doesn't ask?
We may be able to use expect to automate this, if there isn't a flag for automated installs.
Thanks for following up.
If you look closely at the log snippet, the install fails and then prompts to retry. The question is why the install fails on Heroku but succeeds locally.
FWIW, we haven't upgraded our NLTK version. This was working with the build pack until recently.
Could an infrastructure change be the culprit?
It seems to be ignoring the line breaks when reading the packages listed in the nltk.txt file.
@daegloe are you on windows?
No, Mac OS Sierra.
The buildpack converts nltk.txt to a space-separated package list using:
https://github.com/heroku/heroku-buildpack-python/blob/fd360bda149687536b26ac413185a6c2b5953bca/bin/steps/nltk#L24
Is your file using CRLFs or CRs instead?
What does tr "\n" " " < nltk.txt show locally?
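For anyone who wants to check this without a shell handy, here is a minimal Python sketch of that translation. It assumes the step does nothing beyond replacing newlines with spaces; the helper name is made up for illustration and is not part of the buildpack.

```python
def newline_to_space(raw: str) -> str:
    # Mimic the tr step: replace each LF with a space, leave everything else alone.
    return raw.replace("\n", " ")

unix_file = "wordnet\nbrown\npunkt\n"
windows_file = "wordnet\r\nbrown\r\npunkt\r\n"

# repr() makes a stray carriage return visible as \r.
print(repr(newline_to_space(unix_file)))
print(repr(newline_to_space(windows_file)))
```

If the second form shows up for your file, each package name carries a trailing carriage return by the time it reaches the downloader.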
Whilst the "Error installing package. Retry? [n/y/e]" prompt seen in the OP isn't the cause of the problem, it is confusing and also means that retries won't be happening in the case of transient network issues.
It turns out it can be suppressed by using the quiet option; however, that means losing out on the standard log output. Ideally if no TTY is attached the prompt would be skipped - so I've filed nltk/nltk#1825.
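To illustrate the "skip the prompt when no TTY is attached" idea, here is a rough sketch. This is not NLTK's code, and both function names are hypothetical; it only shows the general shape of such a check using the standard isatty() method.

```python
import io
import sys

def can_prompt(stream=None) -> bool:
    """Return True only if `stream` (default: stdin) is an interactive terminal."""
    stream = sys.stdin if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())

def ask_retry(stream=None) -> str:
    """Hypothetical retry prompt: answer 'n' automatically when nobody can type."""
    stream = sys.stdin if stream is None else stream
    if not can_prompt(stream):
        return "n"
    print("Error installing package. Retry? [n/y/e]")
    return stream.readline().strip()

# A build has no keyboard attached; StringIO stands in for that here.
print(ask_retry(io.StringIO("")))
```

With a check like this, a non-interactive build would fall through to a plain failure instead of hitting an EOFError while waiting for input.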
@edmorley Would https://github.com/heroku/heroku-buildpack-python/issues/444#issuecomment-320466015 work?
No, since in this case the package names being passed to the downloader are invalid, so retrying wouldn't help.
@edmorley thanks for filing that ticket!
Ah actually the problem isn't due to the line endings, it's that the buildpack passes the package list quoted. Doing so works when there is only one package, but not when there are several.
Compare:
$ python2 -m nltk.downloader -d nltk_data "wordnet brown punkt averaged_perceptron_tagger"
[nltk_data] Error loading wordnet brown punkt
[nltk_data] averaged_perceptron_tagger: Package 'wordnet brown
[nltk_data] punkt averaged_perceptron_tagger' not found in index
Error installing package. Retry? [n/y/e]
to:
$ python2 -m nltk.downloader -d nltk_data wordnet brown punkt averaged_perceptron_tagger
[nltk_data] Downloading package wordnet to nltk_data...
...
To fix, the double quotes added by #438 need to be removed.
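The difference between the two invocations can also be seen by tokenizing them the way a POSIX shell roughly would. The sketch below uses Python's shlex.split as an approximation of shell word splitting; the slice index assumes the exact command lines shown above.

```python
import shlex

quoted = 'python2 -m nltk.downloader -d nltk_data "wordnet brown punkt averaged_perceptron_tagger"'
unquoted = 'python2 -m nltk.downloader -d nltk_data wordnet brown punkt averaged_perceptron_tagger'

# Everything after "-d nltk_data" is what the downloader treats as package names.
quoted_packages = shlex.split(quoted)[5:]
unquoted_packages = shlex.split(unquoted)[5:]

print(quoted_packages)    # one argument containing spaces
print(unquoted_packages)  # four separate arguments
```

The quoted form hands the downloader a single package name with spaces in it, which matches the "not found in index" error above.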
I've opened #460 which will fix this. In the meantime you could roll back to release v110 (whose git tag is unfortunately missing, #448) or else v109 using the syntax here: https://devcenter.heroku.com/articles/buildpacks#buildpack-references
(If you choose to do that, I'd recommend switching back to the heroku/python buildpack alias later, so you continue to receive updates)
@daegloe could you confirm that #460 fixed this? It's present in any tagged version from 114 onwards, and also via the heroku/python stable version alias.
yes, all good now. thanks @edmorley!
✨🍰✨
Hey, I'm having a very similar issue to the one originally described here.
-----> Python app detected
-----> Installing requirements with pip
-----> Downloading NLTK corpora…
-----> Downloading NLTK packages: punkt
/app/.heroku/python/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/app/.heroku/python/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/.heroku/python/lib/python3.6/site-packages/nltk/downloader.py", line 2272, in <module>
[nltk_data] Error loading punkt : Package 'punkt\r' not found in index
Error installing package. Retry? [n/y/e]
    halt_on_error=options.halt_on_error)
  File "/app/.heroku/python/lib/python3.6/site-packages/nltk/downloader.py", line 681, in download
    choice = input().strip()
EOFError: EOF when reading a line
-----> Discovering process types
       Procfile declares types -> worker
-----> Compressing...
       Done: 47.2M
-----> Launching...
       Released v42
       https://phoenix-discordbot-test.herokuapp.com/ deployed to Heroku
For background, I'm using nltk for TextBlob on Heroku. I have an nltk.txt file in my app folder that I'm using in Dropbox, and I specified both nltk and textblob in my requirements.txt file. I'm not super knowledgeable about this, but if someone could help me out it would be much appreciated.
The corpora that I want to be installed are:
punkt brown wordnet averaged_perceptron_tagger conll2000 movie_reviews
and I have listed these in my nltk.txt file.
It looks like your nltk.txt has Windows line endings, which aren't currently handled gracefully by this buildpack's nltk step. I'd recommend converting it to Unix endings (and for the rest of the repository too; generally Windows line endings are not recommended since they break bash and others). See:
https://help.github.com/articles/dealing-with-line-endings/
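If changing editor or git settings isn't convenient, a one-off conversion is also possible. The following is a minimal sketch with a hypothetical helper name; it treats the file as raw bytes and demonstrates on a throwaway copy rather than a real nltk.txt.

```python
import tempfile
from pathlib import Path

def normalize_line_endings(path) -> bool:
    """Rewrite `path` with LF endings; return True if the file was changed."""
    path = Path(path)
    original = path.read_bytes()
    # Convert CRLF first so the lone-CR pass does not double up newlines.
    converted = original.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if converted != original:
        path.write_bytes(converted)
    return converted != original

# Demo on a throwaway file standing in for nltk.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "nltk.txt"
    demo.write_bytes(b"punkt\r\nbrown\r\nwordnet\r\n")
    changed = normalize_line_endings(demo)
    result = demo.read_bytes()

print(changed, result)
```

Run against the real file, this would be normalize_line_endings("nltk.txt"), followed by committing the result.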
Hi, I am sorry to respond to a closed issue. However, I ran into an issue similar to rchsmohsin's while using only one NLTK corpus.
Downloading NLTK corpora…
remote: -----> Downloading NLTK packages: averaged_perceptron_tagger
remote: /app/.heroku/python/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
remote: warn(RuntimeWarning(msg))
remote: [nltk_data] Downloading package averaged_perceptron_tagger to /tmp/bui
remote: [nltk_data] ld_dcb72ad4d4a6e9c82f1c784207ff830e/.heroku/python/nlt
remote: [nltk_data] k_data...
remote: [nltk_data] Package averaged_perceptron_tagger is already up-to-
remote: [nltk_data] date!
For background, I am creating a word cloud without verbs, so only 'averaged_perceptron_tagger' from NLTK is needed. The word cloud image doesn't show up in the app, so I am wondering whether this error may be the reason.
Sorry for commenting on the closed issue. I'm still facing the issue with nltk installation onto heroku.
@daegloe, @YaxianLin what is the exact fix for this problem?
did you find the solution?
When I deploy a Python web app to Heroku, I get an error. Could you help me?
-----> Downloading NLTK corpora…
 !     'nltk.txt' not found, not downloading any corpora
 !     Learn more: https://devcenter.heroku.com/articles/python-nltk
-----> Discovering process types
       Procfile declares types -> web
-----> Compressing...
 !     Compiled slug size: 1.4G is too large (max is 500M).
 !     See: http://devcenter.heroku.com/articles/slug-size
 !     Push failed
I have generated a stopwords.csv file and am using it in my code:
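import nltk
import pandas as pd

# Download the stopwords corpus so it can be read below.
nltk.download('stopwords')
from nltk.corpus import stopwords

# Write the English stopwords to a CSV file, one word per row in the 'text' column.
stopword = pd.DataFrame()
stopword['text'] = stopwords.words('english')
stopword.to_csv("stopwords.csv")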
However, this workaround only covers stopwords, and I am still facing the following issue:
[nltk_data]     /tmp/build_3863ea7f/.heroku/python/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
-----> Discovering process types
       Procfile declares types -> web
-----> Compressing...
 !     Compiled slug size: 644.5M is too large (max is 500M).
 !     See: http://devcenter.heroku.com/articles/slug-size
 !     Push failed
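For completeness, the stopwords.csv produced above can be read back at runtime with only the standard library. This is a sketch, not a fix for the slug size error: the helper name is hypothetical, and it assumes the layout that to_csv writes by default (an unnamed index column followed by 'text'). The in-memory sample stands in for the real file.

```python
import csv
import io

def load_stopwords(csv_file) -> set:
    """Read the 'text' column from an open CSV file into a set."""
    return {row["text"] for row in csv.DictReader(csv_file)}

# Stand-in for stopwords.csv: an unnamed index column plus 'text'.
sample = io.StringIO(",text\n0,i\n1,me\n2,my\n")
stopword_set = load_stopwords(sample)

tokens = ["me", "and", "my", "corpus"]
filtered = [t for t in tokens if t not in stopword_set]
print(filtered)
```

With the real file, the call would be wrapped as: with open("stopwords.csv", newline="") as f: load_stopwords(f).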
Hi, we are no longer able to download and install the NLTK corpora via this build pack.
We receive the deployment error below, no matter the packages listed in the nltk.txt file: