guanquann / Stocksera

Finance application that provides more than 60 different alternative data to retail investors
MIT License
661 stars, 107 forks

Docker image has Java error #20

Closed by CodeInFilth 2 years ago

CodeInFilth commented 3 years ago
scheduled_tasks_1  | Traceback (most recent call last):
scheduled_tasks_1  |   File "/usr/local/lib/python3.7/site-packages/tabula/io.py", line 85, in _run
scheduled_tasks_1  |     check=True,
scheduled_tasks_1  |   File "/usr/local/lib/python3.7/subprocess.py", line 488, in run
scheduled_tasks_1  |     with Popen(*popenargs, **kwargs) as process:
scheduled_tasks_1  |   File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
scheduled_tasks_1  |     restore_signals, start_new_session)
scheduled_tasks_1  |   File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
scheduled_tasks_1  |     raise child_exception_type(errno_num, err_msg, err_filename)
scheduled_tasks_1  | FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'
scheduled_tasks_1  | 
scheduled_tasks_1  | During handling of the above exception, another exception occurred:
scheduled_tasks_1  | 
scheduled_tasks_1  | Traceback (most recent call last):
scheduled_tasks_1  |   File "tasks_to_run.py", line 158, in <module>
scheduled_tasks_1  |     get_upcoming_events_date.main()
scheduled_tasks_1  |   File "/code/scheduled_tasks/economy/get_upcoming_events_date.py", line 69, in main
scheduled_tasks_1  |     retail_df = get_next_retail_sales_date()
scheduled_tasks_1  |   File "/code/scheduled_tasks/economy/get_upcoming_events_date.py", line 20, in get_next_retail_sales_date
scheduled_tasks_1  |     df = tabula.read_pdf(r"https://www.census.gov/retail/marts/www/martsdates.pdf", pages=1)[0]
scheduled_tasks_1  |   File "/usr/local/lib/python3.7/site-packages/tabula/io.py", line 322, in read_pdf
scheduled_tasks_1  |     output = _run(java_options, kwargs, path, encoding)
scheduled_tasks_1  |   File "/usr/local/lib/python3.7/site-packages/tabula/io.py", line 91, in _run
scheduled_tasks_1  |     raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
scheduled_tasks_1  | tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`

Now that I have run the main script from your commit 94 on Ubuntu 18.04 Server, Ubuntu 20.04 Desktop, Mac OS 11.14, and Windows 10, I am trying to get your Docker image working on all of my operating systems.

I keep hitting the error above at the final step, after the rest of your tasks file has run.

I was able to fix the initial nltk error without opening a bash session in the image. The only change is the RUN instruction added after COPY . /code/:

FROM python:3.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/
RUN mkdir -p /usr/share/nltk_data \
    && cd /usr/share/nltk_data \
    && mkdir -p sentiment corpora \
    && curl https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip > corpora/stopwords.zip \
    && curl https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/sentiment/vader_lexicon.zip > sentiment/vader_lexicon.zip
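
A quick way to confirm that the RUN step above really produced usable files is to check them from Python inside the container. This is only a sketch: the helper name find_bad_archives is made up for illustration, and it assumes the two archives land at the relative paths used in the curl commands. It checks that each file exists and is a valid zip, which would also catch the case where curl saved an error page instead of the archive.

```python
import zipfile
from pathlib import Path

# Relative paths written by the curl commands in the RUN step.
EXPECTED_ARCHIVES = ("corpora/stopwords.zip", "sentiment/vader_lexicon.zip")


def find_bad_archives(base_dir, expected=EXPECTED_ARCHIVES):
    """Return the expected archives that are missing or are not valid zip files.

    find_bad_archives is a hypothetical helper, not part of Stocksera or NLTK.
    """
    base = Path(base_dir)
    # zipfile.is_zipfile returns False for missing files and for non-zip content.
    return [rel for rel in expected if not zipfile.is_zipfile(base / rel)]


if __name__ == "__main__":
    bad = find_bad_archives("/usr/share/nltk_data")
    print("NLTK data OK" if not bad else f"Missing or corrupt: {bad}")
```

An empty result means both archives are in place; anything else points at the download that needs to be repeated.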

Do you have a different Docker image with OpenJDK that you ran this on?

If not, I will post my solution anyway.
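
The traceback shows tabula launching a `java` subprocess and failing because the executable is not on PATH. A small pre-flight check along these lines would surface that before the scheduled task reaches tabula.read_pdf. It is a sketch only: require_java is a hypothetical helper, not something Stocksera or tabula-py provides, and the `which` parameter exists only so the lookup can be swapped out in a test.

```python
import shutil


def require_java(which=shutil.which):
    """Return the path of the `java` executable, or raise if it is not on PATH.

    require_java is a hypothetical helper used for illustration.
    """
    path = which("java")
    if path is None:
        raise RuntimeError(
            "java not found on PATH; install a JRE/JDK in the image "
            "before calling tabula.read_pdf"
        )
    return path


if __name__ == "__main__":
    try:
        print("java found at", require_java())
    except RuntimeError as exc:
        print(exc)
```

Running this inside the scheduled_tasks container tells you whether the image has Java at all, without waiting for the PDF task to fail.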

CodeInFilth commented 3 years ago

Here is my fix for running Stocksera in a Python Docker image that also includes Java (an image with both was not easy to find for 3.7):

FROM python:3.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
RUN pip install django-sslserver
RUN pip install tabula-py

RUN apt-get update \
    && apt-get install -y openjdk-11-jdk \
    && apt-get install -y ant \
    && apt-get clean;

RUN mkdir -p /usr/share/nltk_data \
    && cd /usr/share/nltk_data \
    && mkdir -p sentiment corpora \
    && curl https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip > corpora/stopwords.zip \
    && curl https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/sentiment/vader_lexicon.zip > sentiment/vader_lexicon.zip

COPY . /code/

If running python3 scheduled_tasks/get_house_trading.py raises an error, the following change should fix it.

Change line 5:

  3    df = pd.read_json("https://house-stock-watcher-data.s3-us-west-2.amazonaws.com/data/all_transactions.json")
  4    for i in ["transaction_date", "disclosure_date"]:
- 5        df[i] = pd.to_datetime(df[i])
+ 5        df[i] = pd.to_datetime(df[i], errors='coerce')
  6        df[i] = df[i].dt.strftime('%Y-%m-%d')
  7    df.to_csv("database/government/house.csv", index=False)
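
The effect of errors='coerce' can be tried in isolation. The following is a minimal sketch, assuming a small made-up DataFrame in place of the real all_transactions.json feed; normalize_dates and the sample values are hypothetical and only illustrate the changed line.

```python
import pandas as pd


def normalize_dates(df, columns):
    """Parse the given columns as dates and format them as YYYY-MM-DD.

    Values that cannot be parsed become missing instead of raising,
    which is what errors="coerce" does in pd.to_datetime.
    """
    out = df.copy()
    for col in columns:
        parsed = pd.to_datetime(out[col], errors="coerce")  # bad values -> NaT
        out[col] = parsed.dt.strftime("%Y-%m-%d")  # NaT becomes a missing value
    return out


# Made-up sample data: one transaction_date is not a valid date.
sample = pd.DataFrame(
    {
        "transaction_date": ["2021-09-27", "unknown"],
        "disclosure_date": ["2021-10-04", "2021-10-05"],
    }
)
cleaned = normalize_dates(sample, ["transaction_date", "disclosure_date"])
print(cleaned)
```

With coercion, a row whose date fails to parse ends up with an empty date in the CSV rather than aborting the whole task, so it may be worth checking how many rows are affected.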

CodeInFilth commented 2 years ago

I have fixed your Dockerfile with the script below. It makes the Docker setup turnkey, so inexperienced users need less knowledge to get it running.

Dockerfile:

FROM python:3.7
ENV PYTHONUNBUFFERED 1

RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
RUN pip install django-sslserver
RUN pip install tabula-py
RUN pip install hickory

RUN apt-get update \
    && apt-get install -y openjdk-11-jdk \
    && apt-get install -y ant \
    && apt-get clean;

RUN mkdir -p /usr/share/nltk_data \
    && cd /usr/share/nltk_data \
    && mkdir -p sentiment corpora \
    && curl https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip > corpora/stopwords.zip \
    && curl https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/sentiment/vader_lexicon.zip > sentiment/vader_lexicon.zip

COPY . /code/

guanquann commented 2 years ago

Thank you. I will include this in my next commit. Been pretty busy recently, so I didn't have time to fix it :(