taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

`docker build .` doesn't work #250

Open nukopy opened 4 years ago

nukopy commented 4 years ago

When I ran docker build in my local development environment, the build failed because lxml's build dependencies were not installed in the image.

I therefore modified the Dockerfile to fix this problem.

Local development environment

$ docker -v
Docker version 19.03.5, build 633a0ea

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.15.2
BuildVersion:   19C57

$ system_profiler SPHardwareDataType
    ...
    Hardware Overview:
      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,1
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 8
    ...

Error messages

I ran the following commands:

$ git clone https://github.com/taspinar/twitterscraper.git
$ cd twitterscraper
$ docker build -t twitterscraper .

Output:

Sending build context to Docker daemon   1.46MB
Step 1/5 : FROM python:3.7-alpine
 ---> 8922d588eec6
Step 2/5 : COPY . /app
 ---> ef1950c03f7e
Step 3/5 : WORKDIR /app
 ---> Running in d583ab62c619
Removing intermediate container d583ab62c619
 ---> 21db47ea1ead
Step 4/5 : RUN python setup.py install
 ---> Running in bb84ce370b50
running install
running bdist_egg
running egg_info
creating twitterscraper.egg-info
writing twitterscraper.egg-info/PKG-INFO
writing dependency_links to twitterscraper.egg-info/dependency_links.txt
writing entry points to twitterscraper.egg-info/entry_points.txt
writing requirements to twitterscraper.egg-info/requires.txt
writing top-level names to twitterscraper.egg-info/top_level.txt
writing manifest file 'twitterscraper.egg-info/SOURCES.txt'
reading manifest file 'twitterscraper.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'LICENSE.txt'
warning: no files found matching 'HISTORY.rst'
writing manifest file 'twitterscraper.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/twitterscraper
copying twitterscraper/user.py -> build/lib/twitterscraper
copying twitterscraper/tweet.py -> build/lib/twitterscraper
copying twitterscraper/query.py -> build/lib/twitterscraper
copying twitterscraper/__init__.py -> build/lib/twitterscraper
copying twitterscraper/ts_logger.py -> build/lib/twitterscraper
copying twitterscraper/main.py -> build/lib/twitterscraper
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/twitterscraper
copying build/lib/twitterscraper/user.py -> build/bdist.linux-x86_64/egg/twitterscraper
copying build/lib/twitterscraper/tweet.py -> build/bdist.linux-x86_64/egg/twitterscraper
copying build/lib/twitterscraper/query.py -> build/bdist.linux-x86_64/egg/twitterscraper
copying build/lib/twitterscraper/__init__.py -> build/bdist.linux-x86_64/egg/twitterscraper
copying build/lib/twitterscraper/ts_logger.py -> build/bdist.linux-x86_64/egg/twitterscraper
copying build/lib/twitterscraper/main.py -> build/bdist.linux-x86_64/egg/twitterscraper
byte-compiling build/bdist.linux-x86_64/egg/twitterscraper/user.py to user.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/twitterscraper/tweet.py to tweet.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/twitterscraper/query.py to query.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/twitterscraper/__init__.py to __init__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/twitterscraper/ts_logger.py to ts_logger.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/twitterscraper/main.py to main.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying twitterscraper.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying twitterscraper.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying twitterscraper.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying twitterscraper.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying twitterscraper.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
copying twitterscraper.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
creating dist
creating 'dist/twitterscraper-1.4.0-py3.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing twitterscraper-1.4.0-py3.7.egg
Copying twitterscraper-1.4.0-py3.7.egg to /usr/local/lib/python3.7/site-packages
Adding twitterscraper 1.4.0 to easy-install.pth file
Installing twitterscraper script to /usr/local/bin

Installed /usr/local/lib/python3.7/site-packages/twitterscraper-1.4.0-py3.7.egg
Processing dependencies for twitterscraper==1.4.0
Searching for billiard
Reading https://pypi.org/simple/billiard/
Downloading https://files.pythonhosted.org/packages/9a/bb/2a016ac912fca48e06ff5a662407f3d1681aa47df97fb19feba7cc931ee1/billiard-3.6.1.0-py3-none-any.whl#sha256=01afcb4e7c4fd6480940cfbd4d9edc19d7a7509d6ada533984d0d0f49901ec82
Best match: billiard 3.6.1.0
Processing billiard-3.6.1.0-py3-none-any.whl
Installing billiard-3.6.1.0-py3-none-any.whl to /usr/local/lib/python3.7/site-packages
Adding billiard 3.6.1.0 to easy-install.pth file

Installed /usr/local/lib/python3.7/site-packages/billiard-3.6.1.0-py3.7.egg
Searching for requests
Reading https://pypi.org/simple/requests/
Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl#sha256=9cf5292fcd0f598c671cfc1e0d7d1a7f13bb8085e9a590f48c010551dc6c4b31
Best match: requests 2.22.0
Processing requests-2.22.0-py2.py3-none-any.whl
Installing requests-2.22.0-py2.py3-none-any.whl to /usr/local/lib/python3.7/site-packages
writing requirements to /usr/local/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/EGG-INFO/requires.txt
Adding requests 2.22.0 to easy-install.pth file

Installed /usr/local/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg
Searching for lxml
Reading https://pypi.org/simple/lxml/
Downloading https://files.pythonhosted.org/packages/e4/19/8dfeef50623892577dc05245093e090bb2bab4c8aed5cad5b03208959563/lxml-4.4.2.tar.gz#sha256=eff69ddbf3ad86375c344339371168640951c302450c5d3e9936e98d6459db06
Best match: lxml 4.4.2
Processing lxml-4.4.2.tar.gz
Writing /tmp/easy_install-fr1khd80/lxml-4.4.2/setup.cfg
Running lxml-4.4.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-fr1khd80/lxml-4.4.2/egg-dist-tmp-orkblx88
warning: no files found matching '*.html' under directory 'doc'
Building lxml version 4.4.2.
Building without Cython.
ERROR: b'/bin/sh: xslt-config: not found\n'
** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt
Compile failed: command 'gcc' failed with exit status 1
*********************************************************************************
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
*********************************************************************************
error: Setup script exited with error: command 'gcc' failed with exit status 1
The command '/bin/sh -c python setup.py install' returned a non-zero code: 1

How to fix it

According to the error messages, building lxml failed because it needs the development packages of libxml2 and libxslt (plus a C compiler and libc headers) at build time.

After I modified the Dockerfile as shown below, the image built successfully.

FROM python:3.7-alpine
COPY . /app
WORKDIR /app
+
+ RUN apk add --no-cache \
+     gcc \
+     libc-dev \
+     libxml2-dev \
+     libxslt-dev
RUN python setup.py install
CMD ["twitterscraper"]

CAUTION: Building the image from the edited Dockerfile took about 2.5 minutes on my machine.

$ time docker build -t twitterscraper .
docker build -t twitterscraper .  0.09s user 0.06s system 0% cpu 2:31.43 total

I'll make a pull request to fix this issue.

ritvik1512 commented 4 years ago

Agreed. I had a similar problem, and the above addition fixed it. Thank you!

this-is-r-gaurav commented 4 years ago

Fixed it as in this pull request.

I have also created the same fix. If your pull request is the one that gets merged, please reorder the Dockerfile as shown below. Placing the apk add step before COPY lets Docker cache that layer, so users do not have to rebuild the layer that installs libxml2-dev and the other dependencies every time they change the source code. https://github.com/taspinar/twitterscraper/pull/253/commits

 FROM python:3.7-alpine
 +
 + RUN apk add --no-cache \
 +     gcc \
 +     libc-dev \
 +     libxml2-dev \
 +     libxslt-dev
 COPY . /app
 WORKDIR /app
 RUN python setup.py install
 CMD ["twitterscraper"]
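To illustrate why the ordering matters, here is a minimal Python sketch of layer caching. It is a simplified model, not Docker's actual implementation: it assumes each layer's cache key depends only on its parent's key and its instruction text, and that COPY also depends on a digest of the copied files. The names `layer_keys` and `rebuilt_layers` and the digests `"v1"` and `"v2"` are hypothetical and exist only for this example.

```python
import hashlib

def layer_keys(instructions, source_digest):
    """Return one cache key per instruction (simplified model).

    Assumption: a layer's key is derived from its parent's key and its
    instruction text; COPY additionally mixes in a digest of the source tree.
    """
    keys = []
    parent = ""
    for instruction in instructions:
        material = parent + "\n" + instruction
        if instruction.startswith("COPY"):
            material += "\n" + source_digest
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys

def rebuilt_layers(instructions, old_digest, new_digest):
    """List the instructions whose layers miss the cache after a source change."""
    old = layer_keys(instructions, old_digest)
    new = layer_keys(instructions, new_digest)
    return [ins for ins, a, b in zip(instructions, old, new) if a != b]

APK = "RUN apk add --no-cache gcc libc-dev libxml2-dev libxslt-dev"

# Ordering from the original fix: COPY comes before apk add.
COPY_FIRST = [
    "FROM python:3.7-alpine",
    "COPY . /app",
    "WORKDIR /app",
    APK,
    "RUN python setup.py install",
    'CMD ["twitterscraper"]',
]

# Ordering suggested in this comment: apk add comes before COPY.
APK_FIRST = [
    "FROM python:3.7-alpine",
    APK,
    "COPY . /app",
    "WORKDIR /app",
    "RUN python setup.py install",
    'CMD ["twitterscraper"]',
]

# Simulate a source code change ("v1" -> "v2") and see which layers rebuild.
copy_first_rebuilt = rebuilt_layers(COPY_FIRST, "v1", "v2")
apk_first_rebuilt = rebuilt_layers(APK_FIRST, "v1", "v2")

print(APK in copy_first_rebuilt)  # True: apk add is rebuilt on every source change
print(APK in apk_first_rebuilt)   # False: apk add stays cached
```

In this model, every layer from the first changed COPY onward is rebuilt, so moving the slow, rarely changing apk add step above COPY keeps it out of the rebuilt set.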