statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

Public the website #132

Closed stat-yyang closed 4 years ago

stat-yyang commented 4 years ago

Hi,

Thanks a lot for your contributions to the PheWeb project. It's amazing to see the results displayed on a website. I am quite new to web development, and I am exploring how to make the locally built website public. I have followed all the steps in your instructions and served the website on a cluster, and I can access it from my own computer. However, I would like to make it public so that everyone can access it.

Do you know of any way to achieve that? Some brief guidance would be enough, and I can explore further from there.

Thanks!

pjvandehaar commented 4 years ago

That's great, I'm glad it's working well for you. Yeah, it's fun to see the data visually.

How do you access the site on your computer? Can you reach it from anywhere, using an IP address? Or only at your office? Or did you have to do SSH tunneling?

If you don't have one, you need a computer in your cluster that accepts connections on port 80 from the open internet. If that's not possible, AWS or DigitalOcean work great.

You should set up Apache2 (or Nginx if you prefer) on that server, and you should run pheweb using systemd. There's more info at https://github.com/statgen/pheweb#6-serve-the-website . If you haven't used Apache2 before, this guide might help.
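A quick way to answer the "can you reach it from anywhere?" question is to try a TCP connection to the server from a machine outside your network. Below is a minimal sketch using only the Python standard library; the function name `is_port_open` and the host `example.org` are placeholders for illustration, not part of pheweb.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


# Usage (run this from OUTSIDE the cluster's network, e.g. a home connection):
#   is_port_open("example.org", 80)
```

If this returns False from outside but True from inside the network, a firewall is probably blocking inbound connections, and a cloud server is likely the simpler route.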

I'd love to see your site once it's public.

stat-yyang commented 4 years ago

Thanks for your advice! We were going to serve it from the school server but were told that it only accepts internal access. Do you have any advice on which cloud computing service to buy?

Basically, do we need a computer running all the time or could we publish all of the generated files to a host?

pjvandehaar commented 4 years ago

If you prevent scrolling/zooming the LocusZoom region plots and limit the autocomplete search box to only suggest gene names and phenotypes (and no longer suggest rsids or variants), the site could be converted to static files using wget or a dedicated archiving tool. If you want to attempt it, I can show you which lines of code to remove to implement those changes. It would take a lot more storage space, perhaps 100x what it does now, because the same HTML would be stored over and over and the LocusZoom plots would overlap. The storage issue could be mitigated with some shared JavaScript that writes the DOM and a clever solution to the overlapping-plots issue.

I recommend a webserver. It should run fine on the $5/month plan from DigitalOcean, plus storage cost. The Readme has instructions on reducing storage use.

stat-yyang commented 4 years ago

Thank you so much! This is exactly what I am looking for (making static files and deploying them to a web server instead of renting a cloud server with computing resources).

Could you please show me how to implement the changes? And after that, is it clear how to deploy the site to a web server (like DigitalOcean)? I would appreciate a pointer to any online post with instructions for similar tasks.

Thanks again, Peter!

pjvandehaar commented 4 years ago

Do you have a server that serves static files but won't run python code?

If you rent a DigitalOcean server, it will run the normal python pheweb code just fine. I think you will have a much better experience maintaining the server and it will work better for your users if you just use the normal python code instead of trying to make static files.

If you really do need a static version of the site, I recommend that you begin by figuring out how to use wget (or frozen-flask or some other archiving tool) to make the static copy of your site. After downloading all the html/css/js, you'll be able to see in your browser's devtools network tab which json files were missed, and you can write a script to archive them all. Once that works, I can help with the searchbox and LocusZoom region-plot movement.
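The "script to archive them all" could look roughly like the sketch below. It assumes you have collected the missed URLs by hand from the devtools network tab; the helper names `static_path_for` and `archive`, the output directory `static-site`, and the example URLs are all hypothetical, not pheweb's actual routes.

```python
from pathlib import Path
from typing import Iterable
from urllib.parse import quote, urlsplit
from urllib.request import urlopen


def static_path_for(url: str, out_dir: str = "static-site") -> Path:
    """Map a URL to a file path under out_dir that mirrors the URL path."""
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"  # directory-style URLs become index.html
    if parts.query:
        # Keep the (percent-encoded) query in the file name so that
        # requests differing only in their query do not overwrite each other.
        rel += "@" + quote(parts.query, safe="")
    return Path(out_dir) / rel


def archive(urls: Iterable[str], out_dir: str = "static-site") -> None:
    """Download each URL and save it at its mirrored path."""
    for url in urls:
        dest = static_path_for(url, out_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url) as resp:
            dest.write_bytes(resp.read())


# Usage, with made-up URLs copied from the network tab:
#   archive(["http://localhost:5000/api/example.json"])
```

A plain static host ignores query strings when choosing which file to serve, so any URL that depends on a query would still need its front-end code changed to request the renamed file.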

stat-yyang commented 4 years ago

Thanks! I misunderstood previously. We have now decided to use DigitalOcean, following your advice. But an unexpected problem came up: with the pre-installed Python 3.6.9, the command python3 -m pip install pheweb failed because pysam could not be installed.

(website) root@pythondjango-quickstart-ubuntu-s-1vcpu-1gb-nyc1-01:~# python -m pip install pheweb
Collecting pheweb
  Using cached https://files.pythonhosted.org/packages/6e/ee/47011c36908bd53798458ceca3018515e3483ed92b8a608f16ff3783216d/PheWeb-1.1.18.tar.gz
Collecting Flask-Compress~=1.4 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/a0/96/cd684c1ffe97b513303b5bfd4bbfb4114c5f4a5ea8a737af6fd813273df8/Flask-Compress-1.5.0.tar.gz
Collecting Flask-Login~=0.4 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/2b/83/ac5bf3279f969704fc1e63f050c50e10985e50fd340e6069ec7e09df5442/Flask_Login-0.5.0-py2.py3-none-any.whl
Collecting Flask~=1.0 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/f2/28/2a03252dfb9ebf377f40fba6a7841b47083260bf8bd8e737b0c6952df83f/Flask-1.1.2-py2.py3-none-any.whl
Collecting blist~=1.3 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/6b/a8/dca5224abe81ccf8db81f8a2ca3d63e7a5fa7a86adc198d4e268c67ce884/blist-1.3.6.tar.gz
Collecting boltons~=19.1 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/62/b2/2893b608ff69fea56d3c4993bfe88bcfdd4c33d32f6a476eed09ca9d9191/boltons-19.3.0-py2.py3-none-any.whl
Collecting cffi~=1.12 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/f1/c7/72abda280893609e1ddfff90f8064568bd8bcb2c1770a9d5bb5edb2d1fea/cffi-1.14.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting gevent~=1.4 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/5a/79/2c63d385d017b5dd7d70983a463dfd25befae70c824fedb857df6e72eff2/gevent-1.5.0.tar.gz
Collecting gunicorn~=19.9 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/5f/54/c15f2c243c19074cbf06ce6c48732d99aec825487f87e57e86e9a22990f2/gunicorn-19.10.0-py2.py3-none-any.whl
Collecting intervaltree~=3.0 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/e8/f9/76237755b2020cd74549e98667210b2dd54d3fb17c6f4a62631e61d31225/intervaltree-3.0.2.tar.gz
Collecting marisa-trie~=0.7 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/20/95/d23071d0992dabcb61c948fb118a90683193befc88c23e745b050a29e7db/marisa-trie-0.7.5.tar.gz
Collecting numpy~=1.16 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/03/27/e35e7c6e6a52fab9fcc64fc2b20c6b516eba930bb02b10ace3b38200d3ab/numpy-1.18.4-cp36-cp36m-manylinux1_x86_64.whl
Collecting openpyxl~=2.6 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/d6/26/eb28e975b7a37aad38d7ec4f7a0f652bdee6ecf36e6bd06f473c5af9b87b/openpyxl-2.6.4.tar.gz
Collecting pysam~=0.15.2 (from pheweb)
  Using cached https://files.pythonhosted.org/packages/25/7e/098753acbdac54ace0c6dc1f8a74b54c8028ab73fb027f6a4215487d1fea/pysam-0.15.4.tar.gz
    Complete output from command python setup.py egg_info:

# pysam: cython is available - using cythonize if necessary

# pysam: htslib mode is shared
# pysam: HTSLIB_CONFIGURE_OPTIONS=None
# pysam: (sysconfig) CC=x86_64-linux-gnu-gcc -pthread
# pysam: (sysconfig) CFLAGS=-Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g   -fstack-protector-strong -Wformat -Werror=format-security  -g -flto -fuse-linker-plugin -ffat-lto-objects
# pysam: (sysconfig) LDFLAGS=-Wl,-Bsymbolic-functions  -Wl,-z,relro
checking for gcc... x86_64-linux-gnu-gcc -pthread
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether x86_64-linux-gnu-gcc -pthread accepts -g... yes
checking for x86_64-linux-gnu-gcc -pthread option to accept ISO C89... none needed
checking for ranlib... ranlib
checking for grep that handles long lines and -e... /bin/grep
checking for C compiler warning flags... unknown
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking for _LARGEFILE_SOURCE value needed for large files... no
checking shared library type for unknown-Linux... plain .so
checking how to run the C preprocessor... x86_64-linux-gnu-gcc -pthread -E
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... yes
checking for getpagesize... yes
checking for working mmap... yes
checking for gmtime_r... yes
checking for fsync... yes
checking for drand48... yes
checking whether fdatasync is declared... yes
checking for fdatasync... yes
checking for library containing log... -lm
checking for zlib.h... yes
checking for inflate in -lz... yes
checking for library containing recv... none required
checking for bzlib.h... no
checking for BZ2_bzBuffToBuffCompress in -lbz2... no
configure: error: libbzip2 development files not found

The CRAM format may use bzip2 compression, which is implemented in HTSlib
by using compression routines from libbzip2 <http://www.bzip.org/>.

Building HTSlib requires libbzip2 development files to be installed on the
build machine; you may need to ensure a package such as libbz2-dev (on Debian
or Ubuntu Linux) or bzip2-devel (on RPM-based Linux distributions or Cygwin)
is installed.

Either configure with --disable-bz2 (which will make some CRAM files
produced elsewhere unreadable) or resolve this error to build HTSlib.
checking for gcc... x86_64-linux-gnu-gcc -pthread
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether x86_64-linux-gnu-gcc -pthread accepts -g... yes
checking for x86_64-linux-gnu-gcc -pthread option to accept ISO C89... none needed
checking for ranlib... ranlib
checking for grep that handles long lines and -e... /bin/grep
checking for C compiler warning flags... unknown
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking for _LARGEFILE_SOURCE value needed for large files... no
checking shared library type for unknown-Linux... plain .so
checking how to run the C preprocessor... x86_64-linux-gnu-gcc -pthread -E
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... yes
checking for getpagesize... yes
checking for working mmap... yes
checking for gmtime_r... yes
checking for fsync... yes
checking for drand48... yes
checking whether fdatasync is declared... yes
checking for fdatasync... yes
checking for library containing log... -lm
checking for zlib.h... yes
checking for inflate in -lz... yes
checking for library containing recv... none required
checking for bzlib.h... no
checking for BZ2_bzBuffToBuffCompress in -lbz2... no
configure: error: libbzip2 development files not found

The CRAM format may use bzip2 compression, which is implemented in HTSlib
by using compression routines from libbzip2 <http://www.bzip.org/>.

Building HTSlib requires libbzip2 development files to be installed on the
build machine; you may need to ensure a package such as libbz2-dev (on Debian
or Ubuntu Linux) or bzip2-devel (on RPM-based Linux distributions or Cygwin)
is installed.

Either configure with --disable-bz2 (which will make some CRAM files
produced elsewhere unreadable) or resolve this error to build HTSlib.
make: ./version.sh: Command not found
make: ./version.sh: Command not found
config.mk:2: *** Resolve configure error first.  Stop.
# pysam: htslib configure options: None
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-o2aqd4q6/pysam/setup.py", line 241, in <module>
    htslib_make_options = run_make_print_config()
  File "/tmp/pip-build-o2aqd4q6/pysam/setup.py", line 68, in run_make_print_config
    stdout = subprocess.check_output(["make", "-s", "print-config"])
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['make', '-s', 'print-config']' returned non-zero exit status 2.

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-o2aqd4q6/pysam/

I followed your advice here (https://github.com/statgen/pheweb/blob/3530416b3e7975593144b70d7508db3774d783cb/etc/detailed-install-instructions.md#detailed-install-instructions). However, I still cannot solve the problem of installing pheweb. Do you have any suggestions?

The server is Python/Django Quickstart 1.1 on Ubuntu 18.04. Should I change to another default setting in DigitalOcean or do you have any successful settings to recommend?
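For anyone reading a long build log like the one above: the useful part is usually the `checking for <header>... no` lines, which name the development headers that configure could not find. The sketch below pulls those out of a saved log. The function names are made up for illustration, and the header-to-package table contains only the one mapping that the log itself states (bzlib.h comes from libbz2-dev on Debian or Ubuntu); extend it for your own system.

```python
import re
from typing import List

# Header -> Debian/Ubuntu package. Only this entry is taken from the log's
# own hint; anything you add here is your own assumption to verify.
HEADER_TO_DEB_PACKAGE = {"bzlib.h": "libbz2-dev"}

# Matches configure lines such as "checking for bzlib.h... no".
_MISSING_HEADER = re.compile(r"^checking for (\S+\.h)\.\.\. no\s*$", re.MULTILINE)


def missing_headers(configure_output: str) -> List[str]:
    """Return headers that configure reported as missing, without duplicates."""
    seen: List[str] = []
    for header in _MISSING_HEADER.findall(configure_output):
        if header not in seen:
            seen.append(header)
    return seen


def packages_to_install(headers: List[str]) -> List[str]:
    """Look up known packages for the missing headers; unknown headers are skipped."""
    return sorted({HEADER_TO_DEB_PACKAGE[h] for h in headers if h in HEADER_TO_DEB_PACKAGE})
```

On the log above this should report bzlib.h, which matches the configure error message: installing libbz2-dev (for example with apt-get on Ubuntu) before rerunning pip is the fix that message suggests.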

stat-yyang commented 4 years ago

Uh I solved the installation problem! I simply installed all the packages manually and then installed pheweb successfully.

stat-yyang commented 4 years ago

Hi Peter, I moved all the necessary files to the DigitalOcean server (I did not rerun the code on the server), but I encountered another error when serving the website.

======= Exception ====
[Errno 2] No such file or directory: '/root/.pheweb/cache/gene_aliases.marisa_trie'

======= Traceback ====
Traceback (most recent call last):
  File "/root/website/lib/python3.6/site-packages/pheweb/command_line.py", line 148, in main
    run(sys.argv[1:])
  File "/root/website/lib/python3.6/site-packages/pheweb/command_line.py", line 142, in run
    handlerssubcommand
  File "/root/website/lib/python3.6/site-packages/pheweb/command_line.py", line 70, in serve
    run(argv)
  File "/root/website/lib/python3.6/site-packages/pheweb/serve/run.py", line 132, in run
    from .server import app
  File "/root/website/lib/python3.6/site-packages/pheweb/serve/server.py", line 66, in <module>
    autocompleter = Autocompleter(phenos)
  File "/root/website/lib/python3.6/site-packages/pheweb/serve/autocomplete.py", line 29, in __init__
    self._gene_alias_trie = marisa_trie.BytesTrie().load(common_filepaths['gene-aliases-trie'])
  File "src/marisa_trie.pyx", line 217, in marisa_trie._Trie.load
FileNotFoundError: [Errno 2] No such file or directory: '/root/pheweb/generated-by-pheweb/sites/genes/gene_aliases.marisa_trie'

Do you have any idea what happened, or do I need to rerun the code on the DigitalOcean server? I hope I don't need to rerun the pheweb process on the server, because it may take several days to finish.

Thanks!
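When files are copied between machines rather than regenerated, a small existence check before starting the server can show which expected files did not make it across. This is only a sketch: `missing_files` is a hypothetical helper, and the two paths in the usage note are simply the ones named in the error above, not a complete list of what pheweb needs.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not exist as regular files (~ is expanded)."""
    return [p for p in paths if not Path(p).expanduser().is_file()]


# Usage, with the paths reported in the traceback:
#   missing_files([
#       "/root/.pheweb/cache/gene_aliases.marisa_trie",
#       "/root/pheweb/generated-by-pheweb/sites/genes/gene_aliases.marisa_trie",
#   ])
```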

stat-yyang commented 4 years ago

Uh, I corrected the error above. Now our website is launched (http://67.205.180.40:443/). Thanks a lot for your help!

Best, Yue