jakubgrad / Tietokannat-ja-web-ohjelmointi-Projekti

A repository for the course Tietokannat ja web-ohjelmointi taken in the 4th period of academic year 2023-2024 at University of Helsinki

Tietokannat-ja-web-ohjelmointi-Projekti

A repository for the course project in Tietokannat ja web-ohjelmointi, taken in the 4th period of the academic year 2023-2024 at the University of Helsinki by a student in the Bachelor's Programme in Science.

The goal of the project is to create an online application, written in Python and backed by a Postgres database, that supports bilingual reading. Bilingual reading is the practice of reading the same book in two languages. The app would allow users to log in and upload pairs of machine-readable pdfs of the same book in two different languages. The website would process the pdfs, perform sentence tokenization using spaCy or a similar library, and then store the result on the server. The users would then be able to choose a pair of books that they uploaded and read them side by side. The goal is to make the UI user friendly, so that there is e.g. the possibility of aligning the text of the two books and leaving bookmarks.
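To illustrate the side-by-side idea, here is a minimal sketch, not the project's actual code. It assumes a simple regex splitter as a dependency-free stand-in for spaCy, and it pairs sentences purely by position. The function names `split_sentences` and `pair_sentences` are hypothetical.

```python
import re
from itertools import zip_longest

# Stand-in for a real tokenizer such as spaCy (an assumption for this sketch):
# split after ".", "!" or "?" when followed by whitespace.
# It will mis-split abbreviations such as "e.g.".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def pair_sentences(left: str, right: str) -> list[tuple[str, str]]:
    """Pair sentences by position; the shorter side is padded with ''."""
    return list(
        zip_longest(split_sentences(left), split_sentences(right), fillvalue="")
    )


if __name__ == "__main__":
    english = "The sun rose. Birds sang! Was it morning?"
    finnish = "Aurinko nousi. Linnut lauloivat!"
    for en, fi in pair_sentences(english, finnish):
        print(f"{en:<20} | {fi}")
```

Pairing by position drifts as soon as one translation merges or splits a sentence, which is why a manual alignment feature in the UI is useful.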


Technical details: the project has many dependencies, including Flask, Flask-SQLAlchemy, Psycopg2 and so on. The full list is in requirements.txt, and their installation is outlined below. The project uses a virtual environment.
Tested using https://vdi.helsinki.fi/, the university's online virtual machine running on Cubbli.

The database has the following tables:

-a table with users (username and password)
-a table for uploaded pdfs that contains names, author, language, ISBN
-a table for pairs of pdfs to read
-a table for named bookmarks for the convenience of the reader
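The four tables could look roughly like the sketch below. It is only an illustration: the authoritative definitions are in schema.sql, every column and table name here is an assumption, and an in-memory SQLite database stands in for Postgres so that the snippet runs without a server.

```python
import sqlite3

# Hypothetical table and column names; the real definitions live in schema.sql.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password TEXT NOT NULL
);
CREATE TABLE pdfs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    name TEXT NOT NULL,
    author TEXT,
    language TEXT,
    isbn TEXT
);
CREATE TABLE pairs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    pdf1_id INTEGER NOT NULL REFERENCES pdfs(id),
    pdf2_id INTEGER NOT NULL REFERENCES pdfs(id)
);
CREATE TABLE bookmarks (
    id INTEGER PRIMARY KEY,
    pair_id INTEGER NOT NULL REFERENCES pairs(id),
    name TEXT NOT NULL,
    position INTEGER NOT NULL
);
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database containing the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List the tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(table_names(create_demo_db()))
```

In this sketch a bookmark belongs to a pair rather than to a single pdf, on the assumption that a reader wants to return to the same place in both books at once.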

Installation

Created for University of Helsinki's Cubbli OS:
Install postgresql if you haven't already. You can follow the instructions here or do the following at your university computer:

cd ~
touch .bashrc #possibly the file existed already
git clone https://github.com/hy-tsoha/local-pg.git
bash local-pg/pg-install.sh install .bashrc

To start the PostgreSQL database, run:

source .bashrc
start-pg.sh  

The database is necessary for the application to work, but once you're finished using the application, remember to close it with Ctrl + c. Now that your database is running, in a separate terminal run:

psql -h ~/pgsql/sock/ 

If your prompt has changed to username=#, that means that psql works correctly. You can exit it by pressing Ctrl + d. Now let's download this repository. The location is actually important here, since if we install in a symlinked directory like Desktop, Downloads, or Documents, we might not be able to start a Python virtual environment later on. For me, installing in my user's home directory works the best. So you can simply run:

cd ~
git clone https://github.com/jakubgrad/Tietokannat-ja-web-ohjelmointi-Projekti.git
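If you are unsure whether a location is affected by the symlink issue mentioned above, a small check like the following may help. It is a sketch based only on that note: the helper name `symlinked_parts` is hypothetical, and whether a symlink really breaks venv creation depends on your machine.

```python
import os
from pathlib import Path


def symlinked_parts(path) -> list[Path]:
    """Return every component of path (itself or a parent) that is a symlink."""
    absolute = Path(os.path.abspath(path))  # abspath does not resolve symlinks
    return [part for part in [absolute, *absolute.parents] if part.is_symlink()]


if __name__ == "__main__":
    for candidate in ("~", "~/Desktop", "~/Documents"):
        expanded = os.path.expanduser(candidate)
        links = symlinked_parts(expanded)
        status = "symlinked via " + str(links[0]) if links else "no symlinks found"
        print(f"{candidate}: {status}")
```

An empty result means no component of the path is a symlink, so under the assumption above that location should be safe for the clone.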

Now you want to populate the database with tables and some sample data:

psql -h ~/pgsql/sock/ < Tietokannat-ja-web-ohjelmointi-Projekti/schema.sql 

Go to the repository, install and enter the virtual environment:

cd Tietokannat-ja-web-ohjelmointi-Projekti/
python3 -m venv venv #It can take 20 seconds. Also, if you get Errno 95, you need to download the repository somewhere else and run this line inside it
source venv/bin/activate

Now your prompt should be preceded by (venv) and you can install the dependencies for the application:

pip install -r requirements.txt      

Because the project uses a pdf parser and other large packages, it can take a while to install the requirements. After that, declare the environment variables:

echo -e "DATABASE_URL=postgresql+psycopg2://\nSECRET_KEY="$(python3 -c "import secrets; print(secrets.token_hex(16))") > .env
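For reference, the command above writes two lines to .env. The sketch below builds the same text in Python and parses it back, which can be handy for checking the file by hand. The helper names are hypothetical, and how the application itself loads .env is not shown here.

```python
import secrets


def build_env_text(database_url: str = "postgresql+psycopg2://") -> str:
    """Build the two-line .env content that the echo command produces."""
    secret_key = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    return f"DATABASE_URL={database_url}\nSECRET_KEY={secret_key}\n"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key] = value
    return values


if __name__ == "__main__":
    env = parse_env_text(build_env_text())
    print(sorted(env))
```

Both keys should be present in your .env, and SECRET_KEY should be a 32-character hexadecimal string.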

Go to src and run the program:

cd src/
flask run

You should see:

 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit

And you can now head over to localhost:5000 in your browser and see the program in action.
Because of the nature of pdf parsing, the application will reject a lot of pdfs or format them in strange ways. For that reason, the repository contains two pdfs that are tested and ready to use, residing in the /examples directory. When using the application for the first time, I recommend starting with those two files.

Useful commands

See linting report:

pylint src/*.py

Troubleshooting

I found that frequently launching and closing university VMs confused web browsers. For a quick fix when none of your browsers want to open, run:

rm -rf ~/.config/chromium/Singleton*
rm -rf ~/.config/google-chrome/Singleton*

This frees up both Chrome and Chromium.

To do

Peer review 1 Peer review 2