tmozgach / ent_ob

Entrepreneur’s online behavior

Cedar SFU supercomputer #8

Open tmozgach opened 6 years ago

tmozgach commented 6 years ago

Cedar is a heterogeneous cluster suitable for a variety of workloads.

Registration link: http://www.rcg.sfu.ca/hpc/cedar/register-cc

Registration tutorial: https://www.youtube.com/watch?v=EtFUvBYaaZE&list=PLlIsA3IgWjnweFiHlOjBmsF2NTeJ6f5rT

Log in on Mac or Linux:

ssh tmozgach@cedar.computecanada.ca

Software is loaded through the module system, for example:

module load python/3.6.3

List of available software: https://docs.computecanada.ca/wiki/Available_software

Transfer files from laptop to Cedar server:

scp TPostRawShort.csv tmozgach@cedar.computecanada.ca:/home/tmozgach/scratch/TM

Transfer files from Cedar to laptop:

scp -r tmozgach@cedar.computecanada.ca:/home/tmozgach/scratch/TM /home/tatyana/nlp
scp tmozgach@cedar.computecanada.ca:/home/tmozgach/scratch/TM/TM_lda25.model* .

OR

Go to http://globus.computecanada.ca. Your "existing organizational login" is your CCDB account. Ensure that "Compute Canada" is selected in the drop-down, then click Continue. Supply your CCDB username and password on the Compute Canada MyProxy page which appears. This takes you to the web portal for Globus. 

How to set up Globus: https://docs.computecanada.ca/wiki/Globus

Full webinar videos: https://docs.computecanada.ca/wiki/Getting_Started_with_the_new_National_Systems

Installing Python modules on Cedar: https://docs.computecanada.ca/wiki/Python

Initial setup (once):

mkdir ~/virtualenvironment
virtualenv ~/virtualenvironment
source ~/virtualenvironment/bin/activate
pip install nltk gensim pandas pyLDAvis

Every time you log in:

source ~/virtualenvironment/bin/activate

Install an additional Python module (with the virtual environment activated):

pip install NAME_OF_MODULE

You MUST install the following Python packages on Cedar in YOUR virtual environment:

nltk
gensim 
pyLDAvis
textblob
pandas
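
One way to confirm that the active virtual environment has all of the packages above is a small check script. This is only a sketch and not part of the original setup: the function name `missing_packages` is made up for illustration, and it only tests that each package can be located by the current interpreter, not that it works.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["nltk", "gensim", "pyLDAvis", "textblob", "pandas"]


def missing_packages(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

Run it after `source ~/virtualenvironment/bin/activate`; run outside the virtual environment, it would check the system Python instead.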

tmozgach commented 6 years ago

TO-DO: try parallelized LDA model https://radimrehurek.com/gensim/models/ldamulticore.html#module-gensim.models.ldamulticore
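
A rough sketch of what the parallel version might look like, assuming the documents are already tokenized into lists of words. The helper names `pick_workers` and `train_parallel_lda` are hypothetical, and the default of 25 topics is only a guess based on the `TM_lda25.model` file name. Reading `SLURM_CPUS_PER_TASK` assumes the job script requests cores with `#SBATCH --cpus-per-task=N`, which the current script does not do.

```python
import os


def pick_workers(env=None):
    """Choose an LdaMulticore worker count from the CPUs Slurm allocated.

    Falls back to the machine's CPU count outside a Slurm job. One core is
    left for the main process, as the gensim documentation suggests.
    """
    env = os.environ if env is None else env
    cpus = int(env.get("SLURM_CPUS_PER_TASK") or os.cpu_count() or 1)
    return max(1, cpus - 1)


def train_parallel_lda(texts, num_topics=25, passes=1):
    """Train LDA on tokenized documents (a list of lists of str)."""
    # Imported here so the helper above works without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import LdaMulticore

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(doc) for doc in texts]
    return LdaMulticore(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        passes=passes,
        workers=pick_workers(),
    )
```

Without `--cpus-per-task`, the fallback counts every core on the node rather than the cores actually allocated to the job, so it is probably safer to request the cores explicitly.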

tmozgach commented 6 years ago

Running on Cedar (first trial, in /home/tmozgach/scratch/TM). Job script TM_job.sh:

#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --account=def-emodata
#SBATCH --mem=2000
python ./TopicModeling.py
echo 'I finished'

Submitted with the command:

sbatch TM_job.sh

Input was TPostRaw.csv (titles and posts only, NO comments).
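
If the submission ever needs to be scripted, something like the following could wrap `sbatch` and report where the output (including the 'I finished' line) should end up. This is a sketch under assumptions: `parse_job_id` and `submit` are hypothetical helper names, and the log file name relies on Slurm's default of `slurm-<jobid>.out` when the job script sets no `--output` option.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: %r" % sbatch_output)
    return int(match.group(1))


def submit(script="TM_job.sh"):
    """Run sbatch and return (job_id, default_log_name)."""
    # stdout=PIPE and universal_newlines keep this compatible with Python 3.6.
    result = subprocess.run(
        ["sbatch", script],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    job_id = parse_job_id(result.stdout)
    # Without an --output option, Slurm writes to slurm-<jobid>.out
    # in the directory the job was submitted from.
    return job_id, "slurm-%d.out" % job_id
```

Calling `submit()` from /home/tmozgach/scratch/TM would then return the job ID and the name of the log file to watch.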