szabgab / wis-python-course-2024-04


WIS Python programming course, started in April 2024 (2024.04)

course

Students

Home page Repo Assignments Project Status
Adam Liberman repo solutions statistical analysis program done
Adi Bar-El Meisel repo solutions Gene Amplification and SNP Analysis using VCF Files done
Ana Mejia Fleisacher repo solutions Integrated Metagenomics Analysis done
Avital Rosner repo solutions Fat tissue analysis for cfRNA data done
Boaz Yaari repo solutions done
Boyue Sun repo solutions Interactive Analysis of Mitochondrial Dispersal in Cells done
Chen Davidyan Krisi repo solutions Micropattern on a membrane done
Daniella Dayagi repo
Ekaterina Zhigileva repo solutions Concentration-Time Graphs for multi-inputs CSTR done
Elad Wizman repo
Hadar Klimovski repo solutions ProMALS : Advanced Proteomic Profiling for Precision Biomarkers in ALS Prognosis and Disease Progression done
Hernan Rubinstein repo solutions Characterizing Signaling Dependent Programs During Embryonic Cell-fate Decisions in vivo done
Liron Hoffman repo solutions qPCR Relative Expression Analysis done
Maher Salhab repo solutions Unbiased identification of axonal localization motifs of FABP7 mRNAs through a pooled screen approach. done
Mazal Faraj repo solutions Cefprozil's Effect on Gut Microbiota done
Meir Sylman repo solutions From Electropherogram to peptide done
Noy Ravensary repo solutions Protein Quiz Game done
Omer Sapir repo solutions Vertical Profile Analysis Tool For Atmospheric Properties Measured By UAV "Stairs" Method done
Omer Zachar repo solutions
Orlena Benamozig repo
Peleg Schneider repo
Rebecca Bornstein repo solutions
Roi Siegelman repo solutions Elucidating the role of NSD1 in breast cancer progression done
Sameeha Mittwali repo
Shahar Garin repo solutions smFISH mRNA-organelle Statistical Analysis Tool done
Shaked Levy repo solutions Hamiltonian Systems Analysis Tool done
Thay Karmin repo solutions Carbon isotope Picarro result analyzer done
Thea Meimoun repo solutions Sequence Identity Analysis Algorithm for Protein Interactions in ALS Research, focusing on Human TDP43 and Its Interactors done
Yael Arieli repo solutions Calculate adiabatic fraction in a cloud done
Yuval Bernard repo solutions Fitting Z-spectra via Nonlinear Least Squares done

Plan

Schedule

Participation in the lectures

There is no requirement to participate in the lectures. You will be able to watch the videos later. However, it is recommended to participate as that gives you an opportunity to ask questions.

Timestamps

Each video will be around 1 hour long. To make it easier to jump to specific topics, I would like to add timestamps to each video. A timestamp looks like this:

00:00 Start
01:30 Installation

This means that at 1 minute 30 seconds I started talking about installation.

I'll need volunteers to prepare these timestamps for each video on the day after the lecture. You basically need to watch the video and write down all the points where you think you or someone else would like to jump to. You can see such timestamps in the comments of many YouTube videos. We will have an issue where you'll be able to volunteer.

Assignments

There will be assignments after every lecture. You will submit them via GitHub. I'll explain the details during the lectures.

Project

Towards the end of the course you'll be asked to do a project. First you need to submit a proposal for the project, and once it is accepted, implement it. The project should be something that is useful for your studies, or at least fun for you to build. Ask in the lab where you work what needs they have that you might implement as your final project. You can get inspiration from the projects listed here and the projects of the 2023 autumn semester.

Grades

Slides

During the course I'll use some of the slides that can be found here. These slides are publicly available and will remain on the web site after the course is over.

Videos in English

There are recordings of this course from 3 years ago.

There are also recordings from the 2023 autumn semester.

You can watch those, but be warned: this semester the order of the material will be different.

There are many more videos in my English-language YouTube channel. You are invited to check them out and to follow the channel.

Videos in Hebrew

Some of the material is also available in Hebrew. You can find them on my website and in my Hebrew-language YouTube channel. You are invited to follow that channel as well.

Language

The standard language of WIS and of this course is English.

However, in one-on-one conversations I'd be happy to speak in Hebrew, Hungarian, Spanish, or Ladino.

Installations

There is no need to install anything up front. We'll do that during the lectures.

Day 1

Videos

Notes

git config --global user.email your@mail.address.com
git config --global user.name "Your Name"

Git commands:

git clone ...
git status
git add
git commit -m "some explanation"
git push
ssh-keygen                 (just press ENTER several times accepting all the defaults)
cat ~/.ssh/....pub
pwd               print working directory
cd  /c/Users      change directory

In the Windows File Explorer, click on "View" and then check the "File name extensions" and "Hidden items" check-boxes.

Assignment (day 1)

Day 2

Videos

git diff
git mv

Assignment (day 2)

Day 3

Assignment (day 3)

Day 4

Factorial

n! = f(n) = n * (n-1) * (n-2) * ... * 1

f(1) = 1
f(n) = n * f(n-1)
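The recursive definition above translates almost line by line into Python. A minimal sketch (the function names are my own choice and not necessarily the ones used in the lecture), with a loop-based version for comparison:

```python
def factorial(n):
    """Return n! for a positive integer n, following f(n) = n * f(n-1)."""
    if n <= 1:  # base case: f(1) = 1 (this also gives 0! = 1)
        return 1
    return n * factorial(n - 1)


def factorial_loop(n):
    """Same result without recursion: multiply 1 * 2 * ... * n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial(5))       # 120
print(factorial_loop(5))  # 120
```

In real code you would probably just use math.factorial from the standard library; writing it yourself is the exercise.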

Fibonacci

 1, 1, 2, 3, 5, 8, 13, ...
f(1) = 1
f(2) = 1
f(n) = f(n-1) + f(n-2)
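The same direct translation works for Fibonacci. A sketch that mirrors the definition exactly:

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, with f(1) = f(2) = 1."""
    if n <= 2:  # base cases: f(1) = 1 and f(2) = 1
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


print([fibonacci(n) for n in range(1, 8)])  # [1, 1, 2, 3, 5, 8, 13]
```

This version recomputes the same values over and over, so it gets noticeably slow once n is in the 30s. Decorating it with functools.lru_cache, or rewriting it as a loop, avoids that.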

Do not comment unnecessarily:

exit()  # end the program

An example of how comments go bad (the comment says "add 1", but the code adds 2):

counter = 0

counter += 2 # add 1 to the counter

Comment where things are unclear:

score = (round(100 * (1 - ((n - 1) / 19)) * (1 / (1 + (0.004 * t)))))

Explain the "why", not the "what".
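A hypothetical illustration of the difference (the function and the rule it encodes are invented for this example): the first comment only restates the code, while the second records a decision that the code alone cannot tell you.

```python
def average(values):
    # Bad, restates the code: "return None if the list is empty".
    # Better, explains why: an empty list means no measurements were
    # recorded yet, and returning None lets the caller report that
    # instead of crashing with ZeroDivisionError.
    if not values:
        return None
    return sum(values) / len(values)
```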

Assignment (day 4)

Create a folder called day04 and write a program that, given a filename on the command line, will print the following:

python count.py FILENAME
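The day 5 assignment refers back to this one as "character, word, and line counting". Assuming that is what count.py should print, a minimal sketch could look like this; the exact output format is my own guess.

```python
import sys


def count(text):
    """Return (characters, words, lines) for the given text."""
    # splitlines() does not create an extra empty line after a final "\n"
    return len(text), len(text.split()), len(text.splitlines())


def main(argv):
    if len(argv) != 1:
        # A stricter tool might also exit with a non-zero status here
        print("Usage: python count.py FILENAME", file=sys.stderr)
        return
    with open(argv[0], encoding="utf-8") as fh:
        chars, words, lines = count(fh.read())
    print(f"characters: {chars}")
    print(f"words: {words}")
    print(f"lines: {lines}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the counting in its own function that takes a string, rather than a filename, makes it easy to test in the next assignment.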

Day 5

Take the Fibonacci function and move it to a module. Write tests for it.

area function with a bug

pytest --doctest-modules mymath.py
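A sketch of what mymath.py might contain so that the command above has something to check. The area function here is a hypothetical, correct stand-in; the one in the lecture deliberately had a bug, and a doctest like the one below is exactly the kind of check that exposes such a bug.

```python
"""mymath.py: small functions with doctests.

Run the embedded examples with:  pytest --doctest-modules mymath.py
"""


def fibonacci(n):
    """Return the n-th Fibonacci number (f(1) = f(2) = 1).

    >>> fibonacci(1)
    1
    >>> fibonacci(7)
    13
    """
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def area(width, height):
    """Return the area of a width x height rectangle.

    >>> area(2, 3)
    6
    """
    return width * height


def test_fibonacci():
    # An ordinary pytest-style test alongside the doctests
    assert [fibonacci(n) for n in range(1, 8)] == [1, 1, 2, 3, 5, 8, 13]
```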

Assignment (day 5)

Create a folder called day05 and copy into it your solution from day 4 (character, word, and line counting).

Day 6

Assignment (day 6)

Create a folder called day6 and in it:

Find an important Excel or CSV file in the lab where you work and write a program that will do some computation on it. Include a sample of the data file and write a test that will verify the results given that input file.

If you don't work in any lab, ask the other participants of the course for a file. In the worst case, you can download a file from Kaggle.

If the files you have contain data that you don't want to be public, replace the real data with fake values.

I know that "some computation" is rather vague, but that's because in each case something else might make sense.

The computation might be summing up certain numbers or a slightly more complex calculation on the data. It might also be collecting and listing some values, which can be numerical or textual.

Please also add a README.md to the day6 folder explaining in a few words or few sentences what the data is and what kind of computation you do.

Day 7

Create a virtual environment:

virtualenv -p python3 venv

Activate the virtual environment:

source venv/bin/activate

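On Windows the activation scripts are in the Scripts folder instead of bin: run venv\Scripts\activate.bat in cmd, venv\Scripts\Activate.ps1 in PowerShell, or source venv/Scripts/activate in Git Bash.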

pip install pytest
pip install -r requirements.txt
$ pip freeze
et-xmlfile==1.1.0
iniconfig==2.0.0
openpyxl==3.1.3
packaging==24.0
pluggy==1.5.0
pytest==8.2.1
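To confirm that you are really using the virtual environment, you can ask Python itself. A small check, relying on the documented behavior that sys.prefix points into the environment while sys.base_prefix points at the Python installation it was created from (this holds for the standard venv module and for recent, 20.x and later, virtualenv versions):

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs from a virtual environment."""
    # Inside a virtual environment, sys.prefix is the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment" if in_virtualenv() else "system Python")
print(sys.executable)  # path of the interpreter that is actually running
```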

Assignment (day 7)

Create a day07 folder.

Write a command line tool that can download data from NCBI. You can download from the nucleotide database as we did in the lecture, but it would be much more interesting if you used some of the other databases available on NCBI. The program should be used like this:

python ncbi.py  TERM NUMBER

Search for the TERM and download up to NUMBER items. Save each item in its own file. Print the names of the files. Also save the date, the search term, the number asked for, and the total number of items found in a CSV file. So if you run it twice:

python ncbi.py  Orchid 3
python ncbi.py  cauliflower 7

you'd get something like this:

date,term,max,total
2024-05-30 17:20:21,Orchid,3,527341
2024-05-30 18:12:34,cauliflower,7,32781

Add a README.md that explains what the program does.
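The CSV log is the part of this assignment that needs no network access. A minimal sketch using only the standard library; the function name append_search_log and the log file name searches.csv are hypothetical. The total would come from the search itself (for instance, the "Count" field of a Biopython Entrez.esearch result). The header is written only when the file does not exist yet, so repeated runs keep appending rows as in the example above.

```python
import csv
import os
from datetime import datetime


def append_search_log(path, term, max_items, total):
    """Append one (date, term, max, total) row to a CSV log file."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["date", "term", "max", "total"])
        now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        writer.writerow([now, term, max_items, total])


# Example calls, using the totals from the sample output above:
# append_search_log("searches.csv", "Orchid", 3, 527341)
# append_search_log("searches.csv", "cauliflower", 7, 32781)
```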

Project proposal

As I explained in the lecture you will have to write a project that is hopefully going to be useful or fun for you.

Size: about 4 times bigger than a weekly assignment.

Create a new git repository and in the README.md file describe the project. Add links to explanations and images if necessary. This is the project proposal and this will also become the user-documentation.

The proposal should include the scientific background of the project and the technical, programming part.

The scientific part will likely include many terms I am not familiar with. Please include links to explanations. This part should also explain the models and how the data is processed. If you use well-known algorithms, please link to explanations of them.

The technical part is mostly "standard", but it is better to be explicit about it. So:

If your project relies on some data (most likely it will), then include some data files in the repository. The actual values in the file can be fake, but the format of the file should be the same as the file(s) you already have.

Please give the project a real title that is meaningful to other researchers, not just "project". After all, these projects are expected to have value well beyond being assignments for this course. Name the repository accordingly.

It would be nice if the README referred back to the course, e.g. with a link to our repo and a sentence like this:

This project was originally implemented as part of the Python programming course at the Weizmann Institute of Science taught by Gabor Szabo

but beyond that it is just a stand-alone project.

Open an issue on our repository with a link to your project's repository.

Day 8

pip install jupyter
jupyter lab
jupyter notebook

Assignment (day 8)

Implement one of the following two options in the day08 folder:

  1. Take the number guessing game and create a GUI for it using Tk.

    • Have a button or a menu option to exit the game.
    • Have a button to restart the game - the computer generates a new random number and resets the guess-counter.
    • Have a button that shows the currently hidden value in a pop-up dialog.
  2. Create a GUI for the ncbi.py of the previous assignment.

    • Have a box to type in the search term.
    • Another box to type in the number - how many to download.
    • A selector to select the database. (nucleotide, ...?)
    • A selector to select the file format (GenBank, FASTA, what else can be there?).
    • A button to "download the data".

I'd strongly recommend that before you submit your solution, you show it to one of the other students and let them try it.
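The number guessing game itself is not reproduced in this README, so the rules assumed here (a hidden number between 1 and 100, feedback of "too small" or "too big") are a guess. One way to structure the Tk version is to keep the game logic in a plain class, so each button only calls one method, e.g. tk.Button(root, text="Restart", command=game.restart). A sketch of that logic part:

```python
import random


class GuessingGame:
    """State of a number guessing game, independent of any GUI."""

    def __init__(self, low=1, high=100):
        self.low = low
        self.high = high
        self.restart()

    def restart(self):
        """Pick a new hidden number and reset the guess counter."""
        self.hidden = random.randint(self.low, self.high)
        self.guesses = 0

    def guess(self, number):
        """Record a guess and return 'too small', 'too big', or 'correct'."""
        self.guesses += 1
        if number < self.hidden:
            return "too small"
        if number > self.hidden:
            return "too big"
        return "correct"

    def reveal(self):
        """The value the "show hidden number" button would display."""
        return self.hidden
```

The "show" button could then call tkinter.messagebox.showinfo("Hidden number", str(game.reveal())), and the exit button could call root.destroy. Because the class does not import tkinter, you can test it without opening a window.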

Day 9

ncbi_original.py
ncbi_shared.py
ncbi_download_folder.py
ncbi_argparse.py
check_number_with_regex.py
check_double_letters_with_regex.py
dna_with_regex.py
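The contents of these scripts are not included here. As an illustration of the kind of checks the file names suggest (the patterns used in the lecture may well differ), here are three regular expressions with Python's re module: a whole-string number check, a backreference that finds a doubled letter, and a test that a string contains only DNA bases.

```python
import re


def is_number(text):
    """True for an optional sign, digits, and an optional decimal part."""
    return re.fullmatch(r"[+-]?\d+(\.\d+)?", text) is not None


def has_double_letter(word):
    """True if some letter appears twice in a row, e.g. the 'll' in 'hello'."""
    # ([a-zA-Z]) captures one letter and \1 requires the same letter again
    return re.search(r"([a-zA-Z])\1", word) is not None


def is_dna(sequence):
    """True if the sequence consists only of the bases A, C, G, T."""
    return re.fullmatch(r"[ACGT]+", sequence, flags=re.IGNORECASE) is not None


print(is_number("3.14"), is_number("3."))                      # True False
print(has_double_letter("hello"), has_double_letter("world"))  # True False
print(is_dna("GATTACA"), is_dna("GATXACA"))                    # True False
```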

Assignment (day 9)

We saw one example of analyzing sequences: finding the longest sub-sequence that repeats itself. In the day9 folder, create a program that receives the path to a file in FASTA or GenBank format and uses the above analysis to print the longest sub-sequence that appears twice.

Then come up with some other "interesting feature" of sequences and add that analysis too. Make both analyses optional and let the user control which ones are done. Assuming the second analysis is called blabla, you could use the program like this:

python analyze.py FILE --duplicate --blabla

What counts as an "interesting feature" is up to you. It can be a real, scientifically valuable feature, but if that's too difficult, it can be a simple feature like the repetition we have.

Include a README.md file and the requirements.txt file if necessary.
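A minimal sketch of analyze.py, under several assumptions: only FASTA input is handled (a GenBank file could be read with Biopython's SeqIO instead), overlapping occurrences count as "appearing twice", and the second analysis is a hypothetical --gc flag (GC content) standing in for --blabla. The lecture's implementation may use a different algorithm. Here the longest repeat is found by binary search on the length, which is valid because if some substring of length L occurs twice, its prefix of length L-1 occurs twice as well.

```python
import argparse
import sys


def read_fasta(path):
    """Return the sequence of a single-record FASTA file as one string."""
    with open(path, encoding="utf-8") as fh:
        lines = [line.strip() for line in fh]
    return "".join(l for l in lines if l and not l.startswith(">")).upper()


def repeat_of_length(seq, length):
    """Return a substring of the given length that occurs twice, or None."""
    seen = set()
    for start in range(len(seq) - length + 1):
        sub = seq[start:start + length]
        if sub in seen:
            return sub
        seen.add(sub)
    return None


def longest_repeat(seq):
    """Longest substring appearing at least twice (occurrences may overlap)."""
    best, low, high = "", 1, len(seq) - 1
    while low <= high:
        mid = (low + high) // 2
        found = repeat_of_length(seq, mid)
        if found is not None:
            best, low = found, mid + 1  # a repeat exists, try longer
        else:
            high = mid - 1  # no repeat of this length, try shorter
    return best


def gc_content(seq):
    """Fraction of G and C bases (the hypothetical second analysis)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def main(argv):
    parser = argparse.ArgumentParser(description="Analyze a DNA sequence")
    parser.add_argument("file", help="path to a FASTA file")
    parser.add_argument("--duplicate", action="store_true",
                        help="print the longest repeated sub-sequence")
    parser.add_argument("--gc", action="store_true",
                        help="print the GC content (hypothetical example)")
    if not argv:
        parser.print_help()  # show usage instead of exiting with an error
        return
    args = parser.parse_args(argv)
    seq = read_fasta(args.file)
    if args.duplicate:
        print("longest duplicate:", longest_repeat(seq))
    if args.gc:
        print(f"GC content: {gc_content(seq):.2%}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would then be: python analyze.py FILE --duplicate --gc. The set-of-substrings check is simple rather than fast; for very long sequences a suffix-array based approach would scale better.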

Day 10

Assignment (day 10)

Project