Closed sr320 closed 3 years ago
My notebook. I've got some questions about the bioawk and FastQC errors, as well as more elegant ways to download raw data from Google Drive.
https://github.com/fish546-2021/Sam-Metabarcoding/blob/main/Analysis/01_Exploring_Data_Bat.ipynb
Here's mine. I also struggled with FastQC for hours and never got it to work from the command line. I found a lot of forums and tried all their solutions, but the fastqc script still would not run. Bioawk also didn't work for me. I also have some questions about PHRED quality scores and their associated probabilities.
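On the PHRED question, a quality score Q and the probability p that the base call is wrong are related by Q = -10 * log10(p). Here is a minimal sketch of the conversion in both directions. The function names are made up for illustration, and the default offset of 33 assumes Phred+33 encoding, which is what current Illumina FASTQ files use; older files may use an offset of 64.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Turn a FASTQ quality string into Phred scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in qual]


scores = decode_quality_string("I5!")
print(scores)                                    # [40, 20, 0]
print([phred_to_error_prob(q) for q in scores])
```

So a score of 20 means roughly a 1 in 100 chance that the base is wrong, and 40 means roughly 1 in 10,000.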
@jdduprey and @skreling - If you can post some of the error messages you were getting using FastQC and/or bioawk, I'd be happy to help get them sorted. A GitHub link to a Jupyter Notebook would be ideal, but copied and pasted code and error messages will work, too.
https://github.com/fish546-2021/Laurel-genes/blob/main/0129_first_glance.ipynb
It was not allowing me to navigate to my data folder where I wanted to keep my data... I would appreciate some insight on that.
not allowing me to navigate to my data folder
This is a weird thing with Jupyter Notebooks which I still don't fully understand after years of using them (and dealing with this exact same thing). In order to make the notebook change to a different directory, you need to issue the cd command without the leading !. E.g.:
cd data
The painful details come down to how Jupyter Notebooks execute a command when you use the leading !. What happens behind the scenes is that Jupyter Notebook opens a new Terminal "window" (which you don't see) and runs the command. After the command completes, that hidden Terminal window is closed. So, when you run something in the next cell, a directory change made with ! in a previous cell no longer exists.
To help you understand, you can do this exact thing by opening a new Terminal window, running a cd command, and then closing the window. Then, open a new Terminal window and look at where you are: in the same place you started, not where you issued the cd command in the other Terminal window.
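The same behavior can be reproduced from Python, as a rough sketch. Here subprocess.run stands in for a cell that starts with !, and os.chdir stands in for a plain cd cell. The temporary directory is only a placeholder for a real data folder, and the variable names are made up for illustration.

```python
import os
import subprocess
import tempfile

# Hypothetical scratch directory standing in for the "data" folder.
with tempfile.TemporaryDirectory() as scratch:
    start = os.getcwd()

    # Like "!cd ...": the cd runs in a child shell that exits right away,
    # so the notebook's own working directory is untouched.
    subprocess.run(f'cd "{scratch}"', shell=True, check=True)
    after_child_cd = os.getcwd()

    # Like plain "cd ...": the change happens in the notebook's own process.
    os.chdir(scratch)
    after_own_cd = os.getcwd()

    # Step back out so the scratch directory can be cleaned up.
    os.chdir(start)

print(after_child_cd == start)  # True: the child shell's cd did not stick
print(after_own_cd == start)    # False: the in-process change did
```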
My notebook for data exploration, and my FastQC results are here (though I do not know how to render them)
@skreling see https://youtu.be/5LYnq84Pjzk for details on FastQC
In your repo, generate a Jupyter notebook where you begin to explore your data. Be sure to show aspects including size, number of reads, format, and any metadata. This should be some form of sequence data (please use Discussions to ask any questions you might have). Extra credit for anyone who runs FastQC on their data.
Please make sure you have a good organizational structure for your repo first.
Drop a URL to the notebook below
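For anyone unsure where to start on size and number of reads, here is one possible sketch in Python. It assumes the data are FASTQ files with exactly four lines per read (no wrapped sequence lines) and that gzipped files end in .gz. The function name and the tiny demo file are made up so the example runs on its own; point it at your own file instead.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count reads, assuming each FASTQ record is exactly four lines."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


# Made-up two-read file so the sketch is self-contained.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.fastq")
with open(demo_path, "w") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n5555\n")

n_reads = count_fastq_reads(demo_path)
size_bytes = os.path.getsize(demo_path)
print(f"{demo_path}: {n_reads} reads, {size_bytes} bytes")
```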