Open wangna62691 opened 7 years ago
Concerning the SMRTportal analysis: you've answered the 1st, 5th, and 6th bullet points correctly. The P_filter Minimum Polymerase Read Quality (0.75) can be obtained from the protocol definition on the SMRTportal or from the Filtered Subreads.csv file. The N50 values can be obtained from the Filtering and Subread Filtering reports:
• N50 Polymerase Read Length after filtering (2818 bp)
• N50 Subread Length after filtering (940 bp)
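For reference, N50 is the length L such that reads of length L or longer contain at least half of all bases. A minimal sketch of that calculation, assuming a plain-text file with one read length per line (the function name `n50` and the input file are hypothetical; SMRTportal computes these values itself in the reports named above):

```shell
# Compute N50 from a file containing one read length (integer) per line.
# Hypothetical helper for illustration only.
n50() {
    # Sort lengths from longest to shortest, then walk down the list until
    # the running total reaches at least half of the sum of all lengths.
    sort -rn "$1" | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}
```

Calling `n50 lengths.txt` prints a single number; an empty input file prints nothing.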
Concerning the SMRTanalysis of lambda on zcluster: everything looks good. One thing to note: when you update a previous comment, I don't get notified so I can't see you've submitted your work. In the future, when you post new information, please do it in a new comment.
• record link to evaluation script https://github.com/wangna62691/PacBio_Assembly/blob/master/smartpipe_ecoli.sh
• record exact command used to submit job
qsub -q rcc-30d smrtpipe_ecolo.sh
• record GitHub revision of script used https://github.com/wangna62691/PacBio_Assembly/blob/632fee9897bdc604096010159b6d8e30f9927286/smartpipe_ecoli.sh
• report the number of variants detected between your Canu assembly and the polished version generated by the smrtpipe
3768
There is a data folder inside the Ecoli_out_new2 folder; the variants are reported in the file variants.bed. I counted them with:
cat variants.bed |grep -c 'tig'
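The grep above counts every line that contains the string 'tig' anywhere, so a header or comment line containing that text would be counted too. A slightly stricter sketch, assuming variants.bed is tab-delimited with the contig name in the first column and that the contig names start with `tig` (both are assumptions about this file; the function name `count_variants` is hypothetical):

```shell
# Count BED records whose first (tab-delimited) column starts with "tig".
# Header or comment lines are skipped because their first column does not
# start with that prefix.
count_variants() {
    awk -F '\t' '$1 ~ /^tig/ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_variants variants.bed` prints the count; comparing it with the `grep -c` result is a quick check that no non-record lines were included.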
You will need to change a couple of things to get this to work:
1) Change the following line from:
ls /escratch4/s_150/s_150_Mar_30/E01_1/Analysis_Results/*bas.h5 > $basedir1/Ecoli.fofn
to
ls /escratch4/s_150/s_150_Mar_30/E01_1/Analysis_Results/*bax.h5 > $basedir1/Ecoli.fofn
This is needed because the new E. coli data uses .bax.h5 files instead of .bas.h5 files.
2) Line 26 needs to go before line 17, i.e. you need to format the reference sequence before you use it.
3) You need to make sure that the location of the reference is set properly in your settings.xml. It should look something like this:
<param name="reference" hidden="true">
<value>/home/student/binf8940/s_150/2nd_data/ecoli-auto/ECOLI_CANU</value>
</param>
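Changes 1) and 3) can be checked before the job is submitted. A minimal sketch, assuming the reference path sits on a `<value>` line inside the `<param name="reference" ...>` element as in the snippet above. The function names `write_fofn` and `reference_path` are hypothetical, and the sed extraction is a line-oriented shortcut, not a real XML parser:

```shell
# Write a .fofn listing every .bax.h5 file in a directory.
# Returns non-zero (and writes nothing) if no such file exists.
write_fofn() {
    dir=$1
    out=$2
    # Expand the glob into the positional parameters; if nothing matches,
    # the pattern stays unexpanded and the -e test below fails.
    set -- "$dir"/*bax.h5
    if [ ! -e "$1" ]; then
        echo "no .bax.h5 files found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$@" > "$out"
}

# Print the <value> of the "reference" param from a settings.xml file.
reference_path() {
    sed -n '/<param name="reference"/,/<\/param>/ s:.*<value>\(.*\)</value>.*:\1:p' "$1"
}
```

For example, `[ -e "$(reference_path settings.xml)" ] || echo "reference not found"` would flag a settings.xml that still points at a missing reference.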
Everything is correct and complete. One comment though: when updating information in a thread, it is better to make a new comment than to edit an old comment. This way you can see what problems you had and how you solved them. This is helpful when you are writing up your methods or when you encounter a similar problem in the future.
P_filter Minimum Subread Length: 50
P_filter Minimum Polymerase Read Quality: 0.75
N50 Polymerase Read Length after filtering: 2818
N50 Subread Length after filtering: 940
Mean Mapped Subread Concordance: 0.862
Mean Mapped Subread Coverage: 75.51
PacBio SMRTAnalysis Lab II
• record link to evaluation script https://github.com/wangna62691/PacBio_Assembly/blob/master/smartpipe_p4c4.sh
• record exact command used to submit job
• record GitHub revision of script used https://github.com/wangna62691/PacBio_Assembly/blob/master/smartpipe_p4c4.sh
• report if the consensus.fasta sequence generated via the SMRT portal GUI is the same as the one you have generated on zcluster
Yes, they are exactly the same sequence.
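One way to confirm that two consensus files hold the same sequence, even when the header lines or line wrapping differ, is to compare only the sequence characters. A minimal sketch, assuming each file is a standard FASTA file with a single consensus record; the function names are hypothetical, and the comparison deliberately ignores upper/lower case:

```shell
# Print only the sequence of a FASTA file: drop header lines, join wrapped
# lines, and upper-case everything so case differences are ignored.
fasta_sequence() {
    grep -v '^>' "$1" | tr -d '\r\n' | tr '[:lower:]' '[:upper:]'
}

# Succeed (exit status 0) if both files contain the same sequence.
same_sequence() {
    [ "$(fasta_sequence "$1")" = "$(fasta_sequence "$2")" ]
}
```

Usage would look like `same_sequence portal_consensus.fasta zcluster_consensus.fasta && echo identical`, where both file names are placeholders for the two consensus.fasta copies.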