wangna62691 commented 7 years ago

• P_filter Minimum Subread Length: 50
• P_filter Minimum Polymerase Read Quality: 0.75
• N50 Polymerase Read Length after filtering: 2818
• N50 Subread Length after filtering: 940
• Mean Mapped Subread Concordance: 0.862
• Mean Mapped Subread Coverage: 75.51

PacBio SMRTAnalysis Lab II

• record link to evaluation script https://github.com/wangna62691/PacBio_Assembly/blob/master/smartpipe_p4c4.sh

• record exact command used to submit job

qsub -q rcc-30d smartpipe_p4c4.sh

• record GitHub revision of script used https://github.com/wangna62691/PacBio_Assembly/blob/master/smartpipe_p4c4.sh

• report if the consensus.fasta sequence generated via the SMRT portal GUI is the same as the one you have generated on zcluster

Yes, they are exactly the same sequence.

cbergman commented 7 years ago

Concerning the SMRTportal analysis: you've answered the 1st, 5th and 6th bullet points correctly. The P_filter Minimum Polymerase Read Quality (0.75) can be obtained from the protocol definition on the SMRTportal or from the Filtered Subreads.csv file. The N50 values can be obtained from the Filtering and Subread Filtering reports:

• N50 Polymerase Read Length after filtering (2818 bp)
• N50 Subread Length after filtering (940 bp)
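As a reminder of what the N50 values in those reports mean, here is a minimal sketch that computes an N50 from a list of read lengths. It assumes the usual definition (the length L such that reads of length L or longer contain at least half of all bases). Whether SMRTportal uses exactly this convention is not confirmed in this thread, and the function name `n50` is made up for the example.

```shell
#!/usr/bin/env bash
# n50: read one length per line on stdin and print the N50.
# Assumes the common definition: walk the lengths from longest to shortest
# and report the length at which the running sum reaches half the total.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Usage (lengths.txt is a hypothetical file with one read length per line):
# n50 < lengths.txt
```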

cbergman commented 7 years ago

Concerning the SMRTanalysis of lambda on zcluster: everything looks good. One thing to note: when you update a previous comment, I don't get notified so I can't see you've submitted your work. In the future, when you post new information, please do it in a new comment.

wangna62691 commented 7 years ago

PacBio SMRTAnalysis Lab III

• record link to evaluation script https://github.com/wangna62691/PacBio_Assembly/blob/master/smartpipe_ecoli.sh

• record exact command used to submit job

qsub -q rcc-30d smrtpipe_ecolo.sh

• record GitHub revision of script used https://github.com/wangna62691/PacBio_Assembly/blob/632fee9897bdc604096010159b6d8e30f9927286/smartpipe_ecoli.sh

• report the number of variants detected between your Canu assembly and the polished version generated by the smrtpipe

3768

The Ecoli_out_new2 folder contains a data folder, and the variants are reported in its variants.bed file. I counted the variants with:

cat variants.bed |grep -c 'tig'
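The count above depends on every variant line containing the string `tig`. A slightly more defensive sketch counts all data lines instead and skips header or blank lines. The header prefixes (`#`, `track`, `browser`) are an assumption about what variants.bed may contain, not something confirmed in this thread, and `count_variants` is a made-up helper name.

```shell
#!/usr/bin/env bash
# count_variants FILE: count BED records in FILE, ignoring blank lines and
# assumed header lines (starting with '#', 'track' or 'browser').
count_variants() {
    grep -c -v -E '^(#|track|browser|[[:space:]]*$)' "$1"
}

# Usage (the path is assumed from the folder layout described above):
# count_variants Ecoli_out_new2/data/variants.bed
```

If the file has no header lines and every contig name contains `tig`, both approaches should give the same number.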

cbergman commented 7 years ago

You will need to change a couple of things to get this to work:

1) Change the following line from:

ls /escratch4/s_150/s_150_Mar_30/E01_1/Analysis_Results/*bas.h5 > $basedir1/Ecoli.fofn

to

ls /escratch4/s_150/s_150_Mar_30/E01_1/Analysis_Results/*bax.h5 > $basedir1/Ecoli.fofn

This is needed since the new E. coli data uses .bax.h5 files instead of .bas.h5 files.

2) Line 26 needs to go before line 17, i.e. you need to format the reference sequence before you use it.

3) You need to make sure that the location of the reference is set properly in your settings.xml. It should look something like this:

        <param name="reference" hidden="true">
            <value>/home/student/binf8940/s_150/2nd_data/ecoli-auto/ECOLI_CANU</value>
        </param>
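Points 1 and 3 can be checked before submitting the job. The following is only a sketch: the helper names `make_fofn` and `reference_path` are hypothetical, and it assumes the `<value>` element sits on its own line after the `<param name="reference" ...>` line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; the function names are made up for this sketch.

# make_fofn DIR OUT: list every *.bax.h5 file in DIR into OUT (one path per
# line) and fail if nothing matched, so an empty .fofn is caught early.
make_fofn() {
    local dir=$1 out=$2
    find "$dir" -maxdepth 1 -name '*.bax.h5' | sort > "$out"
    if [ ! -s "$out" ]; then
        echo "no .bax.h5 files found in $dir" >&2
        return 1
    fi
}

# reference_path SETTINGS_XML: print the <value> that follows the
# <param name="reference" ...> line.
reference_path() {
    awk '
        /<param name="reference"/ { in_ref = 1 }
        in_ref && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Usage with the paths from this thread (commented out so that pasting the
# sketch has no side effects):
# make_fofn /escratch4/s_150/s_150_Mar_30/E01_1/Analysis_Results "$basedir1/Ecoli.fofn"
# ref=$(reference_path settings.xml)
# [ -e "$ref" ] || echo "reference not found: $ref" >&2
```

Failing early on an empty file list or a missing reference path is cheaper than discovering it after the job has sat in the queue.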

cbergman commented 7 years ago

Everything is correct and complete. One comment though: when updating information in a thread, it is better to make a new comment than to edit an old comment. This way you can see what problems you had and how you solved them. This is helpful when you are writing up your methods or when you encounter a similar problem in the future.


PacBio SMRTportal #3
