Closed sajvanderzeeuw closed 8 years ago
Hi Sander,
What version of spades are you using? The missing file error was a known issue - please see https://github.com/sanger-pathogens/circlator/issues/39 If you want to use Circlator version 0.14.0, then you'll need SPAdes version 3.6.0 or lower. For the paper we used 3.5.0.
Re the plasmodium data, we used the option --assemble_spades_k 101 becuase the reads were low coverage.
As for options, that probably depends on your dataset. We explored varying most of the options in the paper (see the section "Evaluation of user-defined parameters").
Best, Martin
Hi @martinghunt,
Thanks for replying so fast. I've checked your test data from Plasmodium and managed to reproduce even with the newest version!
I have one other question if i may ask. What is in your terms low coverage? 10 till 50 X?
Thanks, Best,
Sander
Hi Sander,
Thanks for confirming that you reproduced the results. That's always good to hear :)
I just had a quick look at a few of the NCTC samples compared with the plasmodium. I just made rough histograms of the read coverage from running samtools depth -aa. Typical NCTC example is:
Bear in mind this is corrected read coverage (ie what Circlator takes), not uncorrected reads.
Martin
Thanks Martin!
Sorry but i think i made a mistake in my comparison, i thought the 2 outputs of my run vs you test data was the same but i missed some. I run it with the --assemble_spades_k 101 .
Circlator all hgap_assembly.fasta corrected_reads.fasta circlator_k_101
06.fixstart.log from testDATA:
[contig break finder] id break_point gene_name gene_reversed new_name skipped
[contig break finder] unitig_96|quiver 16443 prodigal no - -
[contig break finder] unitig_97|quiver 2906 prodigal yes - -
06.fixstart.log from My run:
[contig break finder] id break_point gene_name gene_reversed new_name skipped
[contig break finder] unitig_96|quiver 16443 prodigal no - -
[contig break finder] unitig_97|quiver 2906 prodigal no - -
I could only find one difference in the other logs and that was the clean.log
05.clean.log YOURS
[clean] unitig_96|quiver user_kept
[clean] unitig_97|quiver user_kept
05.clean.log MINE
[contig cleanup] contig_id clean_log
[contig cleanup] unitig_96|quiver skipped
[contig cleanup] unitig_97|quiver skipped
What can be the cause of this? Thanks in advance!
SPAdes version : 3.5.0 Mummer: 3.23 Circlator: 0.14.0 Prodigal: 2.60 BWA: 0.7.12 Samtools: 0.1.9
Apologies for the confusion - in actual fact Circlator 0.14.1 was used. I will need to send an erratum to get the manuscript corrected.
Using 0.14.1 will mean that the 05.clean.logs will be identical.
For the 06.fixstart.log, this part of circlator uses code from here: https://github.com/sanger-pathogens/bio_assembly_refinement In the paper we used 0.3.2. You've probably got 0.5.0 installed.
No problem, i installed everything according to your story, but i still do not manage to reproduce the results.
These are the packages i now have installed.
pip freeze
bio-assembly-refinement==0.3.2
circlator==0.14.1
Cython==0.24
et-xmlfile==1.0.1
jdcal==1.2
nose==1.3.7
numpy==1.11.0
openpyxl==2.4.0a1
pbr==1.9.1
pyfastaq==3.5.0
pymummer==0.4.0
pysam==0.8.1
six==1.10.0
stevedore==1.12.0
virtualenv==15.0.1
virtualenv-clone==0.2.6
virtualenvwrapper==4.7.1
Here you can download the Plasmodium data i use, just to be sure that i have the same as what you distribute.
https://barmsijs.lumc.nl/szeeuw/Plasmodium.tar.gz
In here all runs i tried are stored.
The circlator
dir is your run
circlator_0.14.1.fixed
is the latest run i did.
Yours 06.fixstart
[contig break finder] id break_point gene_name gene_reversed new_name skipped
[contig break finder] unitig_96|quiver 16443 prodigal no - -
[contig break finder] unitig_97|quiver 2906 prodigal yes - -
My 06.fixstart
[contig break finder] id break_point gene_name gene_reversed new_name skipped
[contig break finder] unitig_96|quiver 16443 prodigal no - -
[contig break finder] unitig_97|quiver 16227 prodigal no - -
Again any help will be strongly appreciated.
That pins down the remaining difference to fixstart, which is essentially bio_assembly_improvement. The contigs obviously have no match to a dnaa gene, so prodigal gets run to pick a gene near the centre of each contig and use that as a break point.
I'm wondering if that part of the code is deterministic. @nds can you comment please?
@martinghunt Which version of bio_assembly_improvement i should have installed, to get this to work? Or is this related to the version of Prodigal. Hope you can still provide me an answer on this. Thanks in advance!
Hi,
Sorry for the slow reply, took some digging to figure out what's going on. It's actually earlier on at the merge stage where things are different, not bio_assembly_refinement. Depending on code versions, unitig_97 doesn't always get circularized succesfully. It should be ~6kb long and then the prodigal gene is at 2906. When the circularization goes wrong, it ends up around 30kb long and then the prodigal gene is at 16227.
I just ran circlator 0.14.1 with spades 3.5.0 on my original data, and on the data in your tarball and I get the same results as the paper. The important file here is the circularize log:
$ cat 04.merge.circularise.log [merge circularised] #Contig repetitive_deleted circl_using_nucmer circl_using_spades circularised [merge circularised] unitig_96|quiver 0 1 0 1 [merge circularised] unitig_97|quiver 0 0 1 1
In a failed version, unitig_97 doesn't get circularized using the circular call from spades: [merge circularised] #Contig repetitive_deleted circl_using_nucmer circl_using_spades circularised [merge circularised] unitig_96|quiver 0 1 0 1 [merge circularised] unitig_97|quiver 0 1 0 1
The only differences I see python module in versions is: pysam 0.8.3 pymummer 0.7.1 pyfastaq 3.12.1 which shouldn't make any difference. The only idea I have is that your circlator script is picking up a different version of the circlator module.
Dear Author,
I am trying to reproduce the results published in your article. After installing circlator, and all the other dependencies i get a error:
This is off course a very obvious error because the tools expect a file which contains a wrong name. This is a known bug in your 0.14.0 release?
Also i tried 1 of the newest releases 1.1.3. But here i cant reproduce your plasmodium results. It gives me that contig 96 is not circular.
See the folowing diff:
When i check if all dependencies are there it says all test succeeded. This is the point i'm stuck at.
Basically the question i have is:
How can i reproduce the findings of your published article: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0849-0
And which options are nice to take into account for circularizing genomes?
Thanks in advance!
Best regards,
Sander