Closed mgalardini closed 5 years ago
Hi Marco,
I expect this is to do with the set of E. coli you are using, and the default parameters of both Piggy and Roary.
The E. coli ST131s are indeed very closely related to each other, I'm not sure how many SNPs differentiate them on average (but they are in the same ST). What STs, CCs, phylogroups do yours come from?
By default, Piggy requires the IGRs to be 90% similar in both nuc and length identity to cluster. This length requirement is a crucial difference compared to Roary, which just has a minimum gene length (of 120 bp I think), so gene clusters often comprise fragments of genes which would be separated by Piggy on default settings. It would be worth having a look for this in your Roary output (from the min and max lengths in the gene_presence_absence). This is an important difference when comparing the outputs of the programs. To get a more comparable estimate, I would repeat with --len_id in Piggy set to 10.
The draft nature of your genomes could also affect this, but the ones I used were draft too, so just depends on the quality of the assemblies I guess.
One other thing is that Piggy requires IGRs to be in 99% of isolates to be called core. Is this the threshold you are using for the plots?
Thanks,
Harry
Hi Harry,
thanks a lot for your very useful insights; I should have read the methods section of your preprint more closely. Sorry about that. It indeed seems that when using the --len_id 10
option the results are more comparable to what you observe, especially for the saturation curves.
And here's the "core" plot (using the 99% threshold you suggested):
My E. colis come from a very wide range of sequence types, so the significantly lower core IGR set makes sense to me.
Thanks again for your help and patience!
Marco
Hi Marco,
That's quite reassuring, so a core of approx 1250 IGRs and 1900 genes? That seems reasonable across a diverse set of strains.
Thanks,
Harry
Hi experts
I download piggy and install it in ubuntu. However, when I run the tool, it showed this errors:
/home/leo/NGS/piggy/bin/piggy -i /home/leo/SCV_LCV/SCV_LCV_piggyin/*.gff -r /home/leo/SCV_LCV/pangenome3 -t 2 /home/leo/NGS/piggy /home/leo/NGS/piggy/scripts_piggy blastn found, command: /home/leo/anaconda2/bin/blastn cdhit found, command: /home/leo/anaconda2/bin/cd-hit-est mafft found, command: /home/leo/anaconda2/bin/mafft makeblastdb found, command: /home/leo/anaconda2/bin/makeblastdb parallel found, command: /home/leo/anaconda2/bin/parallel readdir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 187. closedir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 196. 0 isolates found
Could you suggest me the solutions?
Thank you very much,
Best regards, Le Phuong.
Hi Le Phuong,
This error is because the input directory you have specified doesn't actually exist. You do not need to give the '/*.gff' part - piggy will look for this for you. Try:
/home/leo/SCV_LCV/SCV_LCV_piggyin
as the input option. Let me know if this doesn't fix it.
Hi Harry,
I still have this problem, could you please check for me?
/home/leo/NGS/piggy/bin/piggy -i /home/leo/SCV_LCV/SCV_LCV_piggyin/*.gff -o /home/leo/SCV_LCV/SCV_LCV_piggy/SCV_LCV_piggy_output -r /home/leo/SCV_LCV/pangenome3 -t 8 -n 90 -l 90 -s 30-3000 -m gene_pair /home/leo/NGS/piggy /home/leo/NGS/piggy/scripts_piggy blastn found, command: /home/leo/anaconda2/bin/blastn cdhit found, command: /home/leo/anaconda2/bin/cd-hit-est mafft found, command: /home/leo/anaconda2/bin/mafft makeblastdb found, command: /home/leo/anaconda2/bin/makeblastdb parallel found, command: /home/leo/anaconda2/bin/parallel readdir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 187. closedir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 196. 0 isolates found Thank you very much for your time and precious help, Yours sincerely, Le Phuong.
Hi Le Phuong,
The problem is still because the input directory is incorrect. You need to change:
-i /home/leo/SCV_LCV/SCV_LCV_piggyin/*.gff to -i /home/leo/SCV_LCV/SCV_LCV_piggyin
and also the -m option is incorrect. you need to change:
-m gene_pair to -m g
Please try this and let me know if it works.
Hi Harry,
Thank you very much for your prompt response, but the problem is still existed leo@leo:~/NGS/Rstatistics$ /home/leo/NGS/piggy/bin/piggy -i /home/leo/SCV_LCV/SCV_LCV_piggyin -o /home/leo/SCV_LCV/SCV_LCV_piggy/SCV_LCV_piggy_output -r /home/leo/SCV_LCV/pangenome3 -t 8 -n 90 -l 90 -s 30-3000 -m g /home/leo/NGS/piggy /home/leo/NGS/piggy/scripts_piggy blastn found, command: /home/leo/anaconda2/bin/blastn cdhit found, command: /home/leo/anaconda2/bin/cd-hit-est mafft found, command: /home/leo/anaconda2/bin/mafft makeblastdb found, command: /home/leo/anaconda2/bin/makeblastdb parallel found, command: /home/leo/anaconda2/bin/parallel readdir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 187. closedir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 196. 0 isolates found
Is there any other possibilities to explain this?
Thank you very much,
Yours sincerely, Le Phuong.
Hi,
OK that's weird. what happens when you run:
ls /home/leo/SCV_LCV/SCV_LCV_piggyin
Dear Harry,
Thank you so much, I found the problem, the problem is the file is SCV_LCV_piggy_in instead of SCV_LCV_piggyin
Many thanks for your previous time with my stupid mistake, Yours sincerely, Le Phuong.
No problem, glad it worked!
Hi again,
really sorry to bother you so much, but I would like to understand whether I'm overlooking something when running piggy. I seem to be getting a very low number of "core" IGRs as compared to what you observe in your pre-print. As mentioned I'm running piggy on ~700 E. colis, so I was expecting similar number to what you observed.
And this second plot which should mimic your figure 3b:
I was wondering whether this is caused by either:
1- my genomes being draft 2- the E. colis ST131 being relatively similar to each other
Thanks for any help you might give me. Marco
p.s. the flat lines in the second plot correspond to a series of strains coming from evolution experiments, and therefore very similar to each other.
p.p.s. this is the command I used to run piggy: