harry-thorpe / piggy

Pipeline for analysing intergenic regions in bacteria
GNU General Public License v3.0
37 stars 7 forks source link

Getting a lower number of "core" IGRs in E. coli genomes #16

Closed mgalardini closed 5 years ago

mgalardini commented 7 years ago

Hi again,

really sorry to bother you so much, but I would like to understand whether I'm overlooking something when running piggy. I seem to be getting a very low number of "core" IGRs as compared to what you observe in your pre-print. As mentioned I'm running piggy on ~700 E. colis, so I was expecting similar number to what you observed.

piggy1

And this second plot which should mimic your figure 3b:

piggy2

I was wondering whether this is caused by either:

1- my genomes being draft 2- the E. colis ST131 being relatively similar to each other

Thanks for any help you might give me. Marco

p.s. the flat lines in the second plot correspond to a series of strains coming from evolution experiments, and therefore very similar to each other.

p.p.s. this is the command I used to run piggy:

piggy -i gff/ -o out/ -r roary/ -t 20
harry-thorpe commented 7 years ago

Hi Marco,

I expect this is to do with the set of E. coli you are using, and the default parameters of both Piggy and Roary.

The E. coli ST131s are indeed very closely related to each other, I'm not sure how many SNPs differentiate them on average (but they are in the same ST). What STs, CCs, phylogroups do yours come from?

By default, Piggy requires the IGRs to be 90% similar in both nuc and length identity to cluster. This length requirement is a crucial difference compared to Roary, which just has a minimum gene length (of 120 bp I think), so gene clusters often comprise fragments of genes which would be separated by Piggy on default settings. It would be worth having a look for this in your Roary output (from the min and max lengths in the gene_presence_absence). This is an important difference when comparing the outputs of the programs. To get a more comparable estimate, I would repeat with --len_id in Piggy set to 10.

The draft nature of your genomes could also affect this, but the ones I used were draft too, so just depends on the quality of the assemblies I guess.

One other thing is that Piggy requires IGRs to be in 99% of isolates to be called core. Is this the threshold you are using for the plots?

Thanks,

Harry

mgalardini commented 7 years ago

Hi Harry,

thanks a lot for your very useful insights; I should have read the methods section of your preprint more closely. Sorry about that. It indeed seems that when using the --len_id 10 option the results are more comparable to what you observe, especially for the saturation curves.

piggy3

And here's the "core" plot (using the 99% threshold you suggested):

piggy4

My E. colis come from a very wide range of sequence types, so the significantly lower core IGR set makes sense to me.

Thanks again for your help and patience!

Marco

harry-thorpe commented 7 years ago

Hi Marco,

That's quite reassuring, so a core of approx 1250 IGRs and 1900 genes? That seems reasonable across a diverse set of strains.

Thanks,

Harry

NguyenLePhuong commented 5 years ago

Hi experts

I download piggy and install it in ubuntu. However, when I run the tool, it showed this errors:

/home/leo/NGS/piggy/bin/piggy -i /home/leo/SCV_LCV/SCV_LCV_piggyin/*.gff -r /home/leo/SCV_LCV/pangenome3 -t 2 /home/leo/NGS/piggy /home/leo/NGS/piggy/scripts_piggy blastn found, command: /home/leo/anaconda2/bin/blastn cdhit found, command: /home/leo/anaconda2/bin/cd-hit-est mafft found, command: /home/leo/anaconda2/bin/mafft makeblastdb found, command: /home/leo/anaconda2/bin/makeblastdb parallel found, command: /home/leo/anaconda2/bin/parallel readdir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 187. closedir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 196. 0 isolates found

Could you suggest me the solutions?

Thank you very much,

Best regards, Le Phuong.

harry-thorpe commented 5 years ago

Hi Le Phuong,

This error is because the input directory you have specified doesn't actually exist. You do not need to give the '/*.gff' part - piggy will look for this for you. Try:

/home/leo/SCV_LCV/SCV_LCV_piggyin

as the input option. Let me know if this doesn't fix it.

NguyenLePhuong commented 5 years ago

Hi Harry,

I still have this problem, could you please check for me?

/home/leo/NGS/piggy/bin/piggy -i /home/leo/SCV_LCV/SCV_LCV_piggyin/*.gff -o /home/leo/SCV_LCV/SCV_LCV_piggy/SCV_LCV_piggy_output -r /home/leo/SCV_LCV/pangenome3 -t 8 -n 90 -l 90 -s 30-3000 -m gene_pair /home/leo/NGS/piggy /home/leo/NGS/piggy/scripts_piggy blastn found, command: /home/leo/anaconda2/bin/blastn cdhit found, command: /home/leo/anaconda2/bin/cd-hit-est mafft found, command: /home/leo/anaconda2/bin/mafft makeblastdb found, command: /home/leo/anaconda2/bin/makeblastdb parallel found, command: /home/leo/anaconda2/bin/parallel readdir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 187. closedir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 196. 0 isolates found Thank you very much for your time and precious help, Yours sincerely, Le Phuong.

harry-thorpe commented 5 years ago

Hi Le Phuong,

The problem is still because the input directory is incorrect. You need to change:

-i /home/leo/SCV_LCV/SCV_LCV_piggyin/*.gff to -i /home/leo/SCV_LCV/SCV_LCV_piggyin

and also the -m option is incorrect. you need to change:

-m gene_pair to -m g

Please try this and let me know if it works.

NguyenLePhuong commented 5 years ago

Hi Harry,

Thank you very much for your prompt response, but the problem is still existed leo@leo:~/NGS/Rstatistics$ /home/leo/NGS/piggy/bin/piggy -i /home/leo/SCV_LCV/SCV_LCV_piggyin -o /home/leo/SCV_LCV/SCV_LCV_piggy/SCV_LCV_piggy_output -r /home/leo/SCV_LCV/pangenome3 -t 8 -n 90 -l 90 -s 30-3000 -m g /home/leo/NGS/piggy /home/leo/NGS/piggy/scripts_piggy blastn found, command: /home/leo/anaconda2/bin/blastn cdhit found, command: /home/leo/anaconda2/bin/cd-hit-est mafft found, command: /home/leo/anaconda2/bin/mafft makeblastdb found, command: /home/leo/anaconda2/bin/makeblastdb parallel found, command: /home/leo/anaconda2/bin/parallel readdir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 187. closedir() attempted on invalid dirhandle $dh at /home/leo/NGS/piggy/bin/piggy line 196. 0 isolates found

Is there any other possibilities to explain this?

Thank you very much,

Yours sincerely, Le Phuong.

harry-thorpe commented 5 years ago

Hi,

OK that's weird. what happens when you run:

ls /home/leo/SCV_LCV/SCV_LCV_piggyin

NguyenLePhuong commented 5 years ago

Dear Harry,

Thank you so much, I found the problem, the problem is the file is SCV_LCV_piggy_in instead of SCV_LCV_piggyin

Many thanks for your previous time with my stupid mistake, Yours sincerely, Le Phuong.

harry-thorpe commented 5 years ago

No problem, glad it worked!