samhorsfield96 / ggCaller_manuscript

Repository of scripts used in ggCaller manuscript.
MIT License

Issue while simulating a pangenome #4

Closed: ggautreau closed this issue 1 year ago

ggautreau commented 1 year ago

Hello,

I'm trying to simulate a pangenome with your scripts so that I can compare ggCaller against tools that were not included in your paper, but I've run into problems during the simulation step.

I used the command below:

python scripts/simulate_full_pangenome.py --gff data/simulated_pangenome/SP_ATCC700669.gff3 --nisolates 100 --n_sim_genes 1000 --pop_size 10e-6 --out sim_pangenome --mutation_rate 1e-14 --gain_rate 1e-12 --loss_rate 1e-12

However, the output indicated a problem:

home/X/miniconda3/envs/ggCaller_manuscript/lib/python3.12/site-packages/Bio/Seq.py:2804: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
accessory size:  0
mutations in genome:  0
genes deleted:  0
mutations in genome:  0
genes deleted:  0
mutations in genome:  0
genes deleted:  0
mutations in genome:  0
genes deleted:  0
mutations in genome:  0
genes deleted:  0
mutations in genome:  0
genes deleted:  0
....

As a result, the simulated genomes were identical. I would appreciate any guidance you might have on this issue.
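One way to catch this automatically is to scan the simulation log for the per-genome counts and flag a run in which no genome received any mutations or deletions. The sketch below assumes only the log line format shown above (`accessory size:`, `mutations in genome:`, `genes deleted:` followed by an integer); the function names `summarise_log` and `looks_degenerate` are hypothetical and not part of the repository.

```python
import re

# Matches the per-genome lines printed by the simulation, e.g. "mutations in genome:  2099".
# Other output (such as Biopython warnings) is ignored.
LINE_RE = re.compile(r"^(accessory size|mutations in genome|genes deleted):\s+(\d+)$")


def summarise_log(log_text: str) -> dict[str, list[int]]:
    """Collect every integer reported under each label, in log order."""
    counts: dict[str, list[int]] = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts.setdefault(match.group(1), []).append(int(match.group(2)))
    return counts


def looks_degenerate(counts: dict[str, list[int]]) -> bool:
    """True if per-genome lines exist but every mutation and deletion count is zero,
    i.e. the simulated isolates are probably identical. False if no such lines exist."""
    values = counts.get("mutations in genome", []) + counts.get("genes deleted", [])
    return bool(values) and all(v == 0 for v in values)


if __name__ == "__main__":
    broken = "accessory size:  0\nmutations in genome:  0\ngenes deleted:  0\n"
    print(looks_degenerate(summarise_log(broken)))  # True
```

In practice you would redirect the script's output to a file and pass its contents to `summarise_log`; a `True` result means the run should be rerun with different parameters before benchmarking.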

Also, I'd like to express my gratitude for the insightful paper!

Best regards,

samhorsfield96 commented 1 year ago

Hi, this may be an issue with how the float arguments are parsed from the command line. Could you try running with the default parameters, specifying only the input GFF and the output path, and see what happens?
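A quick way to test the float-parsing hypothesis is to pass the same strings through `argparse` with `type=float`. This is a minimal sketch that assumes the script declares these options as floats; the real argument definitions (types, defaults) in `simulate_full_pangenome.py` are not shown in this thread, so the parser below is hypothetical and only reuses the flag names from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names copied from the command in this issue,
    # types assumed to be float.
    parser = argparse.ArgumentParser(description="Sketch of float-valued simulation options")
    parser.add_argument("--pop_size", type=float)
    parser.add_argument("--mutation_rate", type=float)
    parser.add_argument("--gain_rate", type=float)
    parser.add_argument("--loss_rate", type=float)
    return parser


args = build_parser().parse_args(
    ["--pop_size", "10e-6", "--mutation_rate", "1e-14",
     "--gain_rate", "1e-12", "--loss_rate", "1e-12"]
)
print(args.pop_size)       # 1e-05
print(args.mutation_rate)  # 1e-14
```

Python reads scientific notation correctly here, so parsing alone would not zero these values. Note that `10e-6` means 10 × 10⁻⁶ = 1e-5, not 10⁻⁶; either way it is a very small population size.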

ggautreau commented 1 year ago

Hi,

Actually, the problem was --pop_size: it needs to be a sizable value and cannot be set to something as small as the 10e-6 I used.
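To see why a tiny population size can leave every genome unchanged, consider a neutral coalescent model, where the total branch length of the sample genealogy scales with population size, so the expected number of events is roughly population size × per-generation rate. We have not checked how `simulate_full_pangenome.py` actually models evolution; this sketch assumes a haploid coalescent and uses Watterson's expectation for segregating sites, E[S] = θ·aₙ with θ = 2·N·μ·L and aₙ = Σ_{i=1}^{n−1} 1/i. The function names and the 2,000,000 bp length are illustrative assumptions.

```python
import math


def watterson_a(n_samples: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for n sampled sequences."""
    return sum(1.0 / i for i in range(1, n_samples))


def expected_segregating_sites(pop_size: float, mutation_rate: float,
                               n_samples: int, seq_length: int) -> float:
    """E[S] = theta * a_n under the neutral coalescent, with
    theta = 2 * N * mu * L for a haploid population (assumption)."""
    theta = 2.0 * pop_size * mutation_rate * seq_length
    return theta * watterson_a(n_samples)


if __name__ == "__main__":
    # Parameters from the original command, with an illustrative genome length.
    tiny = expected_segregating_sites(10e-6, 1e-14, 100, 2_000_000)
    print(f"{tiny:.1e}")  # on the order of 1e-12: effectively no mutations
```

Because E[S] is linear in `pop_size`, raising the population size by many orders of magnitude raises the expected number of events by the same factor, which is consistent with the nonzero counts reported after the fix.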

After correcting pop_size, the output is:

accessory size:  134
mutations in genome:  2099
genes deleted:  128
mutations in genome:  2096
genes deleted:  127
mutations in genome:  2111
genes deleted:  124
mutations in genome:  2114
genes deleted:  122
mutations in genome:  2098
genes deleted:  123
mutations in genome:  2100
...

Thanks.

samhorsfield96 commented 1 year ago

Solved in commit ea2cc04, with thanks to ggautreau.