Open Norman-Kuo opened 7 years ago
The unique genes are also included in the accessory genome. Be aware, however, that the unique genes can be quite noisy, containing assembly errors and bits of random contamination, so treat them with caution. You can extract genes using the query_pan_genome script.
Thanks for the response. Sorry, but I am not sure how to extract unique genes from the accessory genome. Which option of query_pan_genome do I use to extract unique genes? Is it (-a difference)?
Thank you
This is old, but I will add an answer that might help someone. First, I would like to add that the query_pan_genome script also failed for me when I tried to retrieve unique IDs or FASTA sequences.
If you look into the clustered_proteins file, you'll notice that clusters are listed with their gene IDs, from the largest clusters to the smallest. That means that, from some point until the end of the file, only clusters with a single gene ID are listed. Once you find that point, cut those lines into a new file, then grep for the desired ID (for instance, the ID prefix of one isolate), and you will end up with a list of unique gene IDs. Later you can parse your GenBank file with a BioPerl script to retrieve the sequences.
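The manual cut-and-grep steps can also be sketched in a few lines of Python. This is only a rough sketch, not part of the tool itself: it assumes each line of clustered_proteins looks like `cluster_name: id1 id2 ...` with whitespace-separated gene IDs, and that the IDs of one isolate share a common prefix. The function names and the example IDs below are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional


def parse_clusters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cluster name to its list of gene IDs.

    Assumed line format (not verified against every version of the tool):
    'cluster_name: id1<TAB>id2<TAB>...'
    """
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unexpected lines
        name, _, members = line.partition(":")
        clusters[name.strip()] = members.split()
    return clusters


def unique_genes(clusters: Dict[str, List[str]],
                 prefix: Optional[str] = None) -> List[str]:
    """Return gene IDs from single-member clusters, optionally filtered by ID prefix."""
    ids = [genes[0] for genes in clusters.values() if len(genes) == 1]
    if prefix is not None:
        ids = [g for g in ids if g.startswith(prefix)]
    return ids


# Made-up lines standing in for a real clustered_proteins file.
example = [
    "groupA: iso1_00001\tiso2_00001\tiso3_00001",
    "groupB: iso1_00002\tiso2_00002",
    "groupC: iso1_00003",
    "groupD: iso3_00004",
]
clusters = parse_clusters(example)
print(unique_genes(clusters))           # ['iso1_00003', 'iso3_00004']
print(unique_genes(clusters, "iso1_"))  # ['iso1_00003']
```

For a real run, you would pass an open file handle instead of the example list, e.g. `with open("clustered_proteins") as fh: clusters = parse_clusters(fh)`. Because the sketch checks the size of every cluster, it does not depend on the file being sorted from largest to smallest cluster.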
Best,
Hi
I have been able to successfully get the pan-genome, core genome and accessory genome from our isolates. What I am also interested in is getting the unique genes from each isolate. I am wondering if there is any way to get that information.
What I was thinking originally was to subtract the core genome (intersect) and accessory genome from the whole genome (pan-genome) to get the unique genome, but I got 0 because they add up perfectly. I am wondering whether this is because the accessory genome file also contains the individual unique genes, and if there is any way to extract them.
Thank you
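The subtraction described in this question can be illustrated with a toy example. The sketch below uses made-up presence/absence data and a strict definition of core (present in every isolate), which is an assumption for illustration and may differ from the threshold the tool applies. It shows why pan minus core minus accessory comes out empty when the unique genes are counted as part of the accessory genome.

```python
# Hypothetical presence/absence data: gene cluster -> isolates that carry it.
presence = {
    "geneA": {"iso1", "iso2", "iso3"},
    "geneB": {"iso1", "iso2"},
    "geneC": {"iso3"},
}
isolates = {"iso1", "iso2", "iso3"}

pan = set(presence)
# Strict core for this toy case: present in every isolate.
core = {gene for gene, found in presence.items() if found == isolates}
# Accessory is everything that is not core, so it includes single-isolate genes.
accessory = pan - core
# Unique genes are those found in exactly one isolate.
unique = {gene for gene, found in presence.items() if len(found) == 1}

print(sorted(pan - core - accessory))  # []
print(sorted(unique))                  # ['geneC']
print(unique <= accessory)             # True
```

In other words, the unique genes have to be picked out of the accessory set by counting isolates per gene, not by subtracting whole sets.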