sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Using NCBI annotated sequences as input for Roary #545

Open manoj044 opened 2 years ago

manoj044 commented 2 years ago

I downloaded the NCBI sequence (.fna) and gene annotation (.gff) files using curl --remote-name --remote-time ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/192/045/GCF_000192045.2_ASM19204v3/GCF_000192045.2_ASM19204v3_genomic.fna.gz

These two files were merged to create a GFF3 file using cat genome.gff genome.fna > genome1.gff
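For reference, GFF3 files that embed their sequence (the layout Prokka writes, and the one Roary is documented to take) put a `##FASTA` directive line between the annotation rows and the sequence records, and a plain `cat` does not add it. Below is a minimal Python sketch of the same merge with that separator. It assumes both inputs are already uncompressed; the function name `merge_gff_fasta` is illustrative and not part of Roary.

```python
from pathlib import Path


def merge_gff_fasta(gff_path, fasta_path, out_path):
    """Write the annotation rows, a ##FASTA directive, then the sequence records.

    Assumption: the GFF input does not already contain an embedded FASTA section.
    """
    gff_lines = Path(gff_path).read_text().splitlines()
    fasta_lines = Path(fasta_path).read_text().splitlines()

    with open(out_path, "w") as out:
        # Copy the annotation part, skipping blank lines.
        for line in gff_lines:
            if line.strip():
                out.write(line + "\n")

        # The directive that tells a GFF3 parser the feature rows have ended.
        out.write("##FASTA\n")

        # Copy the sequence records, skipping blank lines.
        for line in fasta_lines:
            if line.strip():
                out.write(line + "\n")
```

With the file names from the post, the call would be `merge_gff_fasta("genome.gff", "genome.fna", "genome1.gff")`. Whether this alone resolves the errors reported here is not confirmed in the thread.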

I then ran Roary on these files with roary -e -n -r -i 90 --cd 100 --mafft -g 100000 -f with_plots -p 16 *.gff but got the following errors:

Use of uninitialized value in require at /apps/roary/3.12.0/lib/site_perl/5.26.2/x86_64-linux-thread-multi/Encode.pm line 61.

2021/12/16 23:37:05 Input file contains duplicate gene IDs, attempting to fix by adding a unique suffix, new GFF in the fixed_input_files directory: /all_gff_files/genome.gff

Use of uninitialized value $cells[8] in split at /apps/roary/3.12.0/lib/site_perl/5.26.2/Bio/Roary/ReformatInputGFFs.pm line 135, <$input_gff_fh> line 8474.

Use of uninitialized value within @cells in join or string at /apps/roary/3.12.0/lib/site_perl/5.26.2/Bio/Roary/ReformatInputGFFs.pm line 152, <$input_gff_fh> line 8474.
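The `$cells[8]` warnings refer to the ninth tab-separated GFF column (the attributes field) being absent on input line 8474. One possible reading, which is an inference from the log and not something confirmed here, is that a non-feature line such as a FASTA header or sequence line is being read as an annotation row. A small Python sketch to list such rows in a file; the function name `find_short_rows` is hypothetical:

```python
def find_short_rows(gff_path, expected_columns=9):
    """Return (line_number, text) for feature rows with too few tab-separated columns."""
    problems = []
    with open(gff_path) as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            if text.strip() == "##FASTA":
                # Everything after this directive is sequence, not features.
                break
            if not text.strip() or text.startswith("#"):
                # Skip blank lines, comments and directives.
                continue
            if len(text.split("\t")) < expected_columns:
                problems.append((number, text))
    return problems
```

Running it on the merged file, for example `find_short_rows("genome1.gff")`, returns the line numbers that a parser expecting nine columns would trip over. An empty list means every row before `##FASTA` has all nine columns.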

Can anyone help solve this problem, or suggest an alternative way to use NCBI- and IMG-annotated (IMGAP v5.0.23) sequences? @andrewjpage

akuri-cyber commented 2 years ago

I came across the same problem. You could change the ID names within the GFF files. But I have 44 annotated files from NCBI, and after I changed the names, 18 of the GFFs still do not work.
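Before renaming anything, it may help to see which IDs actually repeat in a given file. A rough Python sketch, assuming IDs are stored as `ID=...` in the ninth column as in standard GFF3; the function name `duplicate_ids` is hypothetical and this is not how Roary itself detects duplicates:

```python
import re
from collections import Counter


def duplicate_ids(gff_path):
    """Return {id: count} for every ID= value that appears on more than one feature row."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.strip() == "##FASTA":
                break  # stop at the embedded sequence section
            if line.startswith("#"):
                continue  # skip comments and directives
            cells = line.rstrip("\n").split("\t")
            if len(cells) < 9:
                continue  # not a complete feature row
            # The ID attribute sits at the start of the field or after a semicolon.
            match = re.search(r"(?:^|;)ID=([^;]+)", cells[8])
            if match:
                counts[match.group(1)] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

Calling `duplicate_ids("genome.gff")` on each of the failing files would show whether repeated IDs are still present after the rename.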