How to read three genome file formats together

liaochenlanruo / pgcgap

The Prokaryotic Genomics and Comparative Genomics Analysis Pipeline

GNU General Public License v3.0


How to read three genome file formats together #5

Open makerer5 opened 1 month ago

makerer5 commented 1 month ago

Hi, I want to use PGCGAP to construct a whole-genome phylogenetic tree of 120 strains. PGCGAP seems to accept input files of only one format at a time, but my 120 genomes come in three formats: paired-end reads (R1.fq.gz, R2.fq.gz), single-end reads (.fq.gz, downloaded from NCBI), and GenBank files (.gb). How can I read in all 120 genomes in these three formats with a command like the following?

pgcgap --All --platform illumina --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --suffix_len 11 --kmmer 81 --genus Escherichia --species coli --codon 11 --strain_num 6 --threads 4 --VAR --refgbk /mnt/h/PGCGAP_Examples/Reads/MG1655.gbff --qualtype sanger

liaochenlanruo commented 1 month ago

Hi, PGCGAP can only take one input format at a time. However, you can assemble the paired-end reads and the single-end reads separately, and then run the other analyses on the resulting assemblies. I do not recommend using gbk files for the analysis. Instead, download the scaffolds file corresponding to each gbk file and use it, together with the scaffolds files obtained from your own read assemblies, as input to PGCGAP for the downstream analysis.
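Since each format has to be handled on its own, a first step is to sort the 120 files into groups. The following is only a minimal sketch, not part of PGCGAP: the function names `classify_input` and `list_inputs` are hypothetical, and the suffix patterns (`_R1.fq.gz`/`_R2.fq.gz` for paired-end reads, any other `.fq.gz` for single-end reads) are assumptions based on the file names mentioned in the question, so adjust them to your own naming.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one file name by its suffix.
# The patterns are assumptions; edit them to match your own files.
classify_input() {
  case "$1" in
    *_R1.fq.gz|*_R2.fq.gz) echo "paired" ;;    # paired-end reads
    *.fq.gz)               echo "single" ;;    # any other .fq.gz is treated as single-end
    *.gb|*.gbk|*.gbff)     echo "genbank" ;;   # GenBank records
    *.fa|*.fasta|*.fa.gz)  echo "assembly" ;;  # already-assembled sequences
    *)                     echo "unknown" ;;
  esac
}

# Hypothetical helper: print "<class><TAB><path>" for every regular file in a directory.
list_inputs() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    printf '%s\t%s\n' "$(classify_input "$f")" "$f"
  done
}
```

For example, `list_inputs ./genomes | sort` (with `./genomes` as a placeholder folder) would list the files grouped by class, which shows which strains still need assembly and which GenBank entries need a scaffolds download instead.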

makerer5 commented 1 month ago

Thank you very much. This is indeed a very good idea, thank you for your guidance!

makerer5 commented 1 month ago

Hi, following your advice to assemble the single-end and paired-end files separately, I downloaded ".fa.gz" and ".fasta" files from NCBI. How can I use pgcgap to read single-end files or FASTA files?

liaochenlanruo commented 1 month ago

For single-end reads, you can use ABySS to assemble the reads into scaffolds (a FASTA file), and then put all the .fasta files in one folder/directory, for example one named "Scaf".

# Assemble the reads one strain at a time; se= takes that strain's single-end reads file
abyss-pe name=strainname k=81 se='.fa'
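With many strains, the one-by-one assembly can be scripted. The sketch below only prints one `abyss-pe` command per reads file so the commands can be reviewed before anything runs. The function name `print_abyss_cmds` is hypothetical, the `<strain>.fq.gz` naming is an assumption, and `k=81` is simply copied from the command above. Check that your ABySS installation reads gzipped FASTQ directly; if not, decompress the files first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one abyss-pe command per
# single-end reads file found in the given directory.
# Assumption: reads are named <strain>.fq.gz; k=81 is taken from the reply above.
print_abyss_cmds() {
  local dir="$1" f strain
  for f in "$dir"/*.fq.gz; do
    [ -e "$f" ] || continue               # skip the unexpanded glob when nothing matches
    strain="$(basename "$f" .fq.gz)"      # strain name = file name without the suffix
    printf "abyss-pe name=%s k=81 se='%s'\n" "$strain" "$f"
  done
}
```

One possible use is `print_abyss_cmds Reads/Single > assemble.sh`, where `Reads/Single` is a placeholder path, followed by `bash assemble.sh` once the printed commands look right.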

For the FASTA files in the "Scaf" directory, you can run the following command to annotate them:

pgcgap --Annotate --scafPath ./Scaf --Scaf_suffix .fasta --codon 11 --threads 4
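The annotation command takes a single suffix through `--Scaf_suffix`, so downloaded ".fa.gz" and ".fa" files presumably need to be decompressed and renamed before they sit next to the ".fasta" files. This is a rough sketch of that preparation step, assuming the downloads are in one source folder; the function name `collect_scaffolds` and both folder names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy assemblies from a source folder into a target
# folder so that every file ends in .fasta.
collect_scaffolds() {
  local src="$1" dest="$2" f base
  mkdir -p "$dest"
  for f in "$src"/*; do
    case "$f" in
      *.fa.gz) base="$(basename "$f" .fa.gz)"
               gzip -dc "$f" > "$dest/$base.fasta" ;;  # decompress and rename
      *.fa)    base="$(basename "$f" .fa)"
               cp "$f" "$dest/$base.fasta" ;;          # rename only
      *.fasta) cp "$f" "$dest/" ;;                     # already has the right suffix
    esac
  done
}
```

For instance, `collect_scaffolds ./downloads ./Scaf` would fill `./Scaf` with `.fasta` files, after which `--scafPath ./Scaf --Scaf_suffix .fasta` matches all of them. Files with other suffixes are skipped without a warning.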

After that, you can run the other analyses by referring to this instruction.