liaochenlanruo / pgcgap

The Prokaryotic Genomics and Comparative Genomics Analysis Pipeline
GNU General Public License v3.0

How to read three genome file formats together #5

Open makerer5 opened 1 month ago

makerer5 commented 1 month ago

Hi, I want to use PGCGAP to construct a whole-genome phylogenetic tree of 120 strains. I tried to read in the genome files with PGCGAP, but it seems that PGCGAP can only read files of a single format. My 120 genomes, however, come in three formats: paired-end reads (R1.fq.gz and R2.fq.gz); single-end reads (.fq.gz, downloaded from NCBI); and GenBank files (.gb). How can I read in all 120 genome files in these three formats? This is the command I am currently using:

pgcgap --All --platform illumina --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --suffix_len 11 --kmmer 81 --genus Escherichia --species coli --codon 11 --strain_num 6 --threads 4 --VAR --refgbk /mnt/h/PGCGAP_Examples/Reads/MG1655.gbff --qualtype sanger

liaochenlanruo commented 1 month ago

Hi, PGCGAP can only take one input format at a time. However, you can assemble the paired-end reads and the single-end reads separately, and then run the other analyses on the resulting assemblies. I do not recommend using the GenBank (gbk) files for the analysis. Instead, download the scaffolds file corresponding to each gbk file and use it, together with the scaffolds obtained from assembling your reads, as the input for PGCGAP's downstream analyses.
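Splitting the 120 mixed inputs by format is the first step of this approach. A minimal shell sketch, assuming hypothetical naming patterns (`<strain>_R1/_R2.fq.gz` for paired-end, any other `*.fq.gz` for single-end, `*.gb`/`*.gbk`/`*.gbff` for GenBank); the `sort_inputs` helper and the `genomes/` folder are illustrative names, not part of PGCGAP:

```shell
#!/usr/bin/env bash
# Hedged sketch: the folder name and naming patterns are assumptions
# (<strain>_R1/_R2.fq.gz = paired-end, other *.fq.gz = single-end,
# *.gb/*.gbk/*.gbff = GenBank); adjust the case patterns to your files.
set -euo pipefail

# Move each file in $1 into paired/, single/ or genbank/ so that every
# group can be processed separately.
sort_inputs() {
  local src=$1 f
  mkdir -p "$src/paired" "$src/single" "$src/genbank"
  for f in "$src"/*; do
    [ -f "$f" ] || continue        # skip the sub-folders themselves
    case "$f" in
      *_R[12].fq.gz)     mv "$f" "$src/paired/" ;;
      *.fq.gz)           mv "$f" "$src/single/" ;;
      *.gb|*.gbk|*.gbff) mv "$f" "$src/genbank/" ;;
      *)                 echo "unrecognised format, left in place: $f" >&2 ;;
    esac
  done
}

sort_inputs "${1:-./genomes}"
```

The `paired/` folder could then be given to `--ReadsPath` with suffix options matching the file names (for example `--reads1 _R1.fq.gz --reads2 _R2.fq.gz --suffix_len 9`, 9 being the length of `_R1.fq.gz`), the `single/` reads assembled on their own, and `genbank/` used only as a list of accessions whose scaffolds should be downloaded instead.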

makerer5 commented 1 month ago

Thank you very much. This is indeed a very good idea. Thanks for your guidance!

makerer5 commented 1 month ago

Hi, following your advice to "assemble single-end and paired-end files separately", I downloaded ".fa.gz" and ".fasta" files from NCBI. How can I use PGCGAP to read single-end read files or FASTA files?

liaochenlanruo commented 1 month ago
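# Assemble the single-end reads one strain at a time with ABySS
# (replace strainname and '.fa' with each strain's name and its read file)
abyss-pe name=strainname k=81 se='.fa'
# Put all assemblies and downloaded genomes in ./Scaf with one common suffix, then annotate them
pgcgap --Annotate --scafPath ./Scaf --Scaf_suffix .fasta --codon 11 --threads 4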
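The "assemble the reads one by one" step can be scripted. The sketch below is a hedged illustration, not PGCGAP functionality: the folder names, the `assemble_se` and `run` helpers, the `DRY_RUN` switch and the assumed ABySS output suffix (`ABYSS_OUT`) are all hypothetical. It reuses the maintainer's `abyss-pe` and `pgcgap` options and copies every assembly into `./Scaf` as `<strain>.fasta`, so that the single `--Scaf_suffix .fasta` also matches the genomes downloaded from NCBI, assuming those are already in `./Scaf`.

```shell
#!/usr/bin/env bash
# Hedged sketch; every name below is a hypothetical choice, not part of PGCGAP:
#   - single-end reads are stored as ./SE_reads/<strain>.fa
#   - ABYSS_OUT is the ABySS output file kept per strain; "-scaffolds.fa" is
#     an assumption, so check which final file your abyss-pe run writes
#   - DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
ABYSS_OUT="${ABYSS_OUT:--scaffolds.fa}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assemble every <strain>.fa in $1 one by one, then copy each assembly into $2
# as <strain>.fasta so a single --Scaf_suffix value matches all genomes.
assemble_se() {
  local reads_dir=$1 scaf_dir=$2 reads strain
  mkdir -p "$scaf_dir"
  for reads in "$reads_dir"/*.fa; do
    [ -e "$reads" ] || continue   # the glob matched no files
    strain=$(basename "$reads" .fa)
    run abyss-pe name="$strain" k=81 se="$reads"
    run cp "${strain}${ABYSS_OUT}" "$scaf_dir/${strain}.fasta"
  done
}

assemble_se ./SE_reads ./Scaf
# The maintainer's annotation step, now covering read-based and NCBI genomes alike
run pgcgap --Annotate --scafPath ./Scaf --Scaf_suffix .fasta --codon 11 --threads 4
```

Running it once with the default `DRY_RUN=1` lets you review every command before executing them with `DRY_RUN=0`; newer ABySS releases may need extra parameters, so check `abyss-pe` against your installed version first.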