In this page, we provide examples illustrating the different options offered by the platform.
metagm_build: "Quality control on reference genomes was done using checkm (ref), filtering out assemblies with more than 5% contamination or less than 90% completeness. High-quality assemblies were used to build a Kraken2 (ref) database. A Bracken (ref) database was also build."
metagm_classify: "Paired-end read files were classified using Kraken2 (ref) custom database (include description? n species?). Bracken (ref) was then applied to obtain species-level metagenomics profiles."
farm3
, as there is no large memory queue elsewhere (especially metagm_build
).~/.profile
file:# add the metagm library to your path
export PATH=/nfs/team162/kv4/github/metagm/metagm/wrapper:$PATH
Then: depending of the server you use (farm3 or farm5):
optional only if working on farm5 or pcs6 , load useful modules: module load /nfs/users/nfs_k/kv4/modules/metagm
Add the following line in your ~/.profile
: alias metagm_env='source /nfs/team162/kv4/bin/metagm_env/bin/activate'
Add the following line in your ~/.profile
: alias metagm_env_farm5='source /nfs/team162/kv4/bin/metagm_env_farm5/bin/activate'
Source it: source ~/.profile
(farm3) Run metagm_env
in the terminal it should slightly change the prompt (metagm_env
) in front of it
(farm5) Run metagm_env_farm5
in the terminal it should slightly change the prompt (metagm_env
) in front of it
Note: to leave the virtual environment, type deactivate
This section describes the process of building databases for different softwares (Kraken, Bracken, Mash, ...) using a list of reference genomes. It also takes care of submitting all the jobs on farm and manage job dependencies.
There is two mandatory positional arguments for this function:
genomes
list (text file):
.fa
or .fna
). contigs.fa
in the system and therefore are not unique names.pf assembly -t file -i YOUR_LIST_OF_LANEIDS.txt -l MY_SYMLINK_FOLDER
output
folder to store all the files produced by the scriptThe script metagm_build.py
also offers options in the building database process:
-h
: help display-v
: verbose mode for additional details on each step--QC
: run quality control on the list of genomes before building any database. If not done, the scirpt assumes that all the genomes have already been cheked.--taxoAssign
: run taxonomic assignment step on all the genomes. It will automatically be done if Kraken/Bracken databases are built.--KrakenDB
: build a Kraken2 database using the list of genomes --BrackenDB
: build a Bracken database using the list of genomes. Requires a Kraken database to exist, and will therefore automatically creates one --MashDB
: create a Mash sketch of all the genomes.--ncbi
: rely on NCBI taxonomy instead of gtdb (default: false)-b
: define the number of genomes to be analyzed in each batch (default: 10)-t
: define the number of threads used in each job (default: 2)-q
: define to which queue the jobs are submitted (default: long)-m
: define the amount of memory requested for each job (default: 64)--maxcontamination
: define the maximal value on checkm contamination to filter out a genome during QC (default: 5)--mincompleteness
: define the minimal value on checkm completeness to filter out a genome during QC (default: 90)The output
folder contains a directory for each task that is performed:
output/merge_final
contains quality control (QC) results for all the genomes
output/merge_final/ValidatedGenomes.txt
is the list of all genomes that pass QCoutput/merge_final/FilteredGenomes.txt
is the list of all genomes that fail QCoutput/merge_final/log.txt
provides details on why a genome failed QCoutput/tmp$i
folders contain the checkm lineage_wf output for the batch i
of genomes (if parallelized)output/Kraken
contains the Kraken2 database, as well as the Bracken files (if requested)output/Mash
contains the Mash sketch fileoutput/genome_with_[gtdb|ncbi]_taxid.txt
contains the list of taxids assigned to the genomes list if a taxonomic assignment step is performedThe following examples illustrate various features from the metagm_build.py
script. Depending how busy farm is, these examples can take some time to run.
metagm_build.py /nfs/team162/kv4/bin/list_example_pipeline.txt ./test --QC --taxoAssign -m 150
The command applies:
./merge_final/ValidatedGenomes.txt
, there is 12 genomes that passed QC./merge_final/FilteredGenomes.txt
, there is 1 genome that failed QC./merge_final/log.txt
, the genome was filtered because of XYZ./genome_with_gtdb_taxid.txt
The taxonomic assignment step is high in memory requirement (~150Gb).
This example achieves the same task than previously, except it takes advantage of job parallelization. Better than running jobs on 10 genomes at a time (2 jobs in previous example), we now do batches of 2 genomes (-b
flag), therefore submitting 7 smaller jobs.
#delete previous example folder
rm -rf ./test
# run the faster command
metagm_build.py /nfs/team162/kv4/bin/list_example_pipeline.txt ./test --QC --taxoAssign -m 150 -b 2
It is possible to skip the quality control and directly create Kraken database if user already checked the genomes. Additionally, the taxonomic assignment step can be ignored if user provides a taxid for each genome (third column in input list). Here, we use the curated genome list generated in the previous example.
#Run on a queue with 'hugemem' (e.g., farm3)
metagm_build.py ./genome_with_gtdb_taxid.txt ./test --KrakenDB --BrackenDB
The command builds:
It is possible to run all the previous steps in one single command: first, genomes are QC'ed, and then validated genomes are assigned taxonomically, finally a Kraken/Bracken database is built.
#delete previous example folder
rm -rf ./test
#Run on a queue with 'hugemem' (e.g., farm3)
metagm_build.py /nfs/team162/kv4/bin/list_example_pipeline.txt ./test --QC --KrakenDB --BrackenDB -m 100 -b 2
The command runs:
./merge_final/ValidatedGenomes.txt
, there is 12 genomes that passed QC./merge_final/FilteredGenomes.txt
, there is 1 genome that failed QC./merge_final/log.txt
, the genome was filtered because of XYZ./genome_with_gtdb_taxid.txt
metagm_build.py -h
kraken_build
job on hugemem
queue which means users need to run it on farm3 (or farm4/5 when this queue will be available).--QC
, --Kraken
, --Bracken
, or --Mash
is provided, the metagm_build.py
is not going to do anything.This section describes the process of classifying metagenomics sequencing reads to get both taxonomic and functional profiles. It also takes care of submitting all the jobs on farm and manage job dependencies.
There is two mandatory positional arguments for this function:
metagenomes
list (one-column text file):
/lustre/scratch118/infgen/team162/kv4/healthy/ERR209654
. The script will automatically seek for /lustre/scratch118/infgen/team162/kv4/healthy/ERR209654_1.fastq.gz
and /lustre/scratch118/infgen/team162/kv4/healthy/ERR209654_2.fastq.gz
.fastq
filesoutput
folder to store all the results produced by the scriptThe script metagm_classify.py
offers software options for classification:
-h
: help display-v
: verbose mode for additional details on each step--KrakenDB
: path to a Kraken2 database folder (example: /nfs/pathogen005/team162/Kraken1019
)--BrackenDB
: path to a Kraken2 database folder with Bracken files in it (example: /nfs/pathogen005/team162/Kraken1019
)--BrackenRank
: taxonomic rank for Bracken output (default: S for species) [options: D,P,C,O,F,G,S]--MashDB
: path to a .msh
Mash sketch (example /nfs/pathogen005/team162/RefSeq94n.msh
)--functional
: run a functional characterization of the metagenomes rather than a taxonomic one (default: false)--noMetaPhlan
: skip MetaPhlan2
step in functional analysis (default: false). will only work if these files already exist.-b
: define the number of metagenomes to be analyzed in each batch (default: 10)-t
: define the number of threads used in each job (default: 2)-q
: define to which queue the jobs are submitted (default: long)-m
: define the amount of memory requested for each job (default: 64)--merge
: will automatically merge all the output files from one method in a single table file (default: true)
Kraken
, it relies on the /nfs/team162/kv4/Kraken2Table.pl
script to generate two merged tables: a raw count and a normalized abundanceBracken
, it relies on the combine_bracken_outputs.py
script to generate a single table mixing raw and relative abundanceMash
, it relies on the /nfs/team162/kv4/bin/parse_mash.R
script to report high-confidence hits for each samplefunctional
analysis, data will always be merged in tables at different levels (gene families, pathways, GO categories, ...)The output
folder contains a directory for each task that is performed:
output/Kraken
contains the Kraken2 report output/Bracken
contains the Bracken filesoutput/Mash
contains the Mash sketch fileoutput/MetaPhlan2
contains all the output files generated by MetaPhlan2
during the functional analysisoutput/Humann2
contains all the output files generated by Humann2
during the functional analysis--BrackenDB
flag will automatically trigger --KrakenDB
with the same database, as Bracken
requires Kraken
output files to create its output files.* __current__ Kraken1019: using the Mgnify curated collection (31,555 genomes) from bacteria and archea. The taxon IDs have been assigned using GTDB.
* __current__ Kraken1019_with_nonBact_nonArch: same as Kraken1019, but also contains 131 gut-associated fungi, small eukaryota and other parasites.
* Kraken0419_kraken2_taxo_resolved: build with Kraken2, using the HBC + BGI collections and ~300 public genomes and the taxon IDs have been manually curated using GTDB rather than the default NCBI.
* InternalKraken_OCT2017: HBC + public genomes (@Sam), relied on the old QC for reference genomes
* RefSeq94n.msh: Mash sketch of the entire RefSeq (version 94)
* taxonomy/ : custom taxonomy built from GTDB for Bacteria and Archea, and saved in a NCBI format for Kraken. It also contains few viruses and eukaryotes.
The following examples illustrate various features from the metagm_classify.py
script. Depending how busy farm is, these examples can take some time to run.
metagm_classify.py --BrackenDB /nfs/pathogen005/team162/Kraken1019 /nfs/team162/kv4/bin/test_list_metagenomes_bangladesh.txt ./test_classify_pipeline -b 2
The command returns:
./test_classify_pipeline/Bracken/*S.bracken
, Bracken files on 6 metagenomes at species level (default).
./test_classify_pipeline/Bracken/merged_S.bracken
./test_classify_pipeline/Kraken/*.out
, Kraken2 files on 6 metagenomes.
./test_classify_pipeline/Kraken/merged_raw.txt
./test_classify_pipeline/Kraken/merged_norm.txt
./test_classify_pipeline/Kraken/*bracken.out
, Kraken2 files on 6 metagenomes, without unclassified reads.Very similar command to the previous example, except we introduce the --BrackenRank
to specify the rank Bracken should use:
metagm_classify.py --BrackenDB /nfs/pathogen005/team162/Kraken1019 /nfs/team162/kv4/bin/test_list_metagenomes_bangladesh.txt ./test_classify_pipeline_genus -b 2 --BrackenRank G
The command returns:
./test_classify_pipeline_genus/Kraken/*.out
, Kraken2 files on 6 metagenomes.
./test_classify_pipeline_genus/Kraken/merged_raw.txt
and the normalized abundance./test_classify_pipeline_genus/Kraken/merged_norm.txt
./test_classify_pipeline/Kraken/
as the Kraken step considers all ranks./test_classify_pipeline_genus/Bracken/*G.bracken
, Bracken files on 6 metagenomes at species level (default).
./test_classify_pipeline_genus/Bracken/merged_G.bracken
For this example, Refseq genomes are detected in metagenome samples. Please note that this exercise is not standard and still experimental, so be careful on the way you use the results. Likely, the detected genomes are from abundant organisms too.
metagm_classify.py --MashDB /nfs/pathogen005/team162/RefSeq94n.msh /nfs/team162/kv4/bin/test_list_metagenomes_bangladesh.txt ./test_classify_pipeline_mash -b 2
The command returns:
./test_classify_pipeline_mash/Mash/*_screen.tab
, Mash sketch on 6 metagenomes../test_classify_pipeline_mash/Mash/merged_mash.txt
, merged best hits detected with some confidence in metagenomesThis pipeline feature relies on
metagm_classify.py --functional /nfs/team162/kv4/bin/test_list_metagenomes_bangladesh.txt ./test_classify_pipeline_functional -b 2
The command returns:
In this section, we describe the process used to assign a given genome to the current taxonomy.
As mentioned in the section 'metagm_build module', users have the option to rely on either the gtdb taxonomy done with gtdb-tk classify_wf
function.
Current taxonomic tree build from GTDB metadata is stored here: /nfs/pathogen005/team162/taxonomy
Non bacterial and non archeal organisms cannot be assigned using gtdb and need to be manually annotated using /nfs/pathogen005/team162/taxonomy/names.dmp
file
Kraken software relies on taxonomic information presented in the 'NCBI format' (a pair of nodes.dmp
and names.dmp
).
Unfortunately, gtdb does not provide (yet?) its taxonomy in such format. Therefore, we need to build a GTDB-based taxonomy that can be saved in the NCBI format.
# merge bacterial and archeal metadata
cat ar122_metadata.tsv bac120_metadata.tsv > bac_ar_metadata.tsv
# get unique taxonomic paths
cut -f17 bac_ar_metadata.tsv | sort -u > tmp.txt
# remove last line that contains header
head -n -1 tmp.txt > taxo_path_gtdb_bac120_ar122_unique.txt
#remove temporary file
rm tmp.txt
#import library
import pickle # save binary object, like trees
import sys # system command library
sys.path.append('/nfs/team162/kv4/github/metagm') # add the metagm library to your session
from metagm.phylogeny.TaxonomyTree import TaxonomyTree # class to generate trees
# this class only needs a list of taxonomic paths and will generate a taxonomy in NCBI format
tree = TaxonomyTree('taxo_path_gtdb_bac120_ar122_unique.txt')
# save the tree in NCBI format at the given location
tree.saveNCBIFormat('taxonomy_bac_ar')
# also save the tree in a Pickle format (Python binary) for later
tree.savePickleFormat('/nfs/team162/kv4/bin/gtdb_metadata/taxonomy_bac_ar/taxo.pyc')
If a tree needs to be updated by adding new nodes, it is not necessary to re-build it from scratch. User can provide a Pickle tree and new taxonomic paths:
#import library
import pickle # load binary object, like trees
import csv # read csv files
import sys # system command library
sys.path.append('/nfs/team162/kv4/github/metagm') # add the metagm library to your session
from metagm.phylogeny.TaxonomyTree import TaxonomyTree # class to generate trees
#load existing tree to add nodes
with open('taxonomy_bac_ar/taxo.pyc', "rb") as input_file:
tree = pickle.load(input_file)
#Here we want to add non bacterial and non archeal nodes (fungi, virus and other eukaryotes)
extraNodes = '/nfs/team162/kv4/Kraken1019/nonBacterial_taxopath.txt'
with open(extraNodes) as tsvfile:
reader = csv.reader(tsvfile, delimiter='\t')
for i, row in enumerate(reader):
print(row[0])
# add a node if it is new
tree.add_node(row[0])
# save the updated tree in NCBI format at the given location
tree.saveNCBIFormat('taxonomy')
# also save the tree in a Pickle format (Python binary) for later
tree.savePickleFormat('taxonomy/taxo.pyc')
This section provides some R snippets to do analysis using the metagm_classify output files.
More details are given in the R vignette '.Rmd', also found in this repository.
Databases
, right-click for Create Database Connection
Use Wizard
, then choose a name for the connection (example: microbiota_db
), then press Next
MySQL
as Database driver, then press Next
Finish
isolates
)All the genome assemblies can be found in /lustre/scratch118/infgen/team162/kv4/working_list_genomes_kraken
.
This section describes the content of quality control applied to genome assemblies in metagm_build module. Here are the different steps with their corresponding default values:
If you want to run PanPhlan tool for a specific species and get strain-level resolution, you might want to check if they provide a database for this species here.
If they do, we already have them downloaded in /lustre/scratch118/infgen/team162/ys4/panphlan/panphlan_db
.
If not, you could build your own folowwing this tutorial.
There is a wrapper script for [rnammer]() found in /nfs/team162/kv4/bin/run_rnammer.sh
.
It takes a list of paths to assembly in a text file as well as a output directory to store the results.
Example:
sh /nfs/team162/kv4/bin/run_rnammer.sh /nfs/team162/kv4/bin/test_rnammer.txt ./16S_extract/
This section describes the built-in classes created in the metagm
library. They can be used to improve further Python script development or add features in an efficient manner.
BacterialGenome()
class:This class creates a genome object from a sequence file and stores various information about it:
genomename
: name given to the genome (default: ``)genomefilepath
: location of the sequence file (default: ``)tmpdir
: where temporary results are stored (default: tmp)qc
: if true, will run quality control on this genome (default: true) seqrecord
: fasta
sequence efficently stored in python session (default: none)taxid
: known taxonomic ID for this genome (default: none)valid
: whether this genome passed quality control (default: none)VALIDATED_GENOMES
: text file containing genomes that passed QC (default: ValidatedGenomes.txt)FILTERED_GENOMES
: text file containing genomes that failed QC (default: FilteredGenomes.txt)LOG
: text file containing reasons genomes failed QC (default: log.txt)MAX_CONTIGS
: maximum number of contigs to pass QC (default: 400)MAX_SIZE
: maximum assembly size to pass QC (default: 8000000)COMPLETENESS_THRESHOLD
: minimum assembly completeness to pass QC (default: 90)CONTAMINATION_THRESHOLD
: maximum assembly contamination to pass QC (default: 5)QUEUE
: LSF queue to be used when dealing with this genome (default: long)NTHREADS
: number of threads to be used when dealing with this genome (default: 8)MEMORY
: memory (Gb) to be used when dealing with this genome (default: 8)ASSEMBLER
: default assembler to be used (default: spades)DEFAULT_ROARY_CUTOFF
: threshold for Roary operation (default: 0.99)annotation
: location of an annotation file for this genome (default: none)seq16s
: existing 16S sequence for this genome (default: none)filehash
: location of hash file (default: none)#in python3
#load libraries
import sys
sys.path.append('/nfs/team162/kv4/github/metagm')
from metagm.sequences.BacterialGenome import BacterialGenome
# create a genome using its sequence
g = BacterialGenome(genomefilepath='/lustre/scratch118/infgen/team162/kv4/working_list_genomes_kraken/GCA_900129655.1_IMG-taxon_2695420967_annotated_assembly_genomic.fna', genomename='Bacteroides clarus GCF_900129655', taxid='626929')
# it will run QC on it first (default)
# qc=False can be used in the previous command to avoid running QC
# then, asking if the genome passed QC
g.is_valid()
# True means the genome passed QC
# then, extract the 16S sequence of this genome (rnammer)
g.get_16s_sequence()
g.seq16s # it is a SeqRecord object
len(g.seq16s) # one 16S copy
#get the assocaited 16S fasta sequence
print(g.seq16s["rRNA_FQWK01000014.1_260-1774_DIR+"].format("fasta"))
GenomeList()
class:This class creates a object made of multiple BacterialGenome and makes it easy to serialize analyses. Its components are:
genomelist
: a list of BacterialGenome objects (default: list())taxidlist
: a list of taxids associated with genomes (default: None)taxidfile
: a file where taxids are stored (default: None)genomefilepathlist
: a list of paths where genome assemblies are located (default: {}) inputfile
: a text file containing the paths of all genomes (default: None)QUEUE
: LSF queue to be used when dealing with this genome (default: long)NTHREADS
: number of threads to be used when dealing with this genome (default: 4)MEMORY
: memory (Gb) to be used when dealing with this genome (default: 8)TMPDIR
: where temporary results are stored (default: tmp)#in python3.6
#load libraries
import csv
import sys
sys.path.append('/nfs/team162/kv4/github/metagm')
from metagm.sequences.BacterialGenome import BacterialGenome
from metagm.sequences.GenomeList import GenomeList
# create list of genomes from an input file
print('Loading list of genomes...')
glist = GenomeList(inputfile='/nfs/team162/kv4/bin/test_list_genomes.txt')
with open(glist.inputfile) as tsvfile:
reader = csv.reader(tsvfile, delimiter='\t')
for i, row in enumerate(reader):
g = BacterialGenome(genomefilepath=row[0], genomename=row[1], qc=False)
# append to existing list
glist.append(g, checkvalid=False)
#get list length -->10
len(glist)
#get name of first genome in list
glist.genomelist[0].genomename
#get path to third genome in list
glist.genomelist[3].genomefilepath
#remove the first genome in the list using its name
glist.remove(genomeidentifier=glist.genomelist[0].genomename)
#get list length --> 9
len(glist)
#get name of first genome in list --> different than previously
glist.genomelist[0].genomename
#get gtdb taxid for a list of genomes (submit 1 job on long queue, might take some time)
glist.list_assign_taxo_gtdb()
# create a 'myTaxids.txt' file with the results and store the results in the list also
glist.taxidfile
# get the list of taxids attached to the genomes
glist.taxidlist
Metagenome()
class:This class stores a metagenome sample: Its components are:
metafastqfile1
: a fastq file (default: None)metafastqfile2
: a fastq file, used only if data are paired (default: None)pairedFlag
: are the data paired-end? (default: False)zippedFlag
: are the data zipped? (default: False)krakenReport
: a path to Kraken2 report (default: None)brackenReport
: a path to Bracken report (default: None)mashReport
: a path to Mash report (default: None)metaphlanReport
: a path to MetaPhlan2 report (default: None)QUEUE
: LSF queue to be used when dealing with this genome (default: long)NTHREADS
: number of threads to be used when dealing with this genome (default: 4)MEMORY
: memory (Gb) to be used when dealing with this genome (default: 8)TMPDIR
: where temporary results are stored (default: tmp)#in python3.6
#load libraries
import sys
sys.path.append('/nfs/team162/kv4/github/metagm')
from metagm.sequences.Metagenome import Metagenome
# create a metagenome object from paired fastq
mg = Metagenome(metagenomename='ERR2835563', metafastqfile1='/lustre/scratch118/infgen/pathogen/pathpipe/pathogen_prok_external/seq-pipelines/human/gut_metagenome/TRACKING/235/PS.170.S4.5.metagenomic/SLX/ERX2842313/ERR2835563/ERR2835563_1.fastq.gz',metafastqfile2='/lustre/scratch118/infgen/pathogen/pathpipe/pathogen_prok_external/seq-pipelines/human/gut_metagenome/TRACKING/235/PS.170.S4.5.metagenomic/SLX/ERX2842313/ERR2835563/ERR2835563_2.fastq.gz')
# builder automatically detect if data are paired-end
mg.pairedFlag
# builder automatically detect if data are zipped
mg.zippedFlag
MetagenomeList()
class:This class creates a object made of multiple Metagenome and makes it easy to serialize analyses. Its components are:
metagenomelist
: a list of Metagenome objects (default: list())metaphlanFlag
: Has MetaPhlan2 already been run on these metagenomes? (default: False)brackenrank
: taxonomic rank for Bracken prediction (default: S for species)metaphlanID
: a list of MetaPhlan2 job IDs to manage Humann2 dependencies (default: None) outputDir
: where output files are stored (default: tmp)brackenOutputList
: list of files created by Bracken. It is used for the merge step (default: list())#in python3.6
#load libraries
import csv
import sys
import os
import re
sys.path.append('/nfs/team162/kv4/github/metagm')
from metagm.sequences.Metagenome import Metagenome
from metagm.sequences.MetagenomeList import MetagenomeList
# create list of metagenomes from an input file
mglist = MetagenomeList()
with open('/nfs/team162/kv4/bin/test_list_metagenomes_bangladesh.txt') as tsvfile:
reader = csv.reader(tsvfile, delimiter='\t')
for i, row in enumerate(reader):
# retrieve the paired-end fastq from folder path only
files = [f for f in os.listdir(os.path.dirname(row[0])) if re.match(os.path.basename(row[0])+'_[12].fastq.gz$', f)]
mg=Metagenome(metagenomename=os.path.basename(row[0]),metafastqfile1=os.path.dirname(row[0])+'/'+files[0], metafastqfile2=os.path.dirname(row[0])+'/'+files[1])
# append
mglist.append(mg)
#get list length --> 5
len(mglist)
#classify these samples with Kraken (submit jobs only and return the job IDs)
mglist.classify_kraken('/nfs/pathogen005/team162/Kraken0419_kraken2_taxo_resolved')
#results are stored in './tmp'
TaxonomyTree()
class:This class stores a tree structure using the ETE3 toolkit. Its components are:
tree
: a ETE3 tree object (default: Tree())maxtaxid
: largest taxid used in the tree (default: 1)#in python3.6
#load libraries
import pickle
import csv
import sys
sys.path.append('/nfs/team162/kv4/github/metagm')
from metagm.phylogeny.TaxonomyTree import TaxonomyTree
# create a tree object from a list of 10 paths
tree = TaxonomyTree('/nfs/team162/kv4/bin/test_list_taxopaths.txt')
tree.maxtaxid
#51 nodes were created
#print the tree in the terminal (ASCII format)
tree.printAscii()
#save tree in NCBI format
tree.saveNCBIFormat('./')
#save tree in Pickle format (can be loaded in any Python session)
tree.savePickleFormat('./taxo.pyc')
This section includes features that are either under-development, or not implemented yet:
metagm_classify.py
data loding step, by using split
command to generate batch files instead of the existing 'for' loop. IT will also rmeove the extra loop currently used for the last batch.