padpadpadpad / EdenMicrobes

Data and analysis of projects related to microbes at the Eden Project.

EdenMicrobes

The goal of EdenMicrobes is to document and track the progress of the datasets and analyses related to microbes at the Eden Project.

Outline

Using the enclosed controlled biomes of the Eden Project botanic garden as mesocosms, we aim to determine the effects of biotic (above ground plant) and abiotic (soil physicochemistry, microclimate) controls on soil microbial community (SMC) composition, after accounting for spatial scale effects.

Table 1: Sample design

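| Biome | Habitat | Plant diversity | Plots | Samples | Extractions | PCRs |
| --- | --- | --- | --- | --- | --- | --- |
| Mediterranean | South Africa | High | 4 | 16 | 32 | 96 |
| Mediterranean | Australia | High | 4 | 16 | 32 | 96 |
| Mediterranean | Citrus | Low | 2 | 8 | 16 | 48 |
| Mediterranean | Vines | Low | 2 | 8 | 16 | 48 |
| Mediterranean | Med Basin | High | 4 | 16 | 32 | 96 |
| Rainforest | Bamboo | Low | 2 | 8 | 16 | 48 |
| Rainforest | Cocoa | Low | 2 | 8 | 16 | 48 |
| Rainforest | Malaysia | High | 4 | 16 | 32 | 96 |
| Rainforest | West Africa | High | 4 | 16 | 32 | 96 |
| Rainforest | Amazon | High | 4 | 16 | 32 | 96 |
| Totals | 10 | 2 | 32 | 128 | 256 | 768 |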
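The totals in Table 1 follow from the nested design: four samples per plot, two extractions per sample and three PCRs per extraction. The sketch below recomputes them from the plot counts; the names `PLOTS` and `design_totals` are illustrative only and are not part of the repository.

```python
# Plots per (biome, habitat), copied from Table 1.
PLOTS = {
    ("Mediterranean", "South Africa"): 4,
    ("Mediterranean", "Australia"): 4,
    ("Mediterranean", "Citrus"): 2,
    ("Mediterranean", "Vines"): 2,
    ("Mediterranean", "Med Basin"): 4,
    ("Rainforest", "Bamboo"): 2,
    ("Rainforest", "Cocoa"): 2,
    ("Rainforest", "Malaysia"): 4,
    ("Rainforest", "West Africa"): 4,
    ("Rainforest", "Amazon"): 4,
}

SAMPLES_PER_PLOT = 4        # one soil sample per quadrat corner
EXTRACTIONS_PER_SAMPLE = 2  # two eDNA extractions per sample
PCRS_PER_EXTRACTION = 3     # each extract amplified in triplicate


def design_totals(plots):
    """Return the design totals implied by the number of plots per habitat."""
    n_plots = sum(plots.values())
    n_samples = n_plots * SAMPLES_PER_PLOT
    n_extractions = n_samples * EXTRACTIONS_PER_SAMPLE
    return {
        "biomes": len({biome for biome, _ in plots}),
        "habitats": len(plots),
        "plots": n_plots,
        "samples": n_samples,
        "extractions": n_extractions,
        "pcrs": n_extractions * PCRS_PER_EXTRACTION,
    }


if __name__ == "__main__":
    print(design_totals(PLOTS))
```

The counts exclude negative and positive controls, which are described separately in the text.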

Figure 1: Sampling INSERT IMAGE

https://github.com/padpadpadpad/EdenMicrobes/blob/main/plots/

At each sample plot, four soil samples were collected using sterile soil augers, one from each corner of a 2 x 2 m quadrat. Any leaf litter was removed from the surface before approximately 200 g of soil was collected from within the first 10 cm of the topsoil, typically representing three auger cores' worth of material. Auger blades were “cleaned” by immersion in soil adjacent to the collection point prior to each sampling event. Blades were changed between sampling of the humid and dry biomes. The soil cores were sealed in sterile plastic bags and transported to the onsite laboratory for immediate processing.

Two eDNA extractions were performed from a 15 g subsample of each of the 128 samples following methods developed by Taberlet et al. (2012b) and modified by Zinger et al. (2016). DNA extractions were conducted in the field lab less than 2 hours after collection using a NucleoSpin® Soil kit (Macherey-Nagel, Düren, Germany). Eight negative controls were included for a total of 256 DNA extractions. The last elution step of the DNA extraction protocol was not carried out on site; instead, DNA was stored on the column with silica gel and transported back to the EDB lab in Toulouse for subsequent steps.

PCRs were performed in triplicate: each sample was extracted twice and each extract was amplified three times, giving six replicates per sample in total. Each PCR reaction was performed in a total volume of 20 μl and comprised 10 μl of AmpliTaq Gold Master Mix (Life Technologies, Carlsbad, CA, USA), 5.84 μl of Nuclease-Free Ambion Water (Thermo Fisher Scientific, Massachusetts, USA), 0.25 μM of each primer, 3.2 μg of BSA (Roche Diagnostic, Basel, Switzerland), and 2 μl of DNA template that was 10-fold diluted to reduce PCR inhibition. PCR was conducted by targeting a range of barcode regions to sequence for Eukaryotes, Fungi and Bacteria under the following conditions:

Table 2: PCR Amplification specifications

| | Sper01 | Bact01 | Fung02 | Euka02 |
| --- | --- | --- | --- | --- |
| Taxa | Plants (Spermatophyta) | Bacteria | Fungi | Eukaryotes |
| Target region | P6 loop of the chloroplastic trnL intron | V3-V4 regions of the 16S rRNA gene | ITS1 region of the nuclear ribosomal RNA genes | V7 region of the 18S rRNA gene |
| Forward sequence | GGGCAATCCTGAGCCAA | GGATTAGATACCCTGGTAGT | CAAGAGATCCGTTGTTGAAAGTK | TCACAGACCTGTTATTGC |
| Reverse sequence | CCATTGAGTCTCTGCACCTATC | CACGACACGAGCTGACG | GGAAGTAAAAGTCGTAACAAGG | TTTGTCTGCTTAATTSCG |
| Reference | Taberlet et al. 2007 | Parada et al. 2016; Apprill et al. 2015 | Epp et al. 2012; Taberlet et al. 2018 | Guardiola et al. 2015 |
| Thermocycling (number of cycles, denaturation, annealing, elongation, final elongation) | [35, 95°C (30s), 50°C (30s), 72°C (60s) - 72°C (7 min)] | [30, 95°C (30s), 57°C (30s), 72°C (90s) - 72°C (7 min)] | [35, 95°C (30s), 55°C (30s), 72°C (60s) - 72°C (7 min)] | [35, 95°C (30s), 45°C (30s), 72°C (60s) - 72°C (7 min)] |
| Sequencing technology | HiSeq | MiSeq | MiSeq | HiSeq |
| Sequence length (l = min, L = max) | l=10, L=220 | l=30, L=400 | l=30, L=900 | l=90, L=200 |
| Taxonomic reference database & threshold | EMBLr141 | SILVAngs v1.3 | UNITE | SILVAngs v1.3 |
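Two of the primers in Table 2 contain IUPAC degenerate bases (K in the Fung02 forward primer, S in the Euka02 reverse primer), so matching a primer against a read has to treat those positions as sets of allowed bases. The sketch below is a simplified position-by-position comparison written for illustration; it is not the algorithm used by ‘ngsfilter’, and the function names are hypothetical. The default of two mismatches mirrors the primer tolerance quoted in this document.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, read_prefix):
    """Count positions where the read base is not allowed by the primer code."""
    if len(read_prefix) < len(primer):
        raise ValueError("read is shorter than the primer")
    return sum(base not in IUPAC[code] for code, base in zip(primer, read_prefix))


def matches_primer(primer, read, max_mismatches=2):
    """True if the start of `read` matches `primer` within the mismatch budget."""
    if len(read) < len(primer):
        return False
    return primer_mismatches(primer, read[:len(primer)]) <= max_mismatches


if __name__ == "__main__":
    fung02_forward = "CAAGAGATCCGTTGTTGAAAGTK"
    read = "CAAGAGATCCGTTGTTGAAAGTG" + "ACGTACGT"
    print(matches_primer(fung02_forward, read))
```

Sequences are assumed to be upper case and to start at the first primer base; indels are not handled.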

Three negative PCR controls per plate were amplified and sequenced in parallel with the regular samples. Three positive controls were also included and consisted of XX TO CONFIRM XX. Six wells per PCR plate were left empty (non-used tag combinations) to control for tag jumps, which can occur during amplification and sequencing (see below for downstream data curation). All PCR products were pooled and the library was constructed using the Illumina TruSeq NanoPCRFree kit following the supplier’s instructions (Illumina Inc., San Diego, California, USA). Sequencing was performed on a HiSeq run (Illumina platform, San Diego, CA, USA) at the GeT-PlaGe platform (Toulouse, France).

Bioinformatic analyses were performed on the GenoToul bioinformatics platform (Toulouse, France), with the OBITOOLS package (Boyer et al. 2016). PCR replicates were prepared and processed in three separate libraries, with initial processing run for each of these separately, prior to subsequent merging. First, ‘illuminapairedend’ was used to assemble paired-end reads. This algorithm is based on an exact alignment algorithm that considers the quality scores at all positions during the assembly process. Subsequently, we used the ‘ngsfilter’ command to identify and remove the primers and tags on each read, and to assign reads to their respective samples. This program was used with its default parameters, tolerating two mismatches for each of the two primers and no mismatch for the tags. Following this, sequencing reads were dereplicated using the ‘obiuniq’ command. Sequences of low quality (containing Ns or with paired-end alignment scores below 50) were excluded using the ‘obigrep’ command. The same command was used to exclude sequences represented by only one read (singletons), as they are more likely to be molecular artefacts (Taberlet et al. 2018). Sequences outside of the preset length range were also discarded (90-200 for Eukaryotes; 30-400 for Bacteria; 30-900 for Fungi).
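The dereplication and filtering logic described above can be illustrated with a small Python sketch. It is not a substitute for ‘obiuniq’ and ‘obigrep’: it works on plain strings, the function names are hypothetical, and the paired-end alignment score filter is left out because that score is only available from the assembly step.

```python
from collections import Counter

# Length windows (min, max) per marker, as quoted in the text.
LENGTH_RANGE = {
    "Eukaryotes": (90, 200),
    "Bacteria": (30, 400),
    "Fungi": (30, 900),
}


def dereplicate(reads):
    """Collapse identical sequences into {sequence: read count}."""
    return Counter(reads)


def keep_sequence(seq, count, marker):
    """Apply the N, singleton and length filters to one unique sequence."""
    shortest, longest = LENGTH_RANGE[marker]
    has_no_n = "N" not in seq.upper()
    is_not_singleton = count > 1
    in_range = shortest <= len(seq) <= longest
    return has_no_n and is_not_singleton and in_range


def curate(reads, marker):
    """Dereplicate reads, then keep only sequences passing every filter."""
    counts = dereplicate(reads)
    return {seq: n for seq, n in counts.items() if keep_sequence(seq, n, marker)}


if __name__ == "__main__":
    good = "ACGT" * 10
    reads = [good, good, good[:-1] + "N", good[:-1] + "N", "ACGT" * 11]
    print(curate(reads, "Bacteria"))
```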

Datasets were subsequently filtered to remove contaminants as well as artefacts such as PCR chimeras and remaining sequencing errors, following Zinger et al. (2019) and using the metabaR R package (Zinger et al. 2020) in R version 3.6.1 (R Development Core Team, 2013). The filtering process consisted of three steps. (i) Negative control-based filtering: ASVs whose maximum abundance was found in extraction/PCR negative controls were removed from the dataset, as they were likely to be reagent/aerosol contaminants, which amplify better in the absence of the competing DNA fragments present in biological samples. (ii) Abundance-based filtering: this procedure targets the incorrect assignment of a small number of sequences from true ASVs to the wrong sample, a phenomenon called “tag-switching” (Esling et al. 2015), “tag jumps” (Schnell et al. 2015) or “cross-talk” (Edgar 2018). It consists of setting an ASV's abundance to 0 in samples where its abundance represents < 0.03% of the total abundance of that ASV in the entire dataset. (iii) PCR-based filtering: any PCR reaction that yielded fewer than 1000 reads for fungi, bacteria and eukaryotes was considered non-functional and removed from the dataset. The number of reads, ASVs and PCRs removed at each stage for each marker are detailed in Table 3.
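The three filtering steps can be sketched in Python on a toy read table of the form `{pcr_id: {asv: reads}}`. This is an illustration of the logic only, not the metabaR implementation; the function names are hypothetical. Two assumptions are made: ties between control and sample maxima are resolved in favour of keeping the ASV, and the 0.03% threshold is applied to each ASV's own total across the dataset.

```python
from collections import Counter


def remove_control_contaminants(counts, controls):
    """Drop ASVs whose maximum abundance occurs in a negative control."""
    asvs = {asv for row in counts.values() for asv in row}
    contaminants = set()
    for asv in asvs:
        in_controls = [counts[p].get(asv, 0) for p in counts if p in controls]
        in_samples = [counts[p].get(asv, 0) for p in counts if p not in controls]
        # Assumption: a tie is not treated as contamination.
        if max(in_controls, default=0) > max(in_samples, default=0):
            contaminants.add(asv)
    return {p: {a: n for a, n in row.items() if a not in contaminants}
            for p, row in counts.items()}


def zero_tag_jumps(counts, threshold=0.0003):
    """Set an ASV to 0 where it is below `threshold` of that ASV's total reads."""
    totals = Counter()
    for row in counts.values():
        totals.update(row)
    return {p: {a: (n if n >= threshold * totals[a] else 0) for a, n in row.items()}
            for p, row in counts.items()}


def drop_low_yield_pcrs(counts, min_reads=1000):
    """Remove PCRs with fewer than `min_reads` reads in total."""
    return {p: row for p, row in counts.items() if sum(row.values()) >= min_reads}


if __name__ == "__main__":
    counts = {
        "pcr1": {"asv1": 9990, "asv2": 2, "asv3": 5},
        "pcr2": {"asv1": 500, "asv2": 8000, "asv3": 0},
        "neg": {"asv3": 50},
    }
    step1 = remove_control_contaminants(counts, controls={"neg"})
    step2 = zero_tag_jumps(step1)
    print(drop_low_yield_pcrs(step2))
```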

Following initial curation, the three separate libraries were merged to create a single phyloseq object per marker, which contained all of the ASVs x PCR reads. These were then assigned a taxonomy from the SILVA taxonomic database for Bacteria and Eukaryotes (version 1.3; release 132, Quast et al., 2012), and the UNITE database () for fungi using the DADA2 pipeline ()....

A further round of curation was then performed: (i) PCRs with a read count below a set threshold XXXX were removed; (ii) using the DADA2 processing, with a bootstrapping score set at 50, ASVs were removed from the dataset if they could not be assigned to the Phylum level.
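As an illustration of step (ii), the sketch below keeps only ASVs with a Phylum assignment at or above the bootstrap threshold. The data layout (`{asv: {rank: (name, bootstrap)}}`) and the function names are hypothetical and do not reflect the DADA2 output format; treating a score equal to the threshold as a valid assignment is also an assumption.

```python
def assigned_to_phylum(ranks, min_boot=50):
    """True if the ASV has a Phylum name supported by at least `min_boot`."""
    name, boot = ranks.get("Phylum", (None, 0))
    return name is not None and boot >= min_boot


def drop_unassigned(taxonomy, min_boot=50):
    """Keep only ASVs that could be assigned to the Phylum level."""
    return {asv: ranks for asv, ranks in taxonomy.items()
            if assigned_to_phylum(ranks, min_boot)}


if __name__ == "__main__":
    taxonomy = {
        "asv1": {"Kingdom": ("Fungi", 100), "Phylum": ("Ascomycota", 87)},
        "asv2": {"Kingdom": ("Fungi", 95), "Phylum": ("Basidiomycota", 32)},
        "asv3": {"Kingdom": ("Fungi", 60)},
    }
    print(sorted(drop_unassigned(taxonomy)))
```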

Table 3: ASV number and total readcount evolution following bioinformatic curation and contaminant removal

| Stage | paired ASV | paired Reads | ngs_filtered ASV | ngs_filtered Reads | curated ASV | curated Reads | uniq ASV | uniq Reads | no_singletons ASV | no_singletons Reads | PCRs above threshold (>1000) ASV | Extraction Contaminant Removal ASV | PCR Contaminant Removal ASV | Sequencing Contaminant Removal ASV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ITS1-Rep1 | 3205177 | 3205177 | 2255445 | 2255445 | 2243828 | 2243828 | 469368 | 2243828 | 76499 | 1850959 | 176 | 14 | 51 | 12 |
| ITS1-Rep2 | 2754254 | 2754254 | 1932598 | 1932598 | 1919187 | 1919187 | 416437 | 1919187 | 63812 | 1566562 | 128 | 21 | 3 | 20 |
| ITS1-Rep3 | 4054254 | 4054254 | 2800343 | 2800343 | 2786030 | 2786030 | 582040 | 2786030 | 94062 | 2298052 | 244 | 15 | 88 | 21 |
| 16S-Rep1 | 4539219 | 4539219 | 2697244 | 2697244 | 2682047 | 2682047 | 1504427 | 2682047 | 139331 | 1316951 | 237 | 287 | 71 | 244 |
| 16S-Rep2 | 4109968 | 4109968 | 2421281 | 2421281 | 2407686 | 2407686 | 1370039 | 2407686 | 125006 | 1162653 | 231 | 260 | 38 | 229 |
| 16S-Rep3 | 8175582 | 8175582 | 4813103 | 4813103 | 4786542 | 4786542 | 2575508 | 4786542 | 246502 | 2457536 | 244 | 618 | 109 | 470 |
| 18S-Rep1 | 16837584 | 16837584 | 14364998 | 14364998 | 14281333 | 14281333 | 677100 | 14281333 | 152495 | 13756728 | 250 | 67 | 37 | 270 |
| 18S-Rep2 | 12133280 | 12133280 | 10412433 | 10412433 | 10345662 | 10345662 | 503842 | 10345662 | 120786 | 9962606 | 249 | 61 | 20 | 172 |
| 18S-Rep3 | 9110898 | 9110898 | 7732534 | 7732534 | 7694307 | 7694307 | 408830 | 7694307 | 95583 | 7381060 | 244 | 65 | 19 | 150 |

Once filtering was complete, the remaining PCRs for each technical replicate were summed and the read counts of technical replicates were normalised to reduce potential bias caused by PCR stochasticity and differential sequencing effort. Standardisation consisted of randomly resampling (with replacement) a number of reads corresponding to the first quartile of the total read number per sample. This returns an equal read count across all samples whilst maintaining sample-specific OTU relative abundances. To do so, each OTU in each sample was resampled with replacement a thousand times, following the approach detailed by Veresoglou et al. (2019). Finally, to reduce stochastic variation in taxa from one soil core to another, and to match the DNA sequencing data with the soil chemistry data, the four replicate samples within each subplot were aggregated by summing reads (after normalisation).
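A minimal Python sketch of this standardisation is given below, using only the standard library. It is one reading of the description rather than the code used for the analysis: each sample is resampled with replacement to the first-quartile depth, the draws are averaged over the iterations (an assumption), and samples are then summed per plot. All function names and the `plot_of` mapping are hypothetical.

```python
import random
import statistics
from collections import Counter


def first_quartile_depth(samples):
    """First quartile of the per-sample read totals, as an integer depth."""
    totals = [sum(row.values()) for row in samples.values()]
    return int(statistics.quantiles(totals, n=4, method="inclusive")[0])


def resample(row, depth, rng):
    """Draw `depth` reads with replacement, weighted by taxon abundance."""
    taxa = list(row)
    weights = [row[t] for t in taxa]
    return Counter(rng.choices(taxa, weights=weights, k=depth))


def standardise(samples, n_iter=1000, seed=1):
    """Average `n_iter` resampled tables so every sample has the same depth."""
    rng = random.Random(seed)
    depth = first_quartile_depth(samples)
    out = {}
    for name, row in samples.items():
        total = Counter()
        for _ in range(n_iter):
            total.update(resample(row, depth, rng))
        out[name] = {t: total[t] / n_iter for t in row}
    return out


def aggregate(samples, plot_of):
    """Sum the reads of all samples that belong to the same plot."""
    out = {}
    for name, row in samples.items():
        out.setdefault(plot_of[name], Counter()).update(row)
    return {plot: dict(row) for plot, row in out.items()}


if __name__ == "__main__":
    samples = {
        "a": {"x": 80, "y": 20},
        "b": {"x": 500, "y": 500},
        "c": {"x": 10, "y": 390},
    }
    standardised = standardise(samples, n_iter=100)
    print(aggregate(standardised, {"a": "plot1", "b": "plot1", "c": "plot2"}))
```

Because sampling is done with replacement, samples below the target depth are scaled up as well as larger samples being scaled down.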

The influence of plant community and abiotic conditions on local alpha-diversity and regional beta-diversity of the SMC will then be assessed by conducting Principal Coordinates ordination and PERMANOVA analysis. Structural Equation Models (SEMs) and variance partitioning will be used to explore the explanatory power of abiotic conditions, the identity of individual plants and plant community characteristics, whilst accounting for spatial autocorrelation.
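To make the PERMANOVA step concrete, here is a small self-contained Python sketch of a one-way test with a single grouping factor. The choice of Bray-Curtis dissimilarity is an assumption (no distance measure is specified here), the function names are hypothetical, and a real analysis would more likely use an established package.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {taxon: abundance} tables."""
    taxa = set(u) | set(v)
    diff = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    total = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return diff / total if total else 0.0


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a distance matrix and one grouping factor."""
    n = len(labels)
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(samples, labels, n_perm=999, seed=1):
    """Return (observed pseudo-F, permutation p-value)."""
    names = list(samples)
    dist = [[bray_curtis(samples[p], samples[q]) for q in names] for p in names]
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels across samples
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    samples = {
        "m1": {"a": 90, "b": 10}, "m2": {"a": 85, "b": 15}, "m3": {"a": 80, "b": 20},
        "r1": {"c": 70, "d": 30}, "r2": {"c": 60, "d": 40}, "r3": {"c": 75, "d": 25, "a": 5},
    }
    labels = ["Mediterranean"] * 3 + ["Rainforest"] * 3
    print(permanova(samples, labels, n_perm=199))
```

The sketch requires some within-group variation (otherwise the denominator of the pseudo-F is zero) and does not account for spatial autocorrelation or additional covariates.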

It is hypothesised that abiotic conditions will explain the greatest proportion of community dissimilarity, with microclimate (soil temperature and humidity) having greater effects than soil chemistry. However, after accounting for abiotic variation, plant and microbial diversity are likely to be positively correlated.

Scripts

Datasets

Sequencing datasets for the biomes are contained in data/sequencing, and data from monitoring of abiotic variables are in climate_data.

Sequencing data (in data/sequencing)

Climate data (in data/climate)

Here and below, methods were conducted by NRM Cawood Laboratories following their set methodology, with samples aggregated at the habitat level.

Contact

This project is primarily a collaboration between Daniel Padfield (d.padfield@exeter.ac.uk) at the University of Exeter and Julian Donald (formerly of the Eden Project).