PADLOC-DB

[!IMPORTANT] PADLOC >v2.0.0 is only compatible with PADLOC-DB >v2.0.0 and vice-versa. After you update PADLOC, make sure to update your database by running: padloc --db-update.

About

This repo contains the HMMs and system models used by PADLOC. The latest version of this database can be downloaded by running padloc --db-update.

Citation

If you use PADLOC or PADLOC-DB please cite:

Payne, L.J., Todeschini, T.C., Wu, Y., Perry, B.J., Ronson, C.W., Fineran, P.C., Nobrega, F.L., Jackson, S.A. (2021) Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types. Nucleic Acids Research, 49, 10868-10878. doi: https://doi.org/10/gzgh

The HMMs and system models in PADLOC-DB were built and curated using the data and conclusions from many different sources, we encourage you to also give credit to these groups by reading their work and citing them where appropriate. References to relevant literature can be found in system_info.md.

System models (`sys/*.yaml`)

System models are written using YAML syntax:

force_strand: FALSE    # <- Specifies whether all system components should be on the same strand
maximum_separation: 4  # <- Number of unrelated genes allowed to separate each component
minimum_core: 4        # <- Number of core genes that need to be present
minimum_total: 5       # <- Number of total genes that need to be present
core_genes:            # <- Genes that are generally considered essential
  - GenA
  - GenB
  - GenC
  - GenD
secondary_genes:       # <- Genes that are not always present, but still contribute to minimum_total
  - GenX
neutral_genes:         # <- Genes that are not always present, and do not contribute to minimum_total
  - GenY
prohibited_genes:      # <- Genes that cannot be present
  - GenZ

HMM metadata (`hmm_meta.txt`)

Field	Example	Description
`hmm.accession`	`PDLC00001`	A unique identifier for each PADLOC-DB HMM. This should be consistent with the filename and `ACC` field of the HMM.
`hmm.name`	`GajA_6`	Another unique identifier for each HMM. If the HMM is from an external database, the original ID from that database is preferred. This should be consistent with the `NAME` field of the HMM.
`hmm.description`	`Predicted ATPase`	A one line description of the HMM e.g. describing protein function or domains. This should be consistent with the `DESC` field of the HMM.
`protein.name`	`PemK\\|MazF`	The name of the protein that the HMM represents. Multiple protein names are separated by a pipe (`\\|`). Capitalisation of the first letter is preferred. These are the names that are referenced in the system models.
`secondary.name`	`cas_effector`	A secondary identifier that can be references in the system definition for groups of proteins rather than individual protein names.
`author`	`Payne LJ, Jackson SA`	Author(s) of the entry. If the HMM is from an external database, crediting the original author is preferred. If the HMM is from a study where an author was not explicity credited, authorship is attributed to the first author of the study. Multiple authors are separated by a comma (`,`).
`e.value.threshold`	`1e-05`	The minimum domain iE-value allowed when reporting hits.
`hmm.coverage.threshold`	`0.3`	The proportion of the HMM that must contribute to the HMM/target alignment when reporting hits.
`target.coverage.threshold`	`0.3`	The proportion of the target that must contribute to the HMM/target alignment when reporting hits.
`literature.ref`	`10/f2wkj3`	The DOI for the literature that implies that the HMM or underlying protein sequences belong to a particular protein family or defence system. Multiple references are separated by a comma (`,`). DOIs can be resolved at doi.org.
`database.ref`	`PFAM; PF06527`	Reference to the original accession of the alignment/HMM if it was taken from an external database, e.g. PFAM, COG, etc. Includes the name of the database and the identifier separated by a semicolon (`;`).
`comment`	`NA`	Other relevant information. This may include comments from the original database where applicable.

System metadata (`sys_meta.txt`)

Field	Example	Description
`system`	`Druantia Type I`	The human-readable name for the system.
`yaml.name`	`druantia_type_I`	The exact name of the yaml file that corresponds with the system type (without the `.yaml` extension).
`search`	`TRUE`	Set to `TRUE` or `FALSE` to determine whether the system is searched or not.
`comment`	`NA`	Other relevant information.
`references`	`payne2021:10/gzgh`	References to relevant literature that describe the system in the form `surnameYEAR:DOI`. Multiple references are separated by a semicolon (`;`). DOIs can be resolved at doi.org.
`group`	`DMS`	Arbitrary system groups used for dividing data on the PADLOC web server.
`fill`	`#F1F1D1`	Arbitrary HEX code used for colouring genes on the PADLOC web server.
`stroke`	`#CACA56`	Arbitrary HEX code used for colouring genes on the PADLOC web server.

Adding to the database

Additional HMMs and system models can be added directly to your copy of this database, or a separate database can be set up in the same way as this one.

Additional HMMs can be added to the hmm/ directory, hmm_meta.txt also needs to be updated to include metadata for the new HMMs. When using with PADLOC, these HMMs need to be concatenated into a single file in the hmm/ directory called padlocdb.hmm. This is done automatically when downloading the database using padloc --db-update or padloc --db-install.

Additional system models can be added to the sys/ directory, sys_meta.txt should also be updated to include metadata for the new models.

License

This software and data is available as open source under the terms of the MIT License.

padlocbio / padloc-db

readme

PADLOC-DB

About

Citation

System models (`sys/*.yaml`)

HMM metadata (`hmm_meta.txt`)

System metadata (`sys_meta.txt`)

Adding to the database

License

padlocbio / padloc-db

readme

PADLOC-DB

About

Citation

System models (sys/*.yaml)

HMM metadata (hmm_meta.txt)

System metadata (sys_meta.txt)

Adding to the database

License

System models (`sys/*.yaml`)

HMM metadata (`hmm_meta.txt`)

System metadata (`sys_meta.txt`)