uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

Add `updateIndex` #854

Closed: zhuchcn closed this 9 months ago

zhuchcn commented 9 months ago

Description

Added the `updateIndex` command, which creates a new canonical peptide pool in an existing moPepGen index directory. An index directory can therefore now hold multiple canonical peptide pools. When `callVariant` is called, it loads the canonical peptide pool whose cleavage parameters (including the enzyme) match the input parameters. Of course, it fails if no such canonical peptide pool exists.
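The pool selection described above can be sketched as follows. This is a minimal illustration, assuming each pool is described by a dict shaped like the `canonical_pools` entries in `metadata.json` and that a pool matches only when every cleavage parameter is equal. `find_canonical_pool` is a hypothetical name, not moPepGen's actual API.

```python
from typing import Any, Dict, List


# Hypothetical helper (not moPepGen's actual API): return the filename of the
# canonical peptide pool whose cleavage parameters exactly match the request.
def find_canonical_pool(
    pools: List[Dict[str, Any]], cleavage_params: Dict[str, Any]
) -> str:
    for pool in pools:
        # Every parameter, including the enzyme, must be equal to count as a match.
        if pool["cleavage_params"] == cleavage_params:
            return pool["filename"]
    # No pool was built with these parameters, so the caller cannot proceed.
    raise ValueError(
        f"No canonical peptide pool matches cleavage parameters {cleavage_params}"
    )


# Example pools, mirroring the "canonical_pools" entries in metadata.json.
TRYPSIN = {"enzyme": "trypsin", "exception": "trypsin_exception", "miscleavage": 2,
           "min_mw": 500.0, "min_length": 7, "max_length": 25}
LYSC = {"enzyme": "lysc", "exception": None, "miscleavage": 2,
        "min_mw": 500.0, "min_length": 7, "max_length": 25}
POOLS = [
    {"filename": "canonical_peptides_001.pkl", "index": 1, "cleavage_params": TRYPSIN},
    {"filename": "canonical_peptides_002.pkl", "index": 2, "cleavage_params": LYSC},
]

print(find_canonical_pool(POOLS, LYSC))  # canonical_peptides_002.pkl
```

Exact dict equality is the simplest matching rule; a real implementation might normalize parameters (for example, default values) before comparing.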

The index directory now looks like this:

test/files/index
├── annotation.gtf
├── annotation_gene.idx
├── annotation_tx.idx
├── canonical_peptides_001.pkl
├── canonical_peptides_002.pkl
├── coding_transcripts.pkl
├── genome.pkl
├── metadata.json
└── proteome.pkl

The `metadata.json` file now looks like this:

{
  "version": {
    "python": "3.8.17",
    "biopython": "1.82",
    "mopepgen": "1.3.0"
  },
  "canonical_pools": [
    {
      "filename": "canonical_peptides_001.pkl",
      "index": 1,
      "cleavage_params": {
        "enzyme": "trypsin",
        "exception": "trypsin_exception",
        "miscleavage": 2,
        "min_mw": 500.0,
        "min_length": 7,
        "max_length": 25
      }
    },
    {
      "filename": "canonical_peptides_002.pkl",
      "index": 2,
      "cleavage_params": {
        "enzyme": "lysc",
        "exception": null,
        "miscleavage": 2,
        "min_mw": 500.0,
        "min_length": 7,
        "max_length": 25
      }
    }
  ],
  "source": "GENCODE"
}
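The bookkeeping that adding a pool implies for `metadata.json` can be sketched as below. This is a hypothetical illustration, not the actual `updateIndex` implementation: it only updates the metadata (it does not build the `.pkl` file), and it assumes that duplicate cleavage parameters are rejected and that new pools are numbered one past the highest existing `index`. The name `register_pool` is invented for this example.

```python
import json
from pathlib import Path
from typing import Any, Dict


# Hypothetical sketch of the metadata bookkeeping an updateIndex-style command
# might perform; it does NOT build the canonical peptide .pkl file itself.
def register_pool(index_dir: Path, cleavage_params: Dict[str, Any]) -> Dict[str, Any]:
    meta_path = Path(index_dir) / "metadata.json"
    metadata = json.loads(meta_path.read_text())
    pools = metadata.setdefault("canonical_pools", [])

    # Assumption: refuse to add a second pool with identical cleavage parameters.
    for pool in pools:
        if pool["cleavage_params"] == cleavage_params:
            raise ValueError(f"Pool already exists: {pool['filename']}")

    # Assumption: number the new pool one past the highest existing index,
    # zero-padded to three digits as in canonical_peptides_001.pkl.
    index = max((pool["index"] for pool in pools), default=0) + 1
    entry = {
        "filename": f"canonical_peptides_{index:03d}.pkl",
        "index": index,
        "cleavage_params": dict(cleavage_params),
    }
    pools.append(entry)

    # Rewrite the file; other top-level keys ("version", "source") are preserved.
    meta_path.write_text(json.dumps(metadata, indent=2))
    return entry
```

Keeping the cleavage parameters in `metadata.json` means a caller can choose the right pool by reading one small JSON file instead of unpickling every pool.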

Closes #853


zhuchcn commented 9 months ago

I added a little more detail to the PR description. The documentation site will be built automatically after the PR is merged. I added a documentation page, but I don't think users need to know much about how the canonical peptide pools are organized; the software already provides enough information.