Closed zhuchcn closed 7 months ago
The only place where the cleavage parameters are used is generating the canonical peptide pool. So technically we can let users generate different canonical peptide pools within the same index dir, because the annotation, genome, and proteome are all the same.
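A minimal sketch of how several canonical peptide pools could coexist in one index dir, assuming each pool file is named after a hash of its cleavage parameters. `CleavageParams`, its fields and defaults, and the `canonical_peptides_<hash>.pkl` naming are hypothetical, not the current moPepGen layout.

```python
from dataclasses import dataclass, asdict
import hashlib
import json
from pathlib import Path


@dataclass(frozen=True)
class CleavageParams:
    """Hypothetical container; field names and defaults are assumptions."""
    enzyme: str = "trypsin"
    miscleavage: int = 2
    min_length: int = 7
    max_length: int = 25


def canonical_pool_path(index_dir: Path, params: CleavageParams) -> Path:
    """Return a per-parameter-set file path inside the shared index dir."""
    # Serialize with sorted keys so equal parameters always hash the same.
    payload = json.dumps(asdict(params), sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()[:8]
    return index_dir / f"canonical_peptides_{digest}.pkl"
```

With this scheme, two runs with the same parameters resolve to the same file, and a different enzyme resolves to a new file next to it.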
It's a good idea to run a dataset as a sanity check. I have several PRs on the way, so I'll just run the collaborator's small dataset (1 cell line) after all these PRs are merged and before we release 1.3.
> The only place where the cleavage parameters are used is generating the canonical peptide pool. So technically we can let users generate different canonical peptide pools within the same index dir, because the annotation, genome, and proteome are all the same.

So with the current approach, would the results be wrong if the cleavage parameters differ between `generateIndex` and `callVariant`? If so, I think updating that would be a good idea. Could we even allow a change of enzyme?
With the current approach, `callVariant` will actually fail if the parameters don't match those in `metadata.json`. Yeah, we can change the enzyme, too. I'll open an issue for this.
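The failure on mismatch could look roughly like the sketch below. The function name `check_cleavage_params` and the `"cleavage_params"` key are assumptions for illustration; the actual check and the `metadata.json` schema may differ.

```python
import json
from pathlib import Path


def check_cleavage_params(index_dir: Path, requested: dict) -> None:
    """Raise if `requested` differs from what the index was built with."""
    with open(index_dir / "metadata.json") as handle:
        metadata = json.load(handle)
    # "cleavage_params" is an assumed key, not the confirmed schema.
    stored = metadata["cleavage_params"]
    # Collect (stored, requested) pairs for every parameter that disagrees.
    mismatched = {
        key: (stored.get(key), value)
        for key, value in requested.items()
        if stored.get(key) != value
    }
    if mismatched:
        raise ValueError(f"Cleavage parameters differ from index: {mismatched}")
```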
Description
I don't like the way we currently handle the index directory and its files. Right now we store the versions of moPepGen, Python, and Biopython in each of genome.pkl, annotation_gene.idx, annotation_tx.idx, and proteome.pkl. So here I created a metadata.json file to store all essential parameters in one place, including versions, cleavage parameters (enzyme, miscleavages, etc.), and the genome/annotation source (GENCODE or ENSEMBL).
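As a rough illustration of the idea, a file like metadata.json could be written as below. The key names (`versions`, `cleavage_params`, `source`) and the helper `write_metadata` are assumptions made for this sketch, not the schema this PR actually uses.

```python
import json
import platform
from pathlib import Path


def write_metadata(index_dir: Path, versions: dict,
                   cleavage_params: dict, source: str) -> Path:
    """Write one metadata.json holding everything the index depends on."""
    metadata = {
        # The Python version is detected; other versions are passed in.
        "versions": {"python": platform.python_version(), **versions},
        "cleavage_params": cleavage_params,
        "source": source,  # e.g. "GENCODE" or "ENSEMBL"
    }
    path = index_dir / "metadata.json"
    with open(path, "w") as handle:
        json.dump(metadata, handle, indent=2)
    return path
```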
I also created an `IndexDir` class to handle the index directory. It is responsible for loading and dumping the different index files, which makes the code more organized.

Closes #818
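For readers who have not opened the diff, the shape of such a class might be sketched as follows. This is a simplified stand-in that assumes pickle-based files and generic `dump`/`load` methods; the attributes and method names of the real `IndexDir` may differ.

```python
import pickle
from pathlib import Path
from typing import Any


class IndexDir:
    """Hypothetical sketch of a wrapper around the index directory."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.path.mkdir(parents=True, exist_ok=True)
        # File names follow those mentioned in the description above.
        self.files = {
            "genome": self.path / "genome.pkl",
            "proteome": self.path / "proteome.pkl",
        }

    def dump(self, name: str, obj: Any) -> None:
        """Pickle `obj` to the file registered under `name`."""
        with open(self.files[name], "wb") as handle:
            pickle.dump(obj, handle)

    def load(self, name: str) -> Any:
        """Load the object previously dumped under `name`."""
        with open(self.files[name], "rb") as handle:
            return pickle.load(handle)
```

Keeping every file path in one place means callers never build paths by hand, which is the organizational benefit described above.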