keoughkath / AlleleAnalyzer

A software tool for personalized and allele-specific CRISPR editing.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1783-3
MIT License

Rework of annot_variants. Works with single or multi-locus gens files. #40

Closed: allgenesconsidered closed this 6 years ago

allgenesconsidered commented 6 years ago

Large changes to annot_variants.py. Previously, annot_var only checked the chrom value of the first row, which made it incompatible with multi-locus gens files. It was also not strict about this check: it used a variant's position value even when that variant was on a different chromosome.
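To illustrate the difference, here is a minimal sketch of strict per-row chromosome matching. It assumes each gens row is a dict with "chrom" and "pos" keys and that PAM positions are already loaded into a dict keyed by chromosome; both structures, the `annotate_strict` name, and the "in_pam" flag are hypothetical stand-ins, not the script's actual data layout or annotation logic.

```python
def annotate_strict(rows, pam_positions_by_chrom):
    """Flag variants that sit on a known PAM position.

    Hypothetical sketch: each row is matched against the PAM data for its
    OWN chromosome, instead of reusing the first row's chromosome.
    """
    annotated = []
    for row in rows:
        chrom_pams = pam_positions_by_chrom.get(row["chrom"])
        if chrom_pams is None:
            # Refuse to fall back to another chromosome's positions.
            raise KeyError(f"no PAM data loaded for chromosome {row['chrom']!r}")
        annotated.append({**row, "in_pam": row["pos"] in chrom_pams})
    return annotated
```

With PAM data {"chr1": {100}, "chr2": {200}}, a chr2 variant at position 100 is not flagged here; checking it against chr1's positions (the old first-row behavior) would wrongly flag it.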

I changed annot_var to first gather all chromosomes in a gens file; if the script detects a multi-locus gens file, the gens dataframe is split by chromosome. The script then iterates through the resulting list of gens dataframes and combines them at the end. I've tested this script with several Cas lists and with both single-locus and multi-locus gens files, and the output appears correct. The script now also reports errors for missing pam.npy files or a missing FASTA file.
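The split/annotate/combine flow described above can be sketched in plain Python (lists of dicts in place of pandas DataFrames, to keep it dependency-free). The per-chromosome file name pattern in `pam_path`, and the `annotate_chrom` callback, are hypothetical placeholders; the real script's file layout and annotation step may differ.

```python
import os


def pam_path(pam_dir, chrom):
    # Hypothetical naming scheme for per-chromosome PAM files.
    return os.path.join(pam_dir, f"{chrom}_pam_sites.npy")


def split_by_chrom(rows):
    """Group gens rows by chromosome, preserving first-seen order."""
    groups = {}
    for row in rows:
        groups.setdefault(row["chrom"], []).append(row)
    return groups


def annotate_gens(rows, annotate_chrom, pam_dir, fasta_path):
    """Split rows by chromosome, annotate each group, then combine results.

    annotate_chrom(chrom, chrom_rows) is a hypothetical per-chromosome
    annotator standing in for the script's real annotation step.
    """
    groups = split_by_chrom(rows)

    # Fail early with clear messages rather than partway through annotation.
    if not os.path.exists(fasta_path):
        raise FileNotFoundError(f"FASTA file not found: {fasta_path}")
    missing = [c for c in groups if not os.path.exists(pam_path(pam_dir, c))]
    if missing:
        raise FileNotFoundError("missing pam.npy file(s) for: " + ", ".join(missing))

    # Annotate each chromosome separately, then concatenate the results.
    combined = []
    for chrom, chrom_rows in groups.items():
        combined.extend(annotate_chrom(chrom, chrom_rows))
    return combined
```

Note that the combined output is ordered by chromosome group (in first-seen order), not by the original row order; a multi-locus file with interleaved chromosomes comes back regrouped.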

allgenesconsidered commented 6 years ago

A few other points: I've removed the --bed option and made the script a "one-size-fits-all" solution, since it can detect a multi-locus gens file on its own. This let me remove a lot of repetitive code.
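A minimal sketch of the detection step, assuming (as the description above suggests) that "multi-locus" means the gens file spans more than one chromosome. The function names are hypothetical, and rows are again assumed to be dicts with a "chrom" key.

```python
def chromosomes_in(rows):
    """Distinct chromosomes in a gens file, in first-seen order."""
    return list(dict.fromkeys(row["chrom"] for row in rows))


def is_multi_locus(rows):
    """Treat a gens file as multi-locus if it spans more than one chromosome."""
    return len(chromosomes_in(rows)) > 1
```

Because a single-locus file simply yields one chromosome group, the same per-chromosome loop can serve both cases, which is why a separate --bed code path is no longer needed.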

The biggest remaining problem is speed, which has always been an issue with this script. Single-locus gens files are annotated as quickly as before, but multi-locus gens files take minutes to run. Something to keep in mind.