rraadd88 / beditor

A Computational Workflow for Designing Libraries of sgRNAs for CRISPR-Mediated Base Editing, and much more
GNU General Public License v3.0

Running with hg19 #1

Closed by murphycj 5 years ago

murphycj commented 5 years ago

Hi, I am getting an error when trying to run beditor with hg19. Maybe I am running beditor incorrectly, but it looks like beditor is calling pyensembl the wrong way.

The command I am running:

beditor --cfg configuration.yaml

The error message:

start
log file: .log_beditor_2018_10_08_15_44_59_411097_configuration.yaml_None_None_False_False.log
2018-10-08 15:45:00,098 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=75, species='homo_sapiens')
2018-10-08 15:45:00,688 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /Users/charlesmurphy/Library/Caches/pyensembl/GRCh37/ensembl75/Homo_sapiens.GRCh37.75.cdna.all.fa.gz.pickle
2018-10-08 15:45:00,762 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /Users/charlesmurphy/Library/Caches/pyensembl/GRCh37/ensembl75/Homo_sapiens.GRCh37.75.ncrna.fa.gz.pickle
2018-10-08 15:45:00,921 - pyensembl.sequence_data - INFO - Loaded sequence dictionary from /Users/charlesmurphy/Library/Caches/pyensembl/GRCh37/ensembl75/Homo_sapiens.GRCh37.75.pep.all.fa.gz.pickle
Traceback (most recent call last):
  File "/Users/charlesmurphy/anaconda3/envs/beditor/bin/beditor", line 11, in <module>
    sys.exit(main())
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/beditor/pipeline.py", line 491, in main
    test=args.test,force=args.force)
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/beditor/pipeline.py", line 341, in pipeline
    cfg=get_genomes(cfg)
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/beditor/configure.py", line 88, in get_genomes
    cfg['genomeassembly']: (cfg['genomerelease'], cfg['genomerelease']),
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 57, in register
    cls._reference_names_to_species[reference_name]))
ValueError: Can't use reference 'GRCh37' for both Species(latin_name='homo_sapiens', synonyms=['homo_sapiens'], reference_assemblies={'GRCh37': (75, 75)}) and Species(latin_name='homo_sapiens', synonyms=['human'], reference_assemblies={'GRCh38': (76, 93), 'GRCh37': (55, 75), 'NCBI36': (54, 54)})
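
The message is consistent with a registry in which each reference name can belong to only one species entry: the built-in human entry already owns 'GRCh37', so a second entry claiming it is rejected. Below is a minimal sketch of that kind of check. `SpeciesRegistry` is a hypothetical toy class written for illustration, not pyensembl's actual code.

```python
class SpeciesRegistry:
    """Toy registry (hypothetical): one owner per reference assembly name."""

    def __init__(self):
        self._reference_to_species = {}

    def register(self, latin_name, synonyms, reference_assemblies):
        entry = (latin_name, tuple(synonyms))
        # Refuse the whole registration if any reference name is already taken.
        for reference_name in reference_assemblies:
            owner = self._reference_to_species.get(reference_name)
            if owner is not None:
                raise ValueError(
                    "Can't use reference '%s' for both %s and %s"
                    % (reference_name, entry, owner))
        for reference_name in reference_assemblies:
            self._reference_to_species[reference_name] = entry
        return entry


registry = SpeciesRegistry()
# Built-in human entry, with the release ranges shown in the error message.
registry.register("homo_sapiens", ["human"],
                  {"GRCh38": (76, 93), "GRCh37": (55, 75), "NCBI36": (54, 54)})

try:
    # Second registration reusing 'GRCh37', as the traceback suggests beditor attempts.
    registry.register("homo_sapiens", ["homo_sapiens"], {"GRCh37": (75, 75)})
except ValueError as error:
    conflict_message = str(error)
    print(conflict_message)
```

If this reading is right, the failure is triggered by the re-registration itself rather than by anything in the configuration values.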

My configuration file (configuration.yaml):

# input file path
dinp: test.tsv

#common crispr params
#guidel: 23
pams: ['NGG']

#common
## cpus/threads
cores: 6
## number of lines to process per cpu
chunksize: 200

# 01_sequences
## host information
host: homo_sapiens
genomerelease: 75
# check assembly from http://useast.ensembl.org/index.html
genomeassembly: GRCh37

# 02_mutations
# whether aminoacid or nucleotide mutations
mutation_format: nucleotide

##[N nonsyn] S syn else both
# mutation_type: N
## keep nonsense
# keep_mutation_nonsense: False
## allowed nucleotide substitutions per codon
max_subs_per_codon: 1
## base editors to use (restriction max_subs_per_codon would override the choice of base editors)
BEs: ['Target-AID','ABE']

## Mutations information can be provided in 3 options:
## 1. Required Mutations mentioned in input file (in a column called "amino acid mutation") would override this
## 2. Required Substitutions provided as a file
## 3. Carry out Mimetic substitutions (based on genome-wide substitution maps). Only for human and yeast.
## input: options
## mutations: 1, substitutions: 2, mimetic: 3, [no input: keeps all possible mutations (slow)]
mutations:

## Parameters specific to above options
## 2. Substitutions provided as a file
dsubmap_preferred_path:

## 3. Mimetic substitutions
## mimetism level (high: only the best one, [medium: best 5], low: best 10)
mimetism_level: medium

## can not mutate between these
# non_intermutables: ['S','T','K']
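
As a quick sanity check on the `genomerelease` / `genomeassembly` pair, the release can be compared against the ranges printed in the ValueError text above. This is only a sketch: `release_matches_assembly` is a hypothetical helper, the ranges are copied from that message (pyensembl v1.7.2, human only), and the config is written as a plain dict rather than loaded from the YAML file.

```python
# Release ranges copied from the ValueError text (pyensembl v1.7.2, human).
HUMAN_ASSEMBLY_RELEASES = {
    "GRCh38": (76, 93),
    "GRCh37": (55, 75),
    "NCBI36": (54, 54),
}


def release_matches_assembly(cfg, ranges=HUMAN_ASSEMBLY_RELEASES):
    """Return True if cfg['genomerelease'] lies in the range of cfg['genomeassembly']."""
    first, last = ranges[cfg["genomeassembly"]]
    return first <= int(cfg["genomerelease"]) <= last


# The three relevant keys from configuration.yaml.
cfg = {"host": "homo_sapiens", "genomerelease": 75, "genomeassembly": "GRCh37"}
config_ok = release_matches_assembly(cfg)
print(config_ok)
```

Release 75 falls inside the GRCh37 range, so the pair in the configuration looks consistent on its own.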

Some relevant software versions:

Mac 10.13.6
pyensembl v1.7.2

rraadd88 commented 5 years ago

Hi @murphycj, the issue is most probably the pyensembl version. I rewrote the installation part of pyensembl because I had run into errors like the one you posted when using the master branch of pyensembl.

My rewritten version of pyensembl can be installed with the following command:

pip install git+https://github.com/rraadd88/pyensembl

That would be version 1.6.0. Ideally, though, it is better to install the whole virtual environment with the following commands (as mentioned in the docs):

wget https://raw.githubusercontent.com/rraadd88/beditor/master/environment.yml
conda env create -f environment.yml 

best, Rohan

murphycj commented 5 years ago

Thanks for the quick reply. I still get an error though. I ran the two lines you provided:

wget https://raw.githubusercontent.com/rraadd88/beditor/master/environment.yml
conda env create -f environment.yml 

I confirmed that the pyensembl version is 1.6 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/__init__.py

My Python version is 3.6.5.

I then ran beditor --cfg configuration.yaml. My output is now:

start
log file: .log_beditor_2018_10_08_16_41_30_452555_configuration.yaml_None_None_False_False.log
Traceback (most recent call last):
  File "/Users/charlesmurphy/anaconda3/envs/beditor/bin/pyensembl", line 7, in <module>
    from pyensembl.shell import run
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/__init__.py", line 20, in <module>
    from .ensembl_release import EnsemblRelease, cached_release
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/ensembl_release.py", line 23, in <module>
    from .species import check_species_object
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 254, in <module>
    Species=collect_all_genomes()
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 232, in collect_all_genomes
    releasei=str2num(release) #FIXME is realease is a float
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 202, in str2num
    raise ValueError("No digits found in string {}".format(s))
ValueError: No digits found in string Homo_sapiens.None.cdna.all.fa.gz
bash command error: 1
pyensembl install --reference-name GRCh37 --release 75 --species homo_sapiens
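
The file name in this ValueError contains the literal text `None` where the earlier cache log showed `GRCh37.75`, which suggests the assembly and release were never filled in before the name was built, so there are no digits left to parse. A small sketch of the difference follows; `digit_runs` is a hypothetical helper for illustration and is not the fork's `str2num`.

```python
import re


def digit_runs(name):
    """Return every run of digits found in a file name, as strings."""
    return re.findall(r"\d+", name)


# File name seen in the pyensembl cache path of the first log.
cached_name = "Homo_sapiens.GRCh37.75.cdna.all.fa.gz"
# File name reported in the ValueError.
failing_name = "Homo_sapiens.None.cdna.all.fa.gz"

print(digit_runs(cached_name))
print(digit_runs(failing_name))
```

The second call finds nothing, which matches the "No digits found" message and points at the step that builds the file name rather than at the parsing itself.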

rraadd88 commented 5 years ago

Hello @murphycj, I'm afraid beditor is not compatible with hg19 (the GRCh37 assembly). I could install the genome through pyensembl, but I couldn't find the GFF3 file that beditor needs. I guess the only alternative is to run with GRCh38 and then convert the genomic coordinates back to GRCh37.
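
A minimal sketch of the coordinate conversion step, assuming a converter object with a `convert_coordinate(chrom, pos)` method that returns a list of hits whose first two fields are chromosome and position, or `None`/an empty list when nothing maps. `lift_positions` and `OffsetConverter` are hypothetical names; `OffsetConverter` is only a stand-in with a made-up offset so the example runs without downloading a chain file.

```python
def lift_positions(converter, positions):
    """Map (chrom, pos) pairs through `converter`; unmapped positions become None."""
    lifted = []
    for chrom, pos in positions:
        hits = converter.convert_coordinate(chrom, pos)
        if not hits:
            # None (unknown chromosome) or [] (no mapping) both mean "could not lift".
            lifted.append(None)
        else:
            # Keep only the first hit's chromosome and position.
            lifted.append((hits[0][0], hits[0][1]))
    return lifted


class OffsetConverter:
    """Stand-in converter (hypothetical): shifts chr1 positions by a made-up offset."""

    def __init__(self, offset):
        self.offset = offset

    def convert_coordinate(self, chrom, pos):
        if chrom != "chr1":
            return None
        return [(chrom, pos + self.offset, "+", 1)]


demo = lift_positions(OffsetConverter(-100), [("chr1", 1000), ("chrUn", 5)])
print(demo)
```

In practice, something like `LiftOver('hg38', 'hg19')` from the pyliftover package could probably be passed as the converter. That is an assumption about a tool not mentioned in this thread, and its positions are 0-based, so any 1-based input would need adjusting first.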

murphycj commented 5 years ago

Actually, I think this is an issue with the PyEnsembl version you are using. Just to be sure I am doing this correctly, I ran the following:

(beditor) MAC190024:181008_bishoyBeditor charlesmurphy$ pip uninstall pyensembl
Uninstalling pyensembl-1.6.0:
  Would remove:
    /Users/charlesmurphy/anaconda3/envs/beditor/bin/pyensembl
    /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl-1.6.0.dist-info/*
    /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/*
Proceed (y/n)? y
  Successfully uninstalled pyensembl-1.6.0

Then I re-installed:

(beditor) MAC190024:181008_bishoyBeditor charlesmurphy$ pip install git+https://github.com/rraadd88/pyensembl
Collecting git+https://github.com/rraadd88/pyensembl
  Cloning https://github.com/rraadd88/pyensembl to /private/var/folders/3_/ps3sj2jd6qq3rskyc37szv_00000gn/T/pip-req-build-e7069on8
Requirement already satisfied: typechecks>=0.0.2 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (0.1.0)
Requirement already satisfied: numpy>=1.7 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (1.13.1)
Requirement already satisfied: pandas>=0.15 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (0.23.3)
Requirement already satisfied: datacache>=1.1.4 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (1.1.4)
Requirement already satisfied: memoized-property>=1.0.2 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (1.0.3)
Requirement already satisfied: six>=1.9.0 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (1.11.0)
Requirement already satisfied: gtfparse>=1.1.0 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (1.1.2)
Requirement already satisfied: serializable in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (0.1.1)
Requirement already satisfied: tinytimer in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pyensembl==1.6.0) (0.0.0)
Requirement already satisfied: python-dateutil>=2.5.0 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pandas>=0.15->pyensembl==1.6.0) (2.7.3)
Requirement already satisfied: pytz>=2011k in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from pandas>=0.15->pyensembl==1.6.0) (2018.5)
Requirement already satisfied: appdirs>=1.4.0 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from datacache>=1.1.4->pyensembl==1.6.0) (1.4.3)
Requirement already satisfied: mock in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from datacache>=1.1.4->pyensembl==1.6.0) (2.0.0)
Requirement already satisfied: progressbar33>=2.4 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from datacache>=1.1.4->pyensembl==1.6.0) (2.4)
Requirement already satisfied: requests>=2.5.1 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from datacache>=1.1.4->pyensembl==1.6.0) (2.19.1)
Requirement already satisfied: simplejson in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from serializable->pyensembl==1.6.0) (3.16.0)
Requirement already satisfied: pbr>=0.11 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from mock->datacache>=1.1.4->pyensembl==1.6.0) (4.3.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from requests>=2.5.1->datacache>=1.1.4->pyensembl==1.6.0) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from requests>=2.5.1->datacache>=1.1.4->pyensembl==1.6.0) (2018.8.24)
Requirement already satisfied: idna<2.8,>=2.5 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from requests>=2.5.1->datacache>=1.1.4->pyensembl==1.6.0) (2.7)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in /Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages (from requests>=2.5.1->datacache>=1.1.4->pyensembl==1.6.0) (1.23)
Building wheels for collected packages: pyensembl
  Running setup.py bdist_wheel for pyensembl ... done
  Stored in directory: /private/var/folders/3_/ps3sj2jd6qq3rskyc37szv_00000gn/T/pip-ephem-wheel-cache-x4kyn7ei/wheels/1d/00/2b/5435d66f45a3cf05e2a4fc9ae48def2cddc597b96ea662fefc
Successfully built pyensembl
Installing collected packages: pyensembl
Successfully installed pyensembl-1.6.0

Then I tried running pyensembl -h:

Traceback (most recent call last):
  File "/Users/charlesmurphy/anaconda3/envs/beditor/bin/pyensembl", line 7, in <module>
    from pyensembl.shell import run
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/__init__.py", line 20, in <module>
    from .ensembl_release import EnsemblRelease, cached_release
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/ensembl_release.py", line 23, in <module>
    from .species import check_species_object
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 254, in <module>
    Species=collect_all_genomes()
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 232, in collect_all_genomes
    releasei=str2num(release) #FIXME is realease is a float
  File "/Users/charlesmurphy/anaconda3/envs/beditor/lib/python3.6/site-packages/pyensembl/species.py", line 202, in str2num
    raise ValueError("No digits found in string {}".format(s))
ValueError: No digits found in string Homo_sapiens.None.cdna.all.fa.gz

rraadd88 commented 5 years ago

Hi again. With my recently committed changes in this repo and in my pyensembl fork, you should be able to install the hg19 genome for pyensembl operations. The issue is that beditor also uses a separate copy of the genome and a GFF3 annotation file for some operations. I personally could not locate a GFF3 file for hg19 in the Ensembl database (strange!), so, as I said earlier, I guess the only alternative is to use GRCh38 and convert the genomic coordinates back to GRCh37 (unless somebody sends an intelligent pull request out of the blue :)). Best.