jakob-he / TADA

TAD-aware annotation of CNVs
MIT License

Problem with the pathogenicity scores prediction #4

Closed (frequena closed this issue 3 years ago)

frequena commented 3 years ago

Dear TADA team,

I'm having some problems using the "predict_variants" command.

I followed two different approaches:

A) I used the data available in the "tests" folder and ran the command below:

predict_variants -c tests/test_config_pred.yml -o .

I got the following error:

Traceback (most recent call last):
  File "/home/frequena/.conda/envs/py3.6/bin/predict_variants", line 11, in <module>
    load_entry_point('tada==0.2', 'console_scripts', 'predict_variants')()
  File "/home/frequena/.conda/envs/py3.6/lib/python3.6/site-packages/tada-0.2-py3.6.egg/tada/predict_variants.py", line 79, in main
    predict(cfg, output)
  File "/home/frequena/.conda/envs/py3.6/lib/python3.6/site-packages/tada-0.2-py3.6.egg/tada/predict_variants.py", line 53, in predict
    for label, annotated_cnvs in labeled_cnv_dicts:
UnboundLocalError: local variable 'labeled_cnv_dicts' referenced before assignment
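
For readers unfamiliar with this error class: an UnboundLocalError usually means a local name is assigned in only one branch of a function and then read on a path where that branch did not run. The sketch below is a hypothetical stand-in, not TADA's actual predict() code; the function names and the config check are assumptions made only to reproduce the pattern.

```python
def predict_sketch(cfg):
    # Hypothetical stand-in, NOT TADA's predict(): the name is bound
    # only when the condition holds.
    if cfg.get("ANNOTATED"):
        labeled_cnv_dicts = [("PATHOGENIC", {})]
    # If the branch above was skipped, this line raises UnboundLocalError.
    return [label for label, _ in labeled_cnv_dicts]


def predict_sketch_fixed(cfg):
    # Binding the name before the branch keeps every code path well defined.
    labeled_cnv_dicts = []
    if cfg.get("ANNOTATED"):
        labeled_cnv_dicts = [("PATHOGENIC", {})]
    return [label for label, _ in labeled_cnv_dicts]
```

Calling predict_sketch({}) raises UnboundLocalError, while predict_sketch_fixed({}) returns an empty list.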

B) I created a .bed file with a single CNV:

1 126439621 135430043

I annotated it without errors and got a file named Annotated_PATHOGENIC.csv.

Next, I ran predict_variants:

predict_variants -c config_del_default.yml -o .

...and I got the following error:

  File "/home/frequena/.conda/envs/py3.6/bin/predict_variants", line 11, in <module>
    load_entry_point('tada==0.2', 'console_scripts', 'predict_variants')()
  File "/home/frequena/.conda/envs/py3.6/lib/python3.6/site-packages/tada-0.2-py3.6.egg/tada/predict_variants.py", line 79, in main
    predict(cfg, output)
  File "/home/frequena/.conda/envs/py3.6/lib/python3.6/site-packages/tada-0.2-py3.6.egg/tada/predict_variants.py", line 34, in predict
    cnv_dicts.append(pickle.load(cnv_dict))
_pickle.UnpicklingError: unpickling stack underflow

This is the config file used:

TADS:
  RAW: "data/Dixon_2015_stability_formatted_TADs.bed"
  ANNOTATED: "data/Annotated_Default_TADs.p"

ANNOTATIONS:
  GENES: "data/gnomad_genes_pli_loeuf_HI.bed"
  EXONS: "data/HAVANA_exon.merged.bed.gz"
  ENHANCERS: "data/fantom5_enhancer_phastcon_average.bed"
  CTCF: "data/H1_hESC_CTCF_peaks_idr_optimal.bed"
  DDG2P: "data/DDG2P_genes.bed"
  POINT: "data/extracted_po_pairs.bed"

CNVS:
  RAW:
    NON_PATHOGENIC: "./cnv_test.bed"
    PATHOGENIC: "./cnv_test.bed"
  ANNOTATED:
    NON_PATHOGENIC: "Annotated_PATHOGENIC.csv"
    PATHOGENIC:  "Annotated_PATHOGENIC.csv"

FEATURES: "extended"

CLASSIFIER: "rf"

KWARGS:
  max_depth:  None
  max_features: 'auto'
  min_samples_leaf: 5
  min_samples_split: 4
  n_estimators: 500
  oob_score: True

PRETRAINED_MODEL:

Please let me know if you need any further detail!

Thank you!

jakob-he commented 3 years ago

Dear frequena,

Thanks a lot for your interest in our project! I am sorry that I am only now responding to your issue. There was an error in the loading of pre-annotated data. It should be fixed with the latest commit. If there are any other issues, please don't hesitate to contact me.
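
The second traceback in the report ends in pickle.load, while the ANNOTATED entries of the posted config point to a .csv file. One possible reading, offered here as an assumption and not confirmed in this thread, is that a non-pickle file was handed to the unpickler. A minimal sketch of a pre-flight check follows; load_pickle_or_explain is a hypothetical helper, not part of TADA.

```python
import pickle


def load_pickle_or_explain(path):
    """Return the unpickled object, or raise ValueError naming the file.

    Hypothetical helper, not part of TADA. Only use pickle.load on files
    you trust, since unpickling can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        try:
            return pickle.load(handle)
        except Exception as err:
            # Non-pickle input can raise several exception types
            # (UnpicklingError, EOFError, ...), so catch broadly here.
            raise ValueError(
                f"{path} does not look like a pickle file: {err}"
            ) from err
```

With such a check, a CSV passed where a pickled file is expected produces an error that names the offending path instead of a bare UnpicklingError.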