tkzeng / Pangolin

Pangolin is a deep-learning method for predicting splice site strengths.
GNU General Public License v3.0

Pangolin exits with an error if a variant contains a wildcard character #26

Open tsivaarumugam opened 2 months ago

tsivaarumugam commented 2 months ago

Hi Team,

When the input VCF file contains the wildcard character "*" in the ALT field (the spanning-or-overlapping-deletion allele), Pangolin exits with an error.

Example entry: 3 60539233 . AAAATATAT AATAT,* 550 PASS . . .

Error trace:

Traceback (most recent call last):
  File "pangolin", line 8, in <module>
    sys.exit(main())
  File "python3.8/site-packages/pangolin/pangolin.py", line 250, in main
    scores = process_variant(lnum+i, str(variant.CHROM), int(variant.POS), variant.REF, str(variant.ALT[0]), gtf, models, args)
  File "python3.8/site-packages/pangolin/pangolin.py", line 130, in process_variant
    loss_neg, gain_neg = compute_score(ref_seq, alt_seq, '-', d, models)
  File "python3.8/site-packages/pangolin/pangolin.py", line 30, in compute_score
    ref_seq = one_hot_encode(ref_seq, strand).T
  File "python3.8/site-packages/pangolin/pangolin.py", line 24, in one_hot_encode
    seq = np.asarray(list(map(int, list(seq[::-1]))))
ValueError: invalid literal for int() with base 10: 'M'
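A possible workaround, sketched here under assumptions rather than taken from Pangolin itself, is to check alleles before they are scored and skip any that are not plain nucleotide strings. The helper names `is_plain_allele` and `split_scorable_alts` are hypothetical and do not exist in Pangolin. The trace ends in a ValueError on the character 'M', which suggests that any symbol outside A/C/G/T/N reaching `one_hot_encode` could trigger the same failure, so the check below is written for arbitrary sequences and not only for "*".

```python
import re

# Hypothetical helpers, not part of Pangolin.
# Assumption: only A, C, G, T, and N can be encoded safely.
_PLAIN_ALLELE = re.compile(r"[ACGTN]+", re.IGNORECASE)


def is_plain_allele(allele):
    """Return True if the allele consists only of A, C, G, T, or N."""
    return _PLAIN_ALLELE.fullmatch(str(allele)) is not None


def split_scorable_alts(ref, alts):
    """Partition ALT alleles into (scorable, skipped) lists.

    If REF itself contains an unsupported character, every ALT is skipped.
    """
    if not is_plain_allele(ref):
        return [], list(alts)
    scorable = [a for a in alts if is_plain_allele(a)]
    skipped = [a for a in alts if not is_plain_allele(a)]
    return scorable, skipped


# The example entry from this report: REF=AAAATATAT, ALT=AATAT,*
scorable, skipped = split_scorable_alts("AAAATATAT", ["AATAT", "*"])
print("scorable:", scorable)
print("skipped:", skipped)
```

With a filter like this, the "*" allele would be reported as skipped instead of being passed on for scoring. Whether skipping or emitting a placeholder score is preferable is a design choice for the maintainers.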