kheyer / Genomic-ULMFiT

ULMFiT for Genomic Sequence Data

Getting a NameError: name 'BaseTokenizer' is not defined! #8

Open schlogl2017 opened 3 years ago

schlogl2017 commented 3 years ago

I can't run your script because utils raises this error in a Jupyter notebook. Any tips for making it work?

Thank you

NameError                                 Traceback (most recent call last)
<ipython-input-7-70b698022c71> in <module>
     19     return (df_t, df_v)
     20 
---> 21 class GenomicTokenizer(BaseTokenizer):
     22     def __init__(self, lang='en', ngram=5, stride=2):
     23         self.lang = lang

NameError: name 'BaseTokenizer' is not defined

schlogl2017 commented 3 years ago

I changed the import to "from fastai.text.all import *". Now the error has changed to:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-70b698022c71> in <module>
     39         pass
     40 
---> 41 class GenomicVocab(Vocab):
     42     def __init__(self, itos):
     43         self.itos = itos

NameError: name 'Vocab' is not defined

sourajyoti-datta commented 3 years ago

I am facing these exact same issues. Could you find any solution? Did it work for you?

schlogl2017 commented 3 years ago

No, sir! If I find something, I will update you. Be safe!

schlogl2017 commented 3 years ago

"I am facing these exact same issues. Could you find any solution? Did it work for you?"

I found this; maybe it helps:

I'm running it without issues in Colab. Just start your notebook with:

!pip3 install fastai==1.0.61
!pip install biopython

Then clone the repo and you're ready to go!

Also:

import sys
sys.path.append('path to cloned repo')
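The setup steps above can be sanity-checked before running the notebooks. This is only a minimal sketch: the helper names add_repo_to_path and missing_names are hypothetical and not part of the repo, and it assumes that fastai 1.0.61 exposes BaseTokenizer and Vocab from fastai.text, which matches the two names in the tracebacks but is not confirmed in this thread.

```python
import importlib
import sys


def add_repo_to_path(repo_path):
    """Append the cloned repo directory to sys.path, at most once."""
    if repo_path not in sys.path:
        sys.path.append(repo_path)


def missing_names(module_name, names):
    """Return the names that module_name does not expose.

    If the module cannot be imported at all, every name counts as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]
```

After installing the pinned version, calling missing_names("fastai.text", ["BaseTokenizer", "Vocab"]) would be expected to return an empty list; a non-empty result suggests that a different fastai version is being picked up.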

sourajyoti-datta commented 3 years ago

Thanks. This issue seems resolved. I hope the creator also adds a requirements.txt file to the repository; that would make it complete.

sourajyoti-datta commented 3 years ago

While training the GLM language model, I am getting a memory error. My system has 32 GB of RAM. Are you facing any such issues?

I can run this one first: Human Genome LM 0 Data Processing https://github.com/kheyer/Genomic-ULMFiT/blob/master/Mammals/Human/Genomic%20Language%20Models/Human%20Genome%20LM%200%20Data%20Processing.ipynb

But I get a memory error in this one: Human Genome LM 5 3-mer Stride 1 Language Model https://github.com/kheyer/Genomic-ULMFiT/blob/master/Mammals/Human/Genomic%20Language%20Models/Human%20Genome%20LM%205%203-mer%20Stride%201%20Language%20Model.ipynb

tzhu-bio commented 1 year ago

I'm having the same problem. Have you guys managed to run it successfully? If so, can you share the versions of the packages you used?
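For anyone who does get it running, a short script like the one below could make it easier to share package versions. It is a sketch only: report_versions is a hypothetical helper, fastai and biopython are the packages named earlier in this thread, and torch is included on the assumption that it is the relevant fastai dependency.

```python
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # torch is an assumed entry; adjust the list to your environment.
    for name, version in report_versions(["fastai", "torch", "biopython"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into a reply gives others a concrete environment to compare against.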