jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
https://doi.org/10.1093/bioinformatics/btab083
Apache License 2.0

No module named 'dateutil' #49

Open ryao-mdanderson opened 3 years ago

ryao-mdanderson commented 3 years ago

Dear author:

I am new to this application. I set up a dnabert conda environment for it on an HPC cluster and followed step 2.2 (Model training) of the instructions to run the test. However, I hit the following error:

Traceback (most recent call last):
  File "run_pretrain.py", line 42, in <module>
    from transformers import (
  File "/risapps/noarch/dnabert/20210826/src/transformers/__init__.py", line 22, in <module>
    from .configuration_albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
  File "/risapps/noarch/dnabert/20210826/src/transformers/configuration_albert.py", line 18, in <module>
    from .configuration_utils import PretrainedConfig
  File "/risapps/noarch/dnabert/20210826/src/transformers/configuration_utils.py", line 25, in <module>
    from .file_utils import CONFIG_NAME, cached_path, hf_bucket_url, is_remote_url
  File "/risapps/noarch/dnabert/20210826/src/transformers/file_utils.py", line 22, in <module>
    import boto3
  File "/risapps/rhel7/python/3.7.3/envs/dnabert/lib/python3.6/site-packages/boto3/__init__.py", line 16, in <module>
    from boto3.session import Session
  File "/risapps/rhel7/python/3.7.3/envs/dnabert/lib/python3.6/site-packages/boto3/session.py", line 17, in <module>
    import botocore.session
  File "/risapps/rhel7/python/3.7.3/envs/dnabert/lib/python3.6/site-packages/botocore/session.py", line 29, in <module>
    import botocore.configloader
  File "/risapps/rhel7/python/3.7.3/envs/dnabert/lib/python3.6/site-packages/botocore/configloader.py", line 19, in <module>
    from botocore.compat import six
  File "/risapps/rhel7/python/3.7.3/envs/dnabert/lib/python3.6/site-packages/botocore/compat.py", line 27, in <module>
    from dateutil.tz import tzlocal
ModuleNotFoundError: No module named 'dateutil'

Did I miss anything? Thank you for any suggestions on how to fix this.

Regards,

jerryji1993 commented 3 years ago

Hi @ryao-mdanderson,

Thanks for reporting this issue. Please make sure that all required package dependencies are installed before training the model; the error message indicates that one or more modules are missing.

Let me know if there are additional questions.

Best, Jerry
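
As an illustration of the dependency check suggested above, the following is a minimal sketch (not part of the DNABERT repository) that reports which modules cannot be located in the active environment. The helper name `find_missing_modules` is hypothetical, and the module list only contains names that appear in the traceback above; it would need to be extended with the project's remaining requirements. Note that the `dateutil` module is distributed on PyPI as `python-dateutil`.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            # find_spec returns None when a top-level module is not installed.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for some dotted names or modules with a broken __spec__.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Names taken from the traceback above; this list is an assumption,
    # not the project's full requirements.
    candidates = ["boto3", "botocore", "dateutil"]
    missing = find_missing_modules(candidates)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Running this inside the activated conda environment, before launching `run_pretrain.py`, shows whether the interpreter in use can see each package.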

ryao-mdanderson commented 3 years ago

Hello @jerryji1993,

I fixed the missing Python modules. Thank you very much.

However, I hit a new error while testing step 2.2 (Model training), shown below. For reference, I ran the test on an HPC compute node that has no internet access. Could you please advise how to fix the missing 'dna6' model name?

<class 'transformers.tokenization_dna.DNATokenizer'>
Traceback (most recent call last):
  File "run_pretrain.py", line 885, in <module>
    main()
  File "run_pretrain.py", line 789, in main
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
  File "/risapps/noarch/dnabert/20210826/src/transformers/tokenization_utils.py", line 377, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/risapps/noarch/dnabert/20210826/src/transformers/tokenization_utils.py", line 479, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'dna6' was not found in tokenizers model name list (dna3, dna4, dna5, dna6). We assumed 'dna6' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
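
The error message says the tokenizer name is also treated as a path to a directory containing `vocab.txt`. On a node without internet access, one thing that may be worth checking is whether such a directory exists locally. The sketch below assumes a local copy of the vocabulary is available somewhere on the cluster. The helper name `resolve_vocab_dir` and any directory passed to it are hypothetical and not part of DNABERT.

```python
import os


def resolve_vocab_dir(tokenizer_path, vocab_filename="vocab.txt"):
    """Return the absolute directory path if it contains the vocabulary file.

    Raises FileNotFoundError otherwise, so the problem surfaces before
    the training script is started.
    """
    directory = os.path.abspath(os.path.expanduser(tokenizer_path))
    vocab_file = os.path.join(directory, vocab_filename)
    if not os.path.isfile(vocab_file):
        raise FileNotFoundError(
            "No {} found in {}".format(vocab_filename, directory)
        )
    return directory
```

If the check passes, the returned directory could be supplied as the tokenizer name instead of the bare `dna6`. Whether this resolves the error depends on how the vocabulary files are laid out in the installed copy of the repository, which the thread does not show.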