inimah / protoinfomax

Code and Data sets for the EMNLP-2021-Findings Paper "ProtoInfoMax: Prototypical Networks with Mutual Information Maximization for Out-of-Domain Detection"
https://aclanthology.org/2021.findings-emnlp.138
MIT License

Problem with simple_tokenizer #1

Open mxnno opened 2 years ago

mxnno commented 2 years ago

Hi, I want to use your approach, but when I try to train (train_protoinfomax_kws_sentiment.sh) I get the error: No module named 'simple_tokenizer'. There is no simple_tokenizer.py file and I can't find any tokenizer called 'simple_tokenizer'. Is this module outdated or am I doing something wrong? Thanks!

inimah commented 2 years ago

Thanks for pointing it out! The missing tokenizer file is imported by ./basic_utils/vocabulary_cls.py. I have just added it.

mxnno commented 2 years ago

Thanks! Another small issue: train_fasttext.py has a SyntaxError in line 466: "Amazondat/"

mxnno commented 2 years ago

wikipedia2vec is missing in requirements.txt
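
One way to find dependencies that requirements.txt does not cover is to check which top-level imports resolve in the current environment before starting a training run. The sketch below is only illustrative: the helper name `missing_modules` is hypothetical and not part of the repository, and the module list is an assumption that should be adapted to whatever the scripts actually import.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be resolved for import."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only; wikipedia2vec is the package reported above.
    print(missing_modules(["wikipedia2vec", "json"]))
```

Anything the helper reports can then be installed with pip and added to requirements.txt.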

inimah commented 2 years ago

OK. Please let me know about other issues as well, which may be due to naming conventions, typos, and the migration from our server to the GitHub repo.

mxnno commented 2 years ago

If I run train_imax_kw_intent.py, I get a FileNotFoundError: protoinfomax/embeddings/tfidf_sparse_vec_intent.pkl is missing. It doesn't get created anywhere, only tfidf_sparse_vec_cls.pkl. Is it the same, or how can I get tfidf_sparse_vec_intent.pkl?

inimah commented 2 years ago

In train_imax_kw_intent.py:

image

I have added tfidf_sparse_vec_intent.pkl in ./embeddings/

For sentiment data, this file is created by ./src/extract_sentiment.py. For intent data, please adapt the code accordingly by calling the corresponding dataset. (The *.pkl I provided is for intent data because it is smaller than the one for the sentiment domain.)

This tfidf_sparse_vec_intent.pkl file is mainly used to extract keywords per sentence.

However, the Torch data loader prefers a numeric array representation that can be reshaped during each batch training episode. That is why we use the TF-IDF representation when loading "Kws_xx.train" via ./workspace/workspace_intent_kw.py.
Note that params["tfidf_transformer"] and params["cv"] are used in workspace_intent_kw.
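
To illustrate how TF-IDF weights can yield keywords per sentence, here is a minimal pure-Python sketch. It is not the repository's code: the function name `tfidf_keywords`, the whitespace tokenization, and the smoothed IDF formula are all assumptions made for illustration, and the objects behind params["cv"] and params["tfidf_transformer"] may behave differently.

```python
import math
from collections import Counter


def tfidf_keywords(sentences, top_k=2):
    """Return the top_k highest-scoring TF-IDF tokens for each sentence."""
    # Assumption: simple lowercased whitespace tokenization.
    tokenized = [s.lower().split() for s in sentences]
    n_docs = len(tokenized)
    # Document frequency: number of sentences in which each token occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: tokens found in every sentence get the lowest weight.
        scores = {
            tok: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        keywords.append(ranked[:top_k])
    return keywords


if __name__ == "__main__":
    demo = ["book a flight to paris", "book a table for two", "play a song"]
    print(tfidf_keywords(demo))
```

The scores themselves form the numeric representation a data loader can batch, while the ranking gives the keywords.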

abtuo commented 2 years ago

Hello,

I am also trying to reproduce your work. After taking into account the issues discussed above, I get this new error:

Traceback (most recent call last):
  File "train_oproto_intent.py", line 18, in <module>
    from basic_utils.utils_torch_intent import compute_values, get_data, compute_values_eval
  File "../basic_utils/utils_torch_intent.py", line 5, in <module>
    from workspace.workspace_intent_rl import workspace
ModuleNotFoundError: No module named 'workspace.workspace_intent_rl'

It seems that the _rl files are missing from the workspace folder.

Thanks !

abtuo commented 2 years ago

I removed the _rl at the end of the file name and it seems to work. There were also several other modules missing, such as tensorflow, keras, and sklearn, which are not in the requirements.txt file.

I have a new error with a package 'utility' that I haven't yet fixed.

Traceback (most recent call last):
  File "train_imax_intent.py", line 18, in <module>
    from basic_utils.utils_torch_intent import compute_values, get_data, compute_values_eval
  File "../basic_utils/utils_torch_intent.py", line 7, in <module>
    from utils.cal_methods import HistogramBinning, TemperatureScaling, evaluate, cal_results
  File "../utils/cal_methods.py", line 17, in <module>
    from utility.evaluation import ECE, MCE
ModuleNotFoundError: No module named 'utility'

inimah commented 2 years ago

Thanks. This might be because Python cannot find or read the directory "utility". A quick workaround is to call the functions directly in utils.cal_methods.
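
If a directory exists but Python cannot find it, another option is to put its parent directory on sys.path before the failing import runs. The sketch below assumes that a "utility" package is present somewhere in the checkout, which this thread does not confirm; the helper name `ensure_importable` and the path in the usage example are hypothetical.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(module_name, search_dir):
    """Add search_dir to sys.path if a top-level module cannot be resolved.

    Returns True if the module can be resolved afterwards, else False.
    Only top-level names are supported, e.g. "utility", not "utility.evaluation".
    """
    if importlib.util.find_spec(module_name) is None:
        sys.path.insert(0, str(Path(search_dir).resolve()))
        # Make the import system notice the newly added directory.
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Hypothetical location: the directory that contains the "utility" package.
    print(ensure_importable("utility", ".."))
```

If the helper returns False, the package is absent rather than hidden, and the import has to be replaced or the code supplied.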