jerinphilip / ilmulti

Tooling to play around with multilingual machine translation for Indian Languages.
http://preon.iiit.ac.in/~jerin/bhasha
MIT License

Sample Colab Notebook seems to have a bug #5

Closed rahulraj80 closed 4 years ago

rahulraj80 commented 4 years ago

Hi Jerin,

Is the first cell needed for anything other than storage/ vscode linkage?

If not, the sample Colab Notebook seems to have some bug. Have a look at this when you get the time : https://colab.research.google.com/gist/rahulraj80/2f45c7ab1b44c616b12917f5211c51d3/ilmulti-sample-run-notebook.ipynb

It complains:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/content/ilmulti/ilmulti/translator/translator.py in <module>()
      5 try:
----> 6     import fairseq
      7     import torch

ModuleNotFoundError: No module named 'fairseq'

During handling of the above exception, another exception occurred:

NameError                                 Traceback (most recent call last)
2 frames
<ipython-input-8-e2b51881e661> in <module>()
----> 1 from ilmulti.translator import from_pretrained
      2 
      3 translator = from_pretrained(tag='mm-all-iter0')
      4 
      5 sample = translator("The quick brown fox jumps over the lazy dog", tgt_lang='hi')

/content/ilmulti/ilmulti/translator/__init__.py in <module>()
      1 
----> 2 from .translator import FairseqTranslator
      3 from .mt_engine import MTEngine
      4 from .pretrained import from_pretrained, mm_all

/content/ilmulti/ilmulti/translator/translator.py in <module>()
      9     from fairseq import data, options, tasks, tokenizer, utils
     10 except ImportError:
---> 11     warnings.warn(
     12     """
     13     Please check if you have installed specified versions of torch,

NameError: name 'warnings' is not defined

The torch version warning at the end seems to be erroneous, since a few cells up it said:

Requirement already satisfied: torch==1.0.0 in /usr/local/lib/python3.6/dist-packages (1.0.0)
Requirement already satisfied: torchvision==0.2.1 in /usr/local/lib/python3.6/dist-packages (0.2.1)
Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.6/dist-packages (from torchvision==0.2.1) (7.0.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from torchvision==0.2.1) (1.16.0)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from torchvision==0.2.1) (1.15.0)

Cheers, Rahul

jerinphilip commented 4 years ago

@rahulraj80 Can you check whether it's fixed now? I had modified the source so that someone (@Nimishasri) on a lesser system without torch/fairseq can run the library locally. So the bug is probably my bad, sorry. I should be testing this to avoid unnecessary bugs, among a lot of other things.

  1. The NameError on warnings should be resolved by now; I have imported warnings correctly.
  2. Alternatively, you can simply install torch==1.0.0 and fairseq==0.8.0, which will prevent the except block from triggering and fix the case anyway. Your logs don't indicate that the fairseq requirement was satisfied?
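
The failure mode in the first traceback (an `except ImportError` block that calls `warnings.warn` without `warnings` having been imported) can be avoided with a pattern like the one below. This is only a minimal sketch, not the code in `translator.py`: the helper name `optional_import` and the placeholder module name are assumptions made for illustration.

```python
import importlib
import warnings  # imported at module level, so the except block below can use it


def optional_import(name):
    """Return the module called `name`, or None (with a warning) if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        warnings.warn(
            f"Could not import {name!r}; check that the pinned versions "
            "of torch and fairseq are installed."
        )
        return None


# Hypothetical module name, used only to exercise the failure path.
missing = optional_import("surely_not_an_installed_module_xyz")
print(missing)
```

With this shape, a missing dependency produces a warning instead of a second exception that hides the original `ModuleNotFoundError`.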

Maybe @Nimishasri can help you with the Colab notebook, I shared it as she supplied one. I don't work with Colab Notebooks much.

rahulraj80 commented 4 years ago

Thanks Jerin. Will try out and revert.

Will be happy to help on the Colab side if it helps you focus on the more important stuff.

rahulraj80 commented 4 years ago

@jerinphilip : So the earlier issues seem to be sorted out. It is currently getting stuck at the step shown below. Runnable Gist: just press Ctrl-F9 (or Runtime->Run All) after opening the link to replicate.

| [src] dictionary: 40897 types
| [tgt] dictionary: 40897 types
/content/ilmulti/ilmulti/translator/translator.py:37: UserWarning: utils.load_ensemble_for_inference is deprecated. Please use checkpoint_utils.load_model_ensemble instead.
  self.models, model_args = fairseq.utils.load_ensemble_for_inference(model_paths, self.task, model_arg_overrides=eval(args.model_overrides))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-e2b51881e661> in <module>()
      1 from ilmulti.translator import from_pretrained
      2 
----> 3 translator = from_pretrained(tag='mm-all-iter0')
      4 
      5 sample = translator("The quick brown fox jumps over the lazy dog", tgt_lang='hi')

7 frames
/content/ilmulti/ilmulti/translator/pretrained.py in from_pretrained(tag, use_cuda)
     60     from .mt_engine import MTEngine
     61 
---> 62     translator = build_translator(config['model'], use_cuda=use_cuda)
     63     segmenter = build_segmenter(config['segmenter'])
     64     tokenizer = build_tokenizer(config['tokenizer'])

/content/ilmulti/ilmulti/translator/translator.py in build_translator(model, use_cuda)
    169     args.enhance(**keyword_arguments)
    170 
--> 171     fseq_translator = FairseqTranslator(args, use_cuda=use_cuda)
    172     return fseq_translator
    173 

/content/ilmulti/ilmulti/translator/translator.py in __init__(self, args, use_cuda)
     35         # print('| loading model(s) from {}'.format(args.path))
     36         model_paths = args.path.split(':')
---> 37         self.models, model_args = fairseq.utils.load_ensemble_for_inference(model_paths, self.task, model_arg_overrides=eval(args.model_overrides))
     38         self.tgt_dict = self.task.target_dictionary
     39 

/usr/local/lib/python3.6/dist-packages/fairseq/utils.py in load_ensemble_for_inference(filenames, task, model_arg_overrides)
     27     )
     28     return checkpoint_utils.load_model_ensemble(
---> 29         filenames, arg_overrides=model_arg_overrides, task=task,
     30     )
     31 

/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py in load_model_ensemble(filenames, arg_overrides, task)
    153         task (fairseq.tasks.FairseqTask, optional): task to use for loading
    154     """
--> 155     ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
    156     return ensemble, args
    157 

/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py in load_model_ensemble_and_task(filenames, arg_overrides, task)
    164         if not os.path.exists(filename):
    165             raise IOError('Model file not found: {}'.format(filename))
--> 166         state = load_checkpoint_to_cpu(filename, arg_overrides)
    167 
    168         args = state['args']

/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py in load_checkpoint_to_cpu(path, arg_overrides)
    140         for arg_name, arg_val in arg_overrides.items():
    141             setattr(args, arg_name, arg_val)
--> 142     state = _upgrade_state_dict(state)
    143     return state
    144 

/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py in _upgrade_state_dict(state)
    302 
    303     # set any missing default values in the task, model or other registries
--> 304     registry.set_defaults(state['args'], tasks.TASK_REGISTRY[state['args'].task])
    305     registry.set_defaults(state['args'], models.ARCH_MODEL_REGISTRY[state['args'].arch])
    306     for registry_name, REGISTRY in registry.REGISTRIES.items():

KeyError: 'shared-multilingual-translation'

If you see the gist above, all installations went through without issues.

jerinphilip commented 4 years ago

  1. If you want this only for inference, edit the checkpoint to change the task to 'translation'. I overrode tasks only to make it easy to plug and unplug datasets and the tokenizer, nothing else; this error is an artifact of that. This will allow you to use fairseq-generate as well, which is batched properly and much faster. You just have to use the tokenizer for preprocessing and the dictionaries here with 'translation' for the models to be compatible.

  2. If you want training as well as inference on the datasets used in this work, install fairseq-ilmt instead of fairseq==0.8.0. There is some hardcoding by @shashanksiripragada which will have to be worked around to ensure compatibility.

Sorry this is complicated; we customized the fork to make certain things feasible without maintaining backwards compatibility etc. @Nimishasri seems to have gotten this (example) to work without issues in the Colab using fairseq-ilmt. So I'd say it's worth a try to replace 0.8.0 with fairseq-ilmt instead. I've corresponded with someone who had the same issue as you and instead chose to go with 1 for faster throughput.
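
Option 1 (editing the checkpoint so that its task is 'translation') could look roughly like the sketch below. It assumes, based on the `state['args'].task` lookup in the traceback, that the checkpoint is a dict whose `'args'` entry is a namespace with a `task` attribute. The function names and the paths passed to `rewrite_checkpoint` are hypothetical, and other stored arguments may also need adjusting for a given model.

```python
import argparse


def retarget_task(state, new_task="translation"):
    """Set the task name stored in a loaded checkpoint dict; return the old name."""
    old_task = state["args"].task
    state["args"].task = new_task
    return old_task


def rewrite_checkpoint(src_path, dst_path, new_task="translation"):
    """Load a checkpoint on CPU, change its task, and save it under a new path."""
    import torch  # imported lazily so retarget_task can be tried without torch

    state = torch.load(src_path, map_location="cpu")
    old_task = retarget_task(state, new_task)
    torch.save(state, dst_path)
    return old_task


# Stand-in for a loaded checkpoint; a real one also holds the model weights.
demo_state = {"args": argparse.Namespace(task="shared-multilingual-translation")}
print(retarget_task(demo_state), "->", demo_state["args"].task)
```

Writing to a separate `dst_path` keeps the original checkpoint intact in case the edited copy does not load as expected.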

rahulraj80 commented 4 years ago

Makes sense.

I changed over to fairseq-ilmt, but the exact same bug reappears. I also tried checking out a previous version but I am getting the same error. (Cuda version is 10.0, and pytorch==1.0.0) Sharing the output I have of the current environment below.

Maybe the environment is playing the spoilsport. To debug, can you share the cuda version on your test setup and the output of pip freeze for the environment in which the minimal startup script of the README is working?
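
When comparing environments, it may help to narrow `pip freeze` down to the packages that matter here. The following is a minimal sketch under that assumption; the helper names and the default list of packages are illustrative choices, not part of ilmulti.

```python
import subprocess
import sys


def package_name(freeze_line):
    """Extract the package name from one line of `pip freeze` output."""
    line = freeze_line.strip()
    if "#egg=" in line:  # editable/VCS installs: "-e git+...#egg=name"
        return line.split("#egg=", 1)[1]
    return line.split(" @ ", 1)[0].split("==", 1)[0]


def relevant_lines(freeze_output, wanted=("torch", "fairseq", "ilmulti")):
    """Keep only the lines whose package name is in `wanted` (case-insensitive)."""
    wanted = {name.lower() for name in wanted}
    return [
        line
        for line in freeze_output.splitlines()
        if line.strip() and package_name(line).lower() in wanted
    ]


def current_freeze():
    """Return the `pip freeze --all` output of the running interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze", "--all"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Calling `print("\n".join(relevant_lines(current_freeze())))` in a notebook cell would then print only the torch, fairseq and ilmulti entries. The CUDA version PyTorch was built against can be read separately from `torch.version.cuda`.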

Let's say, starting from a fresh install of Ubuntu 20.04 with a supported Cuda card, what all does one need to do to get the first output of the ILMulti translator?

If we can get a fresh-install-to-output steps nailed down, I am sure we will be able to debug the failure and have a working Notebook for anyone who uses the repository.

Also @Nimishasri : Can you share if we are going about this all wrong? Does the notebook still work for you? What is the environment you have?

Output:

$ pip freeze --all 

absl-py==0.10.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.1.0
argon2-cffi==20.1.0
asgiref==3.2.10
astor==0.8.1
astropy==4.0.1.post1
astunparse==1.6.3
async-generator==1.10
atari-py==0.2.6
atomicwrites==1.4.0
attrs==20.2.0
audioread==2.1.8
autograd==1.3
Babel==2.8.0
backcall==0.2.0
beautifulsoup4==4.6.3
bleach==3.1.5
blis==0.4.1
bokeh==2.1.1
boto==2.49.0
boto3==1.14.59
botocore==1.17.59
Bottleneck==1.3.2
branca==0.4.1
bs4==0.0.1
CacheControl==0.12.6
cachetools==4.1.1
catalogue==1.0.0
certifi==2020.6.20
cffi==1.14.2
chainer==7.4.0
chardet==3.0.4
click==7.1.2
cloudpickle==1.3.0
cmake==3.12.0
cmdstanpy==0.9.5
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.2.2
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cupy-cuda101==7.4.0
cvxopt==1.2.5
cvxpy==1.0.31
cycler==0.10.0
cymem==2.0.3
Cython==0.29.21
daft==0.0.4
dask==2.12.0
dataclasses==0.7
datascience==0.10.6
debugpy==1.0.0rc2
decorator==4.4.2
defusedxml==0.6.0
descartes==1.1.0
dill==0.3.2
distributed==1.25.3
Django==3.1.1
dlib==19.18.0
dm-tree==0.1.5
docopt==0.6.2
docutils==0.15.2
dopamine-rl==1.0.5
earthengine-api==0.1.234
easydict==1.9
ecos==2.0.7.post1
editdistance==0.5.3
en-core-web-sm==2.2.5
entrypoints==0.3
ephem==3.7.7.1
et-xmlfile==1.0.1
fa2==0.3.5
-e git+https://github.com/rahulraj80/fairseq-ilmt.git@42a628b59b3e37431e6d4de79313fe6107873e87#egg=fairseq
fancyimpute==0.4.3
fastai==1.0.61
fastBPE==0.1.0
fastdtw==0.3.4
fastprogress==1.0.0
fastrlock==0.5
fbprophet==0.7.1
feather-format==0.4.1
filelock==3.0.12
firebase-admin==4.1.0
fix-yahoo-finance==0.0.22
Flask==1.1.2
folium==0.8.3
future==0.16.0
gast==0.3.3
GDAL==2.2.2
gdown==3.6.4
gensim==3.6.0
geographiclib==1.50
geopy==1.17.0
gin-config==0.3.0
glob2==0.7
google==2.0.3
google-api-core==1.16.0
google-api-python-client==1.7.12
google-auth==1.17.2
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.21.0
google-cloud-core==1.0.3
google-cloud-datastore==1.8.0
google-cloud-firestore==1.7.0
google-cloud-language==1.2.0
google-cloud-storage==1.18.1
google-cloud-translate==1.5.0
google-colab==1.0.0
google-pasta==0.2.0
google-resumable-media==0.4.1
googleapis-common-protos==1.52.0
googledrivedownloader==0.4
graphviz==0.10.1
grpcio==1.32.0
gspread==3.0.1
gspread-dataframe==3.0.8
gym==0.17.2
h5py==2.10.0
HeapDict==1.0.1
holidays==0.10.3
holoviews==1.13.3
html5lib==1.0.1
httpimport==0.5.18
httplib2==0.17.4
httplib2shim==0.0.3
humanize==0.5.1
hyperopt==0.1.2
ideep4py==2.0.0.post3
idna==2.10
ilmulti==0.0.1
image==1.5.32
imageio==2.4.1
imagesize==1.2.0
imbalanced-learn==0.4.3
imblearn==0.0
imgaug==0.2.9
importlib-metadata==1.7.0
imutils==0.5.3
inflect==2.1.0
iniconfig==1.0.1
intel-openmp==2020.0.133
intervaltree==2.1.0
ipykernel==4.10.1
ipython==5.5.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.5.1
itsdangerous==1.1.0
jax==0.1.75
jaxlib==0.1.52
jdcal==1.4.1
jedi==0.17.2
jieba==0.42.1
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.16.0
jpeg4py==0.1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.3.5
jupyter-console==5.2.0
jupyter-core==4.6.3
jupyterlab-pygments==0.1.1
kaggle==1.5.8
kapre==0.1.3.1
Keras==2.4.3
Keras-Preprocessing==1.1.2
keras-vis==0.4.1
kiwisolver==1.2.0
knnimpute==0.1.0
korean-lunar-calendar==0.2.1
langdetect==1.0.8
langid==1.1.6
librosa==0.6.3
lightgbm==2.2.3
llvmlite==0.31.0
lmdb==0.99
lucid==0.3.8
LunarCalendar==0.0.9
lxml==4.2.6
Markdown==3.2.2
MarkupSafe==1.1.1
matplotlib==3.2.2
matplotlib-venn==0.11.5
missingno==0.4.2
mistune==0.8.4
mizani==0.6.0
mkl==2019.0
mlxtend==0.14.0
more-itertools==8.5.0
moviepy==0.2.3.5
mpmath==1.1.0
msgpack==1.0.0
multiprocess==0.70.10
multitasking==0.0.9
murmurhash==1.0.2
music21==5.5.0
natsort==5.5.0
nbclient==0.5.0
nbconvert==5.6.1
nbformat==5.0.7
nest-asyncio==1.4.0
networkx==2.5
nibabel==3.0.2
nltk==3.2.5
notebook==5.3.1
np-utils==0.5.12.1
numba==0.48.0
numexpr==2.7.1
numpy==1.18.5
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
okgrade==0.4.3
opencv-contrib-python==4.1.2.30
opencv-python==4.1.2.30
openpyxl==2.5.9
opt-einsum==3.3.0
osqp==0.6.1
packaging==20.4
palettable==3.3.0
pandas==1.0.5
pandas-datareader==0.8.1
pandas-gbq==0.11.0
pandas-profiling==1.4.1
pandocfilters==1.4.2
panel==0.9.7
param==1.9.3
parso==0.7.1
pathlib==1.0.1
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.0.0
pip==19.3.1
pip-tools==4.5.1
plac==1.1.3
plotly==4.4.1
plotnine==0.6.0
pluggy==0.7.1
portalocker==2.0.0
portpicker==1.3.1
prefetch-generator==1.0.1
preshed==3.0.2
prettytable==0.7.2
progressbar2==3.38.0
prometheus-client==0.8.0
promise==2.3
prompt-toolkit==1.0.18
protobuf==3.12.4
psutil==5.4.8
psycopg2==2.7.6.1
ptyprocess==0.6.0
py==1.9.0
pyarrow==0.14.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.2
pycparser==2.20
pyct==0.4.7
pydata-google-auth==1.1.0
pydot==1.3.0
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
pyemd==0.5.1
pyglet==1.5.0
Pygments==2.6.1
pygobject==3.26.1
pymc3==3.7
PyMeeus==0.3.7
pymongo==3.11.0
pymystem3==0.2.0
PyOpenGL==3.1.5
pyparsing==2.4.7
pyrsistent==0.16.0
pysndfile==1.3.8
PySocks==1.7.1
pystan==2.19.1.1
pytest==3.6.4
python-apt==1.6.5+ubuntu0.3
python-chess==0.23.11
python-dateutil==2.8.1
python-louvain==0.14
python-slugify==4.0.1
python-utils==2.4.0
pytz==2018.9
pyviz-comms==0.7.6
PyWavelets==1.1.1
PyYAML==3.13
pyzmq==19.0.2
qtconsole==4.7.7
QtPy==1.9.0
regex==2019.12.20
requests==2.23.0
requests-oauthlib==1.3.0
resampy==0.2.2
retrying==1.3.3
rpy2==3.2.7
rsa==4.6
s3transfer==0.3.3
sacrebleu==1.4.14
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
scs==2.1.2
seaborn==0.10.1
Send2Trash==1.5.0
sentencepiece==0.1.91
setuptools==50.3.0
setuptools-git==1.2
Shapely==1.7.1
simplegeneric==0.8.1
six==1.15.0
sklearn==0.0
sklearn-pandas==1.8.0
slugify==0.0.1
smart-open==2.1.1
snowballstemmer==2.0.0
sortedcontainers==2.2.2
spacy==2.2.4
Sphinx==1.8.5
sphinxcontrib-serializinghtml==1.1.4
sphinxcontrib-websupport==1.2.4
SQLAlchemy==1.3.19
sqlparse==0.3.1
srsly==1.0.2
statsmodels==0.10.2
sympy==1.1.1
tables==3.4.4
tabulate==0.8.7
tblib==1.7.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorboardcolab==0.0.22
tensorflow==2.3.0
tensorflow-addons==0.8.3
tensorflow-datasets==2.1.0
tensorflow-estimator==2.3.0
tensorflow-gcs-config==2.3.0
tensorflow-hub==0.9.0
tensorflow-metadata==0.24.0
tensorflow-privacy==0.2.2
tensorflow-probability==0.11.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
text-unidecode==1.3
textblob==0.15.3
textgenrnn==1.4.1
Theano==1.0.5
thinc==7.4.0
tifffile==2020.9.3
toml==0.10.1
toolz==0.10.0
torch==1.0.0
torchsummary==1.5.1
torchtext==0.3.1
torchvision==0.3.0
tornado==5.1.1
tqdm==4.41.1
traitlets==4.3.3
tweepy==3.6.0
typeguard==2.7.1
typing-extensions==3.7.4.3
tzlocal==1.5.1
umap-learn==0.4.6
uritemplate==3.0.1
urllib3==1.24.3
vega-datasets==0.8.0
wasabi==0.8.0
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wheel==0.35.1
widgetsnbextension==3.5.1
wordcloud==1.5.0
wrapt==1.12.1
xarray==0.15.1
xgboost==0.90
xlrd==1.1.0
xlwt==1.3.0
yellowbrick==0.9.1
zict==2.0.0
zipp==3.1.0

jerinphilip commented 4 years ago

@rahulraj80 The same error can't appear with fairseq-ilmt, as 'shared-multilingual-translation' is a defined task? Can you check once again?

rahulraj80 commented 4 years ago

Sorry - My bad. The error is very different from earlier - I missed attaching that in the last message.

I should have said that the same error message (different from the earlier one) appears regardless of any change I make to the environment configuration and torch/cuda versions. cuda=False gives the same (following) error as well.

When the translator attempts to _make_batches, it tries to iterate over the itr object, which calls the __getitem__ of language_pair_dataset, but it fails as self.src is a list of Tensors instead of a ConcatDataset (??) that it probably needs to be.

The exact error log is (please ignore the autolog/print statements):

| [src] dictionary: 40897 types
| [tgt] dictionary: 40897 types
./ilmulti/translator/translator.py:38: UserWarning: utils.load_ensemble_for_inference is deprecated. Please use checkpoint_utils.load_model_ensemble instead.
  self.models, model_args = fairseq.utils.load_ensemble_for_inference(model_paths, self.task, model_arg_overrides=eval(args.model_overrides))
./ilmulti/utils/language_utils.py:47: UserWarning: Detect segmented is not recommended.This might lead to large slowdowns.
  "Detect segmented is not recommended."
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-82060a75cda2> in <module>()
---> 36                 result = translator("நாளைக்கு என்ன எக்ஸாம். நாளைக்கு என்ன எக்ஸாம்.", tgt_lang="en")

9 frames
/content/ilmulti/ilmulti/translator/mt_engine.py in __call__(self, source, tgt_lang, src_lang, detokenize)
     47 
     48         autolog(f":Sources::T:{str(type(sources))}:len:{len(sources)}:::Sources::T:{sources}:::")
---> 49         export = self.translator(sources)
     50         export = self._handle_empty_lines_noise(export)
     51         if detokenize:

/content/ilmulti/ilmulti/translator/translator.py in __call__(self, lines, attention)
     69         autolog(f":src_dict:T:{str(type(src_dict))}:L::tgt:T:{str(type(tgt_dict))}:L::Align:T:{str(type(align_dict))}:L::")
     70         autolog(f"::Lines:T:{str(type(lines))}:L:{len(lines)}:lines:{lines}:")
---> 71         for batch, idx in self._make_batches(lines):
     72             src_tokens = batch.src_tokens
     73             src_lengths = batch.src_lengths

/content/ilmulti/ilmulti/translator/translator.py in _make_batches(self, lines)
    145         autolog(f"lengths:T:{str(type(lengths))}:L::{len(lengths)}::_:{lengths}:")
    146         autolog(f"itr:T:{str(type(itr))}:L::{len(itr)}::_:{itr}:")
--> 147         for batch in itr:
    148             yield Batch(
    149                 ids=batch['id'],

/content/fairseq-ilmt/fairseq/data/iterators.py in __iter__(self)
     33 
     34     def __iter__(self):
---> 35         for x in self.iterable:
     36             self.count += 1
     37             yield x

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1
    365         if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    401     def _next_data(self):
    402         index = self._next_index()  # may raise StopIteration
--> 403         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    404         if self._pin_memory:
    405             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/content/fairseq-ilmt/fairseq/data/language_pair_dataset.py in __getitem__(self, index)
    135         print(">"*3, f"calling get corpus for Self:{self}:t:{str(type(self))}:")      ####Self[0]:{self.src[0]}:t:{str(type(self.src[0]))}:Self[0][0]:{self.src[0][0]}:t:{str(type(self.src[0][0]))}:::{index}:::")
    136         print(">"*3, f" Self src:{self.src}:t:{str(type(self.src))}:Self[0]:{self.src[0]}:t:{str(type(self.src[0]))}:Self[0][0]:{self.src[0][0]}:t:{str(type(self.src[0][0]))}:::{index}:::")
    137         
--> 138         src_id = self.src.get_corpus_id(index)
    139         tgt_id = self.tgt.get_corpus_id(index)

AttributeError: 'list' object has no attribute 'get_corpus_id'

The line numbers have shifted down a bit as I added a few logging (autolog) and print statements to debug the data type issue.

The self.src object at this stage is a list of Tensors (output of the two print statements above in /content/fairseq-ilmt/fairseq/data/language_pair_dataset.py in __getitem__(self, index)):

>>> calling get corpus for Self:<fairseq.data.language_pair_dataset.LanguagePairDataset object at 0x7f71aed93630>:t:<class 'fairseq.data.language_pair_dataset.LanguagePairDataset'>:
>>>  Self src:[tensor([  262, 35292,  6460,  5802, 34390, 34356,  5775,  6554,  6299,     2])]:t:<class 'list'>:Self[0]:tensor([  262, 35292,  6460,  5802, 34390, 34356,  5775,  6554,  6299,     2]):t:<class 'torch.Tensor'>:Self[0][0]:262:t:<class 'torch.Tensor'>:::0:::

Do let me know if I can share something else.

jerinphilip commented 4 years ago

So, some breaking code was added a month back, which I don't think will be modded soon. Can you try the following tag? That should make your life easier than editing this code.

  1. https://github.com/jerinphilip/fairseq-ilmt/tree/lrec-2020
  2. https://coderwall.com/p/-wbo5q/pip-install-a-specific-github-repo-tag-or-branch
  3. https://stackoverflow.com/questions/13685920/install-specific-git-commit-with-pip

rahulraj80 commented 4 years ago

Thanks @jerinphilip 👍 : That solves it. I used the June versions for both ilmulti and fairseq-ilmt as the latest ilmulti did not play well with the older fairseq-ilmt.
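
Pinning an install to a tag, as suggested in the links above, can be sketched as follows. The `lrec-2020` tag and repository come from this thread; the helper names are hypothetical, and the exact ref of the "June version" of ilmulti is not stated here, so it is not filled in.

```python
import subprocess
import sys


def vcs_requirement(repo_url, ref, egg=None):
    """Build a pip requirement string pinning a Git repo to a tag, branch or commit."""
    spec = f"git+{repo_url}@{ref}"
    if egg:
        spec += f"#egg={egg}"
    return spec


def pip_install(requirement):
    """Install one requirement into the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


spec = vcs_requirement(
    "https://github.com/jerinphilip/fairseq-ilmt.git", "lrec-2020", egg="fairseq"
)
print(spec)
```

Passing `spec` to `pip_install` would perform the actual installation; the same string can also be used directly after `pip install` in a notebook cell.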

Confirming that it works. Working Colab Notebook: press Ctrl-F9 (or Runtime->Run All).

With a small patch in fairseq-ilmt/fairseq/search.py, it works with the latest pytorch versions without the need to downgrade torch on Colab.

YMMV for other integrations, but IMHO replacing torch.div with torch.floor_divide should not break any applications.
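
Why floor division matters here can be illustrated without torch. The sketch below assumes that the division in question recovers a beam index from a flattened (beam, vocabulary) index, which is a common step in beam search. It is plain Python for illustration, not the patched code in search.py. Newer PyTorch releases changed `torch.div` on integer tensors away from integer division, which is the behaviour the `//` versus `/` contrast mimics.

```python
def split_flat_index(flat_index, vocab_size):
    """Recover (beam, token) from an index into a flattened beam x vocabulary table."""
    beam = flat_index // vocab_size  # floor division keeps the result an integer
    token = flat_index % vocab_size
    return beam, token


vocab_size = 40897                  # dictionary size reported in the logs above
flat_index = 2 * vocab_size + 262   # constructed to mean beam 2, token 262
print(split_flat_index(flat_index, vocab_size))
print(flat_index / vocab_size)      # true division yields a float, unusable as an index
```

For non-negative indices such as these, floor division and truncating division give the same result, which is consistent with the expectation that the swap is harmless.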

In any case, people should first follow the suggested versions as per the README.md before trying these stunts.

jerinphilip commented 4 years ago

Let's hope this is future proof.

https://colab.research.google.com/drive/1KOvjawhzPXOQ6RLlFBFeInkuuR0QAWTK?usp=sharing