Open
s-waitz opened this issue 3 years ago
Hi,
in your readme it says that the --summarize flag needs to be specified for matcher.py if it was also specified at training time. When I do so I get the following error:

Without the --summarize flag matcher.py is running fine.
Is there any workaround to use matcher.py with summarization?
I have encountered a similar issue.

```
Downloading: 100%|███████████████████████████| 28.0/28.0 [00:00<00:00, 21.7kB/s]
Downloading: 100%|██████████████████████████████| 483/483 [00:00<00:00, 401kB/s]
Downloading: 100%|███████████████████████████| 226k/226k [00:00<00:00, 38.4MB/s]
Downloading: 100%|███████████████████████████| 455k/455k [00:00<00:00, 40.8MB/s]
Downloading: 100%|███████████████████████████| 256M/256M [00:05<00:00, 49.4MB/s]
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.weight']
Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/apex/amp/_initialize.py:25: UserWarning: An input tensor was not cuda.
  warnings.warn("An input tensor was not cuda.")
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
step: 0, loss: 0.7943093180656433
Traceback (most recent call last):
  File "train_ditto.py", line 92, in
```
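Two of the warnings in the log above point at the training loop itself: "An input tensor was not cuda." typically means a batch was still on the CPU when it reached the AMP-patched model, and the lr_scheduler message means `lr_scheduler.step()` was called before `optimizer.step()`, the reverse of the order PyTorch 1.1.0+ expects. The sketch below is a generic apex O2 loop that avoids both warnings; it is not the actual train_ditto.py code, and the model, optimizer, scheduler, and batch shapes are placeholders (apex and a CUDA GPU are assumed).

```python
# Generic sketch (not the ditto training code): an apex AMP O2 loop that keeps
# inputs on the GPU and calls optimizer.step() before lr_scheduler.step().
import torch
import torch.nn.functional as F
from apex import amp

device = torch.device("cuda")
model = torch.nn.Linear(10, 2).to(device)                  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)  # placeholder optimizer
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

# opt_level="O2" corresponds to the options printed in the log: FP16 model
# weights, FP32 batchnorm, FP32 master weights, dynamic loss scaling.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

for step in range(100):
    # Batches must live on the GPU; CPU tensors trigger "An input tensor was not cuda."
    x = torch.randn(32, 10, device=device)
    y = torch.randint(0, 2, (32,), device=device)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    with amp.scale_loss(loss, optimizer) as scaled_loss:   # dynamic loss scaling
        scaled_loss.backward()

    optimizer.step()   # step the optimizer first...
    scheduler.step()   # ...then the scheduler, as PyTorch >= 1.1.0 expects
```

The multi_tensor_applier warning, by contrast, is an install-time matter (apex built without --cuda_ext --cpp_ext, per its own message), and an occasional "Gradient overflow. Skipping step" line is normal behavior of dynamic loss scaling rather than an error.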