programmer290399 / pyqna

A simple python package for question answering.
https://programmer290399.github.io/pyqna/
BSD 3-Clause "New" or "Revised" License

models: rc: transformer models: Unable to process long texts #16

Open programmer290399 opened 2 years ago

programmer290399 commented 2 years ago

Traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-5-04fce7cff438> in <module>()
     19 
     20 # Run inference using the instantiated models
---> 21 answers = model.get_answer(context, questions)
     22 
     23 # Print the output

8 frames
/usr/local/lib/python3.7/dist-packages/pyqna/models/reading_comprehension/transformer_models.py in get_answer(self, context, question)
    133             return self._infer_from_model(context, question)
    134         elif isinstance(question, list):
--> 135             return [self._infer_from_model(context, q) for q in question]

/usr/local/lib/python3.7/dist-packages/pyqna/models/reading_comprehension/transformer_models.py in <listcomp>(.0)
    133             return self._infer_from_model(context, question)
    134         elif isinstance(question, list):
--> 135             return [self._infer_from_model(context, q) for q in question]

/usr/local/lib/python3.7/dist-packages/pyqna/models/reading_comprehension/transformer_models.py in _infer_from_model(self, context, question)
     66         ).to(self.device)
     67 
---> 68         outputs = self.model(**inputs)
     69 
     70         non_answer_tokens = [x if x in [0, 1] else 0 for x in inputs.sequence_ids()]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, start_positions, end_positions, output_attentions, output_hidden_states, return_dict)
    855             output_attentions=output_attentions,
    856             output_hidden_states=output_hidden_states,
--> 857             return_dict=return_dict,
    858         )
    859         hidden_states = distilbert_output[0]  # (bs, max_query_len, dim)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    548 
    549         if inputs_embeds is None:
--> 550             inputs_embeds = self.embeddings(input_ids)  # (bs, seq_length, dim)
    551         return self.transformer(
    552             x=inputs_embeds,

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids)
    131         position_embeddings = self.position_embeddings(position_ids)  # (bs, max_seq_length, dim)
    132 
--> 133         embeddings = word_embeddings + position_embeddings  # (bs, max_seq_length, dim)
    134         embeddings = self.LayerNorm(embeddings)  # (bs, max_seq_length, dim)
    135         embeddings = self.dropout(embeddings)  # (bs, max_seq_length, dim)

RuntimeError: The size of tensor a (692) must match the size of tensor b (512) at non-singleton dimension 1
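The root cause: DistilBERT has a fixed positional-embedding table of 512 entries, and the tokenized context here comes out to 692 tokens, so the word-embedding and position-embedding tensors no longer broadcast. One common workaround is to split the token sequence into overlapping windows that each fit the limit and run inference per window. A minimal sketch of that splitting step (the function name and the `max_len`/`stride` parameters are illustrative, not part of the pyqna API):

```python
def split_into_windows(token_ids, max_len=512, stride=128):
    """Split a token-id sequence into overlapping windows of at most
    max_len tokens. Consecutive windows share `stride` tokens so an
    answer span falling on a boundary is still fully contained in at
    least one window."""
    if len(token_ids) <= max_len:
        return [token_ids]
    step = max_len - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return windows
```

The Hugging Face tokenizers can do this internally via `truncation`, `stride`, and `return_overflowing_tokens`, which would likely be the cleaner fix inside `_infer_from_model`.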
Antsthebul commented 2 years ago

Mind if I give this a shot? What are the steps to reproduce?

programmer290399 commented 2 years ago

That'd be really great @Antsthebul! Please use this colab notebook to reproduce the error, or follow these steps:

  1. Create a virtual env and install pyqna in it
     $ python3 -m venv .venv 
     $ . .venv/bin/activate 
     $ pip install pyqna[transformers]
  2. Run the following code snippet in that env.
    
    # Import a specific model
    from pyqna.models.reading_comprehension.transformer_models import TransformerQnAModel

    # Instantiate the model
    model = TransformerQnAModel(
        {"model_name": "distilbert-base-uncased-distilled-squad", "pre_trained": True}
    )

    # Take a very long context
    context = """ As a scientific endeavor, machine learning grew out of the quest for artificial intelligence. In the early days of AI as an academic discipline, some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what was then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalized linear models of statistics.[23] Probabilistic reasoning was also employed, especially in automated medical diagnosis.

    However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.[24]: 488  By 1980, expert systems had come to dominate AI, and statistics was out of favor. Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval. Neural networks research had been abandoned by AI and computer science around the same time. This line, too, was continued outside the AI/CS field, as "connectionism", by researchers from other disciplines including Hopfield, Rumelhart and Hinton. Their main success came in the mid-1980s with the reinvention of backpropagation.

    Machine learning (ML), reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory.

    The difference between ML and AI is frequently misunderstood. ML learns and predicts based on passive observations, whereas AI implies an agent interacting with the environment to learn and take actions that maximize its chance of successfully achieving its goals.

    As of 2020, many sources continue to assert that ML remains a subfield of AI. Others have the view that not all ML is part of AI, but only an 'intelligent subset' of ML should be considered AI.

    Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

    """

    # Make a list of your queries
    questions = ["What did machine learning emerge out of?", "When did machine learning started flourishing?"]

    # Run inference using the instantiated models
    answers = model.get_answer(context, questions)

    # Print the output
    print(answers)
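If the fix ends up splitting the context into windows and running inference per window, the per-window predictions still need to be merged into one answer per question. A hypothetical sketch of that merge step, assuming each window yields an `(answer_text, score)` tuple (this tuple shape is an assumption, not pyqna's actual return format):

```python
def best_answer(window_results):
    """Pick the highest-scoring answer across windows.

    window_results: list of (answer_text, score) tuples, one per window.
    Returns the text of the best-scoring answer, or None if empty.
    """
    if not window_results:
        return None
    return max(window_results, key=lambda r: r[1])[0]
```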

programmer290399 commented 2 years ago