yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

Constituency CRF RoBERTa English not working properly #131

Closed. SirBernardPhilip closed this issue 9 months ago.

SirBernardPhilip commented 9 months ago

I am trying to run the constituency parser with the pretrained RoBERTa English model, but it does not parse sentences correctly. The "con-crf-en" model works as expected, but the RoBERTa one fails on every sentence I try and always outputs an almost flat tree like this:

>>> parser.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope', '.'], verbose=False, prob=True)[0].pretty_print()
              TOP                   
               |                     
               S                    
  _____________|__________________   
 |   |    |    |    |      |      NP
 |   |    |    |    |      |      |  
 _   _    _    _    _      _      _ 
 |   |    |    |    |      |      |  
 I  saw Sarah with  a  telescope  . 

Here's a notebook link that replicates the issue

yzhangcs commented 9 months ago

@SirBernardPhilip Hi, the previous weights were corrupted somehow. I've uploaded the re-trained model. Please check it out :)

SirBernardPhilip commented 9 months ago

Amazing, thank you! I just checked and the new model works for English. However, the same thing is happening with the con-crf-xlmr model:

>>> from supar import Parser
>>> model_name = "con-crf-xlmr"
>>> parser = Parser.load(model_name)
>>> parser.predict(['Je', 'ai', 'vu', 'Sarah', 'avec', 'un', 'télescope', '.'], verbose=False, prob=True)[0].pretty_print()
             TOP                        
              |                          
              S                         
  ____________|_______________________   
 _   _   _    _    _    _      _      _ 
 |   |   |    |    |    |      |      |  
 Je  ai  vu Sarah avec  un télescope  .
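
For anyone who wants to catch this kind of output automatically, here is a rough sketch of a flatness check. It does not call supar at all: it works on bracketed tree strings, and the helper names (parse_bracketed, multi_token_spans, looks_degenerate) are made up for this example. The "flat" string is a hand transcription of the first tree printed above, and the "healthy" string is only an illustrative parse, not actual model output. The heuristic (at most one distinct multi-token span for a sentence longer than two tokens) is an assumption, not something the library provides.

```python
def parse_bracketed(text):
    """Parse '(S (NP I) (VP saw))' into nested (label, children) tuples.

    Leaves are plain strings. Assumes tokens contain no parentheses.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def multi_token_spans(tree):
    """Return (set of distinct spans covering more than one token, token count)."""
    spans = set()

    def walk(node, start):
        if isinstance(node, str):
            return start + 1
        end = start
        for child in node[1]:
            end = walk(child, end)
        if end - start > 1:
            spans.add((start, end))
        return end

    n_tokens = walk(tree, 0)
    return spans, n_tokens


def looks_degenerate(tree):
    """Heuristic: a longer sentence whose only multi-token span is the whole sentence."""
    spans, n_tokens = multi_token_spans(tree)
    return n_tokens > 2 and len(spans) <= 1


# Hand transcription of the flat tree from the first report.
flat = "(TOP (S (_ I) (_ saw) (_ Sarah) (_ with) (_ a) (_ telescope) (NP (_ .))))"
# Illustrative structured parse, written by hand for comparison.
healthy = ("(TOP (S (NP (_ I)) (VP (_ saw) (NP (_ Sarah)) "
           "(PP (_ with) (NP (_ a) (_ telescope)))) (_ .)))")

print(looks_degenerate(parse_bracketed(flat)))
print(looks_degenerate(parse_bracketed(healthy)))
```

A check like this could be run over a handful of sentences after loading a pretrained model, as a quick smoke test before trusting its output.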