Need help understanding the labels of the parser model

Hello! First, I have to say that I love this project. It is really helping me explore the syntax of different kinds of text, so thank you so much!

I have a question regarding tagsets. I am using the Swedish model, and I remember that a few years back it was based on the Swedish treebank tagset called Mamba. It seems to have changed in the new version (benepar-sv2).

I tried printing the labels that were used to train the core model, and I got these results:

>>> parser._parser.label_vocab
{'': 0,
 'AP': 1,
 'AP::AP': 2,
 'AP::XP': 3,
 'AVP': 4,
 'AVP::XP': 5,
 'NP': 6,
 'NP::AP': 7,
 'NP::NP': 8,
 'NP::NP::AP': 9,
 'NP::NP::NP::NP::XP': 10,
 'NP::NP::S': 11,
 'NP::NP::VP': 12,
 'NP::PP': 13,
 'NP::S': 14,
 'NP::XP': 15,
 'NP::XP::NP': 16,
 'NP::XP::S': 17,
 'PP': 18,
 'PP::AVP': 19,
 'PP::AVP::XP': 20,
 'PP::NP': 21,
 'PP::XP': 22,
 'PSEUDO': 23,
 'S': 24,
 'S::AVP': 25,
 'S::NP': 26,
 'S::NP::NP': 27,
 'S::NP::NP::NP::NP': 28,
 'S::NP::S': 29,
 'S::NP::XP': 30,
 'S::NP::XP::S': 31,
 'S::PP': 32,
 'S::PP::NP': 33,
 'S::S': 34,
 'S::S::NP': 35,
 'S::S::NP::NP': 36,
 'S::VP': 37,
 'S::XP': 38,
 'VP': 39,
 'VP::AP': 40,
 'VP::PP': 41,
 'VP::S': 42,
 'VP::VP': 43,
 'VP::XP': 44,
 'XP': 45,
 'XP::AVP': 46,
 'XP::NP': 47,
 'XP::PP': 48,
 'XP::S': 49}
>>> parser._parser.tag_vocab
{'AB': 1,
 'DT': 2,
 'HA': 3,
 'HD': 4,
 'HP': 5,
 'HS': 6,
 'IE': 7,
 'IN': 8,
 'JJ': 9,
 'KN': 10,
 'MAD': 11,
 'MID': 12,
 'NN': 13,
 'P': 14,
 'PAD': 15,
 'PC': 16,
 'PL': 17,
 'PM': 18,
 'PN': 19,
 'PS': 20,
 'RG': 21,
 'RO': 22,
 'SN': 23,
 'UNK': 0,
 'UO': 24,
 'VB': 25}

What is the difference between NP::NP::S and S::NP::NP?
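To make the question concrete, here is a minimal sketch of one reading I can think of. It rests on an assumption I have not seen confirmed in the docs: that `::` joins a chain of single-child (unary) nodes, written from the outermost node to the innermost. The helper names `split_label` and `chain_to_brackets` are my own and are not part of benepar.

```python
def split_label(label: str) -> list[str]:
    """Split a collapsed label such as 'S::NP::NP' into its parts."""
    return label.split("::") if label else []


def chain_to_brackets(label: str, leaf: str = "...") -> str:
    """Expand a collapsed label into nested brackets.

    ASSUMPTION: the first part is the outermost node and the last part
    is the innermost one. If the real order is reversed, the nesting
    shown here is reversed too.
    """
    out = leaf
    # Wrap from the innermost part outwards.
    for part in reversed(split_label(label)):
        out = f"({part} {out})"
    return out


print(chain_to_brackets("NP::NP::S"))
print(chain_to_brackets("S::NP::NP"))
```

Under that assumption the two labels would describe different trees: the first would be an S wrapped inside two NPs, the second two nested NPs wrapped inside an S. Is that the right way to read them?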

In this example (in English: "Hello, I am a banana"), there is an S (simple declarative clause) that has two NPs as children. Would this be NP::NP::S or S::NP::NP? And what is happening with AUX? I find it hard to think of any structure where an S has only two NPs, since at least one VP is required to form an S.

Also, a general question: I saw in #30 that you are using this for training: http://surdeanu.cs.arizona.edu//mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html Is it the same for the Swedish model and the models for other languages? For example, unlike the English model, the Swedish model has no FRAG among its labels. Is this because of the nature of the language itself, or did you use a different label set for each language?
