sangeethsn opened this issue 3 years ago.
There's no specific mechanism in Stanza to put the data in a dataframe, but you can access the individual words and do whatever you want with them, including putting them in a dataframe:
```python
import stanza

pipe = stanza.Pipeline("en", processors="tokenize,pos,ner")
doc = pipe("Ragavan punched Teferi in his face")
print(doc)
for sentence in doc.sentences:
    for token in sentence.tokens:
        for word in token.words:
            # instead of printing, could put the rows in a dataframe
            print(word.id, word.text, word.upos, word.xpos, token.ner)
```
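Following the comment in the loop above, here is a minimal sketch of collecting one row per word instead of printing. It assumes `doc` is a Stanza `Document` from a pipeline with `tokenize,pos,ner`. The helper name `doc_to_rows` and the extra `sent_id` column are hypothetical, not part of Stanza, and the demo uses `SimpleNamespace` stand-ins with made-up values so it runs without downloading a model.

```python
from types import SimpleNamespace


def doc_to_rows(doc):
    """Flatten a Stanza-style Document into one dict per word.

    Hypothetical helper: it relies only on the attributes used in the
    loop above (sentences -> tokens -> words, with ner on the token).
    """
    rows = []
    for sent_id, sentence in enumerate(doc.sentences):
        for token in sentence.tokens:
            for word in token.words:
                rows.append({
                    "sent_id": sent_id,
                    "id": word.id,
                    "text": word.text,
                    "upos": word.upos,
                    "xpos": word.xpos,
                    # NER labels live on tokens, so every word of a
                    # multi-word token gets the same label
                    "ner": token.ner,
                })
    return rows


if __name__ == "__main__":
    # Made-up stand-in values, not real Stanza output
    word = SimpleNamespace(id=1, text="Ragavan", upos="PROPN", xpos="NNP")
    token = SimpleNamespace(words=[word], ner="S-PERSON")
    demo_doc = SimpleNamespace(sentences=[SimpleNamespace(tokens=[token])])
    print(doc_to_rows(demo_doc))
```

With a real pipeline, `pandas.DataFrame(doc_to_rows(pipe(text)))` should give one row per word, with the dict keys as column names.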
Thank you so much
Are you able to apply a Stanza pipeline to a pandas dataframe and convert the outputs into additional columns?
What would you want that function to do?
One issue to consider is that the columns present are determined by which processors are used, which won't always be the same for different users or for different languages' pipelines.
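One way to cope with that variation, sketched under assumptions: read each field with `getattr` and drop any column that is empty for every word, so the table only contains what the pipeline actually produced. The helper `words_to_rows` and the `WORD_FIELDS` list are hypothetical, and the demo uses stand-in objects with made-up values rather than a real Stanza `Document`.

```python
from types import SimpleNamespace

# Illustrative per-word fields; which ones are filled in depends on the
# processors in the pipeline (e.g. without "lemma", word.lemma is None).
WORD_FIELDS = ("id", "text", "lemma", "upos", "xpos", "feats")


def words_to_rows(doc, fields=WORD_FIELDS):
    """One dict per word, keeping only fields set on at least one word."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            # getattr with a default so objects lacking a field don't raise
            rows.append({f: getattr(word, f, None) for f in fields})
    # Keep a column only if some row has a non-None value for it
    present = [f for f in fields if any(r[f] is not None for r in rows)]
    return [{f: r[f] for f in present} for r in rows]


if __name__ == "__main__":
    # Stand-ins for a pipeline run without lemma/feats (made-up values)
    words = [
        SimpleNamespace(id=1, text="dogs", lemma=None, upos="NOUN", xpos="NNS", feats=None),
        SimpleNamespace(id=2, text="bark", upos="VERB", xpos="VBP"),
    ]
    demo_doc = SimpleNamespace(sentences=[SimpleNamespace(words=words)])
    for row in words_to_rows(demo_doc):
        print(row)
```

The design choice here is to let the data decide the columns rather than hard-coding them per language or processor set.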
I have a dataframe with a column of keywords. I am trying to POS-tag each row with Stanza, return the output in a new column with the tags attached, and then pull the nouns and adjectives from that new POS column.
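The workflow described above (tag each cell, store the tags in a new column, then keep nouns and adjectives) could look roughly like this. It is a sketch under assumptions: `nlp` is expected to behave like a Stanza pipeline with `tokenize,pos`, returning an object with `.sentences[*].words` carrying `.text` and `.upos`. The names `tag_text`, `pick_pos`, and `demo_nlp` are hypothetical, and `demo_nlp` returns made-up tags only so the example runs offline.

```python
from types import SimpleNamespace


def tag_text(text, nlp):
    """Run a Stanza-style pipeline on one text; return (word, upos) pairs."""
    doc = nlp(text)
    return [(w.text, w.upos) for s in doc.sentences for w in s.words]


def pick_pos(tagged, keep=("NOUN", "ADJ")):
    """Return the words in one tagged row whose UPOS tag is in `keep`."""
    return [text for text, upos in tagged if upos in keep]


def demo_nlp(text):
    """Hypothetical stand-in for a Stanza pipeline, with made-up tags."""
    tags = {"old": "ADJ", "coast": "NOUN", "bridge": "NOUN", "crossing": "VERB"}
    words = [SimpleNamespace(text=t, upos=tags.get(t, "X")) for t in text.split()]
    return SimpleNamespace(sentences=[SimpleNamespace(words=words)])


if __name__ == "__main__":
    row = tag_text("old coast bridge crossing", demo_nlp)
    print(row)
    print(pick_pos(row))
```

With pandas and a pipeline built as in the first example, something like `df["pos"] = df["keywords"].apply(tag_text, nlp=pipe)` followed by `df["nouns_adjs"] = df["pos"].apply(pick_pos)` should fill the two new columns (`"keywords"` is a hypothetical column name). Add `"PROPN"` to `keep` if proper nouns should count as nouns. The tags are only as reliable as the context each cell gives the tagger.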
There's already a bit of a question here: are you looking to tag individual words, or words in a sentence? The tool doesn't handle individual words. What should it do with "bush", for example, or "dog", "cut", etc.?
(Replying by email to Mohamad Quteifan's comment of Thu, Jul 14, 2022, 10:09 AM.)
Okay, gotcha. It's individual words, so that is where my problem lies.
Indeed, that will not work with our model. To be honest, I do not think a tagger that reliably tags words in isolation exists. For example, in the sentence "that will not work with our model", "model" is clearly a noun, but if I wrote "how do you model heat loss in a cup of tea", "model" is now a verb.
(Replying by email to Mohamad Quteifan's comment of Thu, Jul 14, 2022, 1:14 PM.)
Oh yes, of course, I understand that. I am just using it to mark the words. I was using NLTK and it was doing a terrible job at identifying the terms, but I found a different model that predicts the words' POS effectively (I am using the SennaTagger).
Hi, I got the following output from the NER process. I want it in the form of a dataframe, with "id", "text", "upos", "xpos", and "ner" as the column names. Is it possible to convert this into a dataframe?
```json
[
  [
    { "id": 1, "text": "[", "upos": "PUNCT", "xpos": "-LRB-", "start_char": 0, "end_char": 1, "ner": "O" },
    { "id": 2, "text": "'", "upos": "PUNCT", "xpos": "''", "start_char": 1, "end_char": 2, "ner": "O" }
  ],
  [
    { "id": 1, "text": "OLD", "upos": "ADJ", "xpos": "NNP", "feats": "Degree=Pos", "start_char": 2, "end_char": 5, "ner": "B-FAC" },
    { "id": 2, "text": "COAST", "upos": "PROPN", "xpos": "NNP", "feats": "Number=Sing", "start_char": 6, "end_char": 11, "ner": "I-FAC" },
    { "id": 3, "text": "BRIDGE", "upos": "PROPN", "xpos": "NNP", "feats": "Number=Sing", "start_char": 12, "end_char": 18, "ner": "I-FAC" },
    { "id": 4, "text": "1", "upos": "NUM", "xpos": "CD", "feats": "NumForm=Digit|NumType=Card", "start_char": 19, "end_char": 20, "ner": "E-FAC" },
```
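The printed output is a list of sentences, each a list of word dicts, so flattening it gives one row per word with the requested columns. A minimal sketch: `sentences_to_rows` and the extra `sent_id` column are hypothetical, and the demo data is copied from the output above. With a Stanza `Document`, the same nested structure should come from `doc.to_dict()`.

```python
def sentences_to_rows(sentences, columns=("id", "text", "upos", "xpos", "ner")):
    """Flatten per-sentence lists of word dicts into one row per word.

    `sentences` is assumed to have the shape of the output above. A key
    missing from a word dict (e.g. "ner" if NER did not run) becomes None.
    """
    rows = []
    for sent_id, sentence in enumerate(sentences):
        for word in sentence:
            row = {"sent_id": sent_id}
            row.update({col: word.get(col) for col in columns})
            rows.append(row)
    return rows


if __name__ == "__main__":
    # First entries of the output above, trimmed to a few words
    sample = [
        [{"id": 1, "text": "[", "upos": "PUNCT", "xpos": "-LRB-", "ner": "O"}],
        [
            {"id": 1, "text": "OLD", "upos": "ADJ", "xpos": "NNP",
             "feats": "Degree=Pos", "ner": "B-FAC"},
            {"id": 2, "text": "COAST", "upos": "PROPN", "xpos": "NNP",
             "feats": "Number=Sing", "ner": "I-FAC"},
        ],
    ]
    for row in sentences_to_rows(sample):
        print(row)
```

Passing the result to `pandas.DataFrame(...)` should then produce the columns `sent_id`, `id`, `text`, `upos`, `xpos`, and `ner`; keys not listed in `columns`, such as `feats`, are left out.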