Open francolq opened 4 years ago
Can confirm. Unfortunately, the explanation here is that those words never show up as a pair in the default training data. The only realistic option would be to retrain with some supplemental training data.
FWIW I have never heard or used the phrase "liquid can"
Thanks for the quick answer! The examples are from a dataset.
Unfortunately, not the dataset we used to train the POS tagger.
thanks @AngledLuffa ! I was just pointing where I got the "liquid can" from, the dataset I have is not even tagged.
Stanza is great!!
This issue will likely be fixed in a future release where we create an English pipeline by pooling several big treebanks together (which hopefully can cover more cases like these). However, we cannot promise when that will happen. Having more reliable models is always on our TODO list.
I have a similar issue in Spanish for an incorrect POS tag. I recognize this is not likely a bug in Stanza but just the result of training against particular data. Is it useful for us to create issues in such cases? I would think not, but this issue exists and wasn't closed.
(the POS issue was that Causa in "Causa gran incomodidad que se corte el agua todos los días." should be a verb, not a noun).
Yeah, the more annotations you can provide, the better. If you can provide a complete labeling for that sentence we can include it in future versions. As it stands, we don't have enough Spanish linguistic expertise to do anything with that sentence.
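For anyone wanting to contribute a labeled sentence as suggested above, here is a rough sketch of what a starting point in CoNLL-U could look like. The `to_conllu` helper is hypothetical, and the UPOS tags are only one reading of the sentence (only Causa as VERB comes from this thread). Lemma, features, head, and dependency columns are left as `_` placeholders, so this is a skeleton, not the complete labeling asked for.

```python
# Hypothetical helper: builds a CoNLL-U skeleton from (form, UPOS) pairs.
# CoNLL-U rows have 10 tab-separated columns:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
def to_conllu(text, tagged):
    lines = [f"# text = {text}"]
    for idx, (form, upos) in enumerate(tagged, start=1):
        cols = ["_"] * 10          # unknown fields stay as "_"
        cols[0] = str(idx)         # ID
        cols[1] = form             # FORM
        cols[3] = upos             # UPOS
        lines.append("\t".join(cols))
    # A blank line terminates each sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


sentence = "Causa gran incomodidad que se corte el agua todos los días."

# Assumption: these tags are illustrative, not taken from any treebank.
tagged = [
    ("Causa", "VERB"), ("gran", "ADJ"), ("incomodidad", "NOUN"),
    ("que", "SCONJ"), ("se", "PRON"), ("corte", "VERB"),
    ("el", "DET"), ("agua", "NOUN"), ("todos", "DET"),
    ("los", "DET"), ("días", "NOUN"), (".", "PUNCT"),
]

conllu = to_conllu(sentence, tagged)
print(conllu, end="")
```

The remaining columns (especially HEAD and DEPREL) would still need to be filled in by someone who knows the language before the sentence is useful as training data.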
What about "El exceso de velocidad causa accidentes."?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Latest English POS models:
Please put the trash in the trash can_NN
The trash can_MD get pretty rancid
Soda can_MD make me fart an unbelievable amount
I recycled the soda can_NN and the newspaper
The soup can_MD swelled up, which just means free botox, right?
Some soup can_MD corrode the can_MD it comes in
My art teacher used my charcoal pencil to make Jennifer's right can_MD a bit bigger when I asked for advice on the nude I had drawn
What is a liquid can_MD, anyway?
so it's somewhat better I guess
but we can_MD probably expand on the number of fake sentences we add to the training set and get better coverage
ps the ones that are wrong continue to be wrong if you use the electra-large
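If it helps when comparing models, here is a small sketch for tallying how "can" was tagged in output written in the `word_TAG` style used above. The regex and the `tag_counts` helper are made up for this example and are not part of Stanza. Only three of the sentences above are included to keep it short.

```python
import re
from collections import Counter

# Matches inline annotations such as "can_NN" or "can_MD".
TAGGED = re.compile(r"([A-Za-z]+)_([A-Z]+)")


def tag_counts(lines, word="can"):
    """Count the tags attached to `word` across annotated lines."""
    counts = Counter()
    for line in lines:
        for token, tag in TAGGED.findall(line):
            if token.lower() == word:
                counts[tag] += 1
    return counts


# A few of the annotated sentences from this thread.
lines = [
    "Please put the trash in the trash can_NN",
    "The soup can_MD swelled up, which just means free botox, right?",
    "Some soup can_MD corrode the can_MD it comes in",
]

counts = tag_counts(lines)
print(dict(counts))
```

Running the same tally on the output of two different models gives a quick, if crude, way to see whether the NN/MD balance moved.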
POS
With the default pipeline, "can" is tagged as a modal verb (MD) but it should be a noun (NN) in the following examples: