Bachstelze opened this issue 4 years ago (status: Open)
@Bachstelze It looks like the first step might be to annotate a corpus in Universal Dependencies. I'd be interested in working on that; please feel free to contact me if you are too.
Are there proven, well-known ways to generate treebanks from scratch for post-editing? Is it possible to start with POS tagging and then pre-parse into UD?
Maybe, but it would take you longer and you would end up with a worse result. It's easier to just annotate from scratch. If there is glossed or tagged text, it can be used to bootstrap a conversion. You could, for example, use UD Annotatrix (with apologies for the orthography): You can skip some of the steps if you have a decent part-of-speech tagger or a glossed corpus. I'm guessing that for Abkhaz, morphological analysis would also be needed if you want to fill out the FEATS column. Anyway, I think it would make a nice project.
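For what it's worth, here is a minimal sketch of the bootstrapping idea: take text that already has part-of-speech tags and write it out as a CoNLL-U skeleton, leaving LEMMA, FEATS, HEAD and DEPREL as `_` so they can be filled in by hand in an annotation tool. The function name `to_conllu` and the input shape (a list of `(form, upos)` pairs) are assumptions made for illustration, and the tokens are English placeholders rather than real Abkhaz data.

```python
def to_conllu(sent_id, text, tagged_tokens):
    """Build a CoNLL-U skeleton from (form, upos) pairs.

    Hypothetical helper: only ID, FORM and UPOS are filled in;
    every other column is left as "_" for manual annotation.
    """
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for idx, (form, upos) in enumerate(tagged_tokens, start=1):
        # The 10 CoNLL-U columns:
        # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(idx), form, "_", upos, "_", "_", "_", "_", "_", "_"]
        lines.append("\t".join(cols))
    # A blank line terminates each sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Placeholder sentence; substitute the output of your own tagger.
    tagged = [("The", "DET"), ("dog", "NOUN"), ("barks", "VERB"), (".", "PUNCT")]
    print(to_conllu("demo-1", "The dog barks.", tagged), end="")
```

With a glossed corpus, the same loop could also populate the FEATS column, but that mapping depends on the glossing conventions used, so it is left out here.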
How can we add the Abkhaz language? There are a few resources, such as https://gitlab.com/Bachstelze/alp and https://github.com/danielinux7/Multilingual-Parallel-Corpus . Can we port those models to Stanza, or do we have to retrain them?