nd-crane / trusted_ke

Trusted Knowledge Extraction for Maintenance and Manufacturing Intelligence
Apache License 2.0

Get MaintNet Working with Coreference Resolution #10

Closed JonathanKarr33 closed 2 months ago

JonathanKarr33 commented 1 year ago

Get MaintNet working with existing coreference resolution

JonathanKarr33 commented 1 year ago

Realized that for spaCy, en_core_web_trf has much better accuracy than en_core_web_sm (96% vs. 87%), but it takes up more space and takes longer to run. It's recommended to use trf for research since it provides the highest accuracy. We can also compare the two later in testing.

Setting up the virtual environment. Creating this document so that others can easily set it up once we have it working: https://docs.google.com/document/d/170yBWllYl5RXkYSy9be31eN775qp2LEdJkwAqz3ZXFk/edit?usp=sharing

I will also set up the virtual environment on the server instead of my local computer and see if that solves any issues.

JonathanKarr33 commented 1 year ago

Redoing spacy coref.py because of a version incompatibility between crosslingual-coreference and spaCy. Found that a pandas DataFrame column can be fed to spaCy directly: https://stackoverflow.com/questions/43451906/load-column-in-csv-file-into-spacy

Also found a spaCy coreference project that is good for smaller datasets: https://spacy.io/universe/project/neuralcoref

Implementing that now.
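A rough sketch of the CSV-to-spaCy step, in case it helps others reproduce it. It uses the standard library `csv` module instead of pandas to stay dependency-free; with pandas, `df["text"].astype(str)` could be passed to `nlp.pipe` the same way. The column name `text` and both helper names are assumptions for illustration, not the actual MaintNet schema or the code in this repo.

```python
import csv
from typing import Iterable, Iterator, List


def read_text_column(handle: Iterable[str], column: str) -> Iterator[str]:
    """Yield the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(handle)
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip rows where the text cell is blank or missing
            yield value


def parse_texts(nlp, texts: Iterable[str], batch_size: int = 64) -> List:
    """Run texts through a spaCy pipeline in batches and collect the results.

    `nlp` is expected to be a loaded spaCy pipeline, for example
    spacy.load("en_core_web_trf"); nlp.pipe processes texts in batches,
    which is faster than calling nlp(text) once per row.
    """
    return list(nlp.pipe(texts, batch_size=batch_size))


# Hypothetical usage (file name and column name are placeholders):
#
#   import spacy
#   nlp = spacy.load("en_core_web_trf")
#   with open("maintnet.csv", newline="") as f:
#       docs = parse_texts(nlp, read_text_column(f, "text"))
```

Keeping the reader separate from the pipeline call should make it easy to swap en_core_web_sm for en_core_web_trf when we compare them later.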

JonathanKarr33 commented 1 year ago

Got spaCy working with the CSV, but the MaintNet data doesn't have many "it"s or stop words. Coref resolution is not resolving the "it"s properly. Will look into this, but it may be beneficial to pause coref and move on to named entity linking.
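One quick way to check how few pronouns the data has is to count them before spending more time on coref. This is only a sketch: the pronoun list, the simple regex tokenizer, and the sample records are assumptions for illustration, not actual MaintNet rows.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Pronouns a coreference resolver would typically try to resolve (assumed list).
PRONOUNS = {"it", "its", "it's", "they", "them", "their", "this", "that"}

# Lowercase word tokens, keeping an optional apostrophe suffix such as "it's".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def pronoun_stats(texts: Iterable[str]) -> Tuple[Counter, int]:
    """Return (per-pronoun counts, total token count) over all texts."""
    counts: Counter = Counter()
    total = 0
    for text in texts:
        for token in TOKEN_RE.findall(text.lower()):
            total += 1
            if token in PRONOUNS:
                counts[token] += 1
    return counts, total


# Made-up records in the style of short maintenance log entries.
records = [
    "ENGINE RUNS ROUGH. REPLACED IT.",
    "CHECKED PUMP AND ITS SEAL",
    "OIL LEAK",
]
counts, total = pronoun_stats(records)
share = sum(counts.values()) / total if total else 0.0
print(dict(counts), total, round(share, 3))
```

If the share comes out very low on the real data, that would support pausing coref in favor of named entity linking.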