walidamamou / relation_extraction_transformer


Regarding Training data for Relation extraction #1

Closed Ibrokhimsadikov closed 3 years ago

Ibrokhimsadikov commented 3 years ago

Hi @walidamamou, thank you very much for your article about training a custom relation extraction component; it is really helpful. However, I was not able to see any relations in your train, test, and dev binary files when I converted them back into spaCy's format. Could you please help us understand this better? Several people raised this question on your article. Thank you.

walidamamou commented 3 years ago

Thanks for reaching out. To clarify, we create a JSON file that contains all the relations and convert them to a binary file and not the other way around. You can directly use the binary files for training. I will make it clearer in the article.

Thank you for the feedback. Let me know if you have more questions!
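For readers following along, the JSON-to-binary direction described above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual conversion script: the record layout (`text`, `entities`, `relations`), the helper `build_docbin`, and the custom attribute name `rel` are all assumptions made for illustration, modeled on the convention used in spaCy's relation extraction tutorial project.

```python
import spacy
from spacy.tokens import Doc, DocBin

# Assumption: relations are kept in a custom Doc attribute named "rel".
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})


def build_docbin(records, nlp):
    """Convert hypothetical JSON-style records into a DocBin with relations."""
    # store_user_data=True is needed so custom attributes are serialized.
    doc_bin = DocBin(store_user_data=True)
    for record in records:
        doc = nlp.make_doc(record["text"])
        spans = []
        for ent in record["entities"]:
            span = doc.char_span(ent["start"], ent["end"], label=ent["label"])
            if span is None:
                raise ValueError(f"Entity offsets do not align with tokens: {ent}")
            spans.append(span)
        doc.ents = spans
        # Map (head start token, child start token) -> {label: 1.0}.
        rels = {}
        for rel in record["relations"]:
            head, child = spans[rel["head"]], spans[rel["child"]]
            rels.setdefault((head.start, child.start), {})[rel["label"]] = 1.0
        doc._.rel = rels
        doc_bin.add(doc)
    return doc_bin


nlp = spacy.blank("en")
# Hypothetical record; "head" and "child" index into the "entities" list.
records = [
    {
        "text": "5+ years of project management experience",
        "entities": [
            {"start": 0, "end": 8, "label": "EXPERIENCE"},
            {"start": 12, "end": 30, "label": "SKILLS"},
        ],
        "relations": [{"head": 0, "child": 1, "label": "EXPERIENCE_IN"}],
    }
]
doc_bin = build_docbin(records, nlp)
# doc_bin.to_disk("./relations_sample.spacy")  # hypothetical output path
```

The key design point is `store_user_data=True`: without it, only the built-in token and entity attributes are written to the binary file.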

Ibrokhimsadikov commented 3 years ago

Dear @walidamamou,

Thank you for the quick response. As far as I understand, you are saying that once the relation extraction annotations are converted to the .spacy binary format, we will not be able to see annotations like EXPERIENCE_IN and DIPLOMA_IN when we convert them back to spaCy format through the DocBin and Corpus objects. Is that right?

If so, would it be possible to add the original relation extraction file that you converted to .spacy?

Thank you once again

walidamamou commented 3 years ago

Hi @Ibrokhimsadikov, I haven't tried converting it back from the binary file, but I can check. Sure, I will attach the original JSON file, hopefully by tomorrow.

Ibrokhimsadikov commented 3 years ago

Yes, when I tried to check how they look as raw JSON training data, this is what I got:

```python
import spacy
from spacy.training import Corpus

corpus = Corpus("./relations_test.spacy")
nlp = spacy.blank("en")
train_data = corpus(nlp)
```

One of the train data docs:

{'doc_annotation': {'cats': {}, 'entities': ['O', 'O', 'O', 'O', 'O', 'B-EXPERIENCE', 'I-EXPERIENCE', 'L-EXPERIENCE', 'O', 'O', 'O', 'B-SKILLS', 'L-SKILLS', 'O', 'O', 'B-EXPERIENCE', 'I-EXPERIENCE', 'L-EXPERIENCE', 'O', 'B-SKILLS', 'L-SKILLS', 'O', 'B-SKILLS', 'L-SKILLS', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'], 'links': {}}, 'token_annotation': {'ORTH': ['\n', 'BA', '/', 'BS', '\n', '5', '+', 'years', 'of', 'program', 'or', 'project', 'management', 'experience', '\n', '2', '+', 'years', 'of', 'technical', 'project', '/', 'program', 'management', 'experience', '\n', 'Track', 'record', 'of', 'operating', 'independently', '\n', 'Experience', 'understanding', 'user', 'needs', ',', 'gathering', 'requirements', ',', 'and', 'defining', 'scope', '\n', 'Communication', 'experience', 'interacting', 'with', 'a', 'variety', 'of', 'audiences', 'from', 'engineers', ',', 'to', 'vendors', ',', 'to', 'research', 'leaders', '\n', 'Track', 'record', 'of', 'building', 'cross', '-', 'functional', 'relationships', '\n\n', 'PREFERRED', '\n', 'Experience', 'working', 'with', 'UX', 'Research', 'and/or', 'UX', 'Design'], 'SPACY': [False, False, False, False, False, False, True, True, True, True, True, True, True, False, False, False, True, True, True, True, False, False, True, True, False, False, True, True, True, True, False, False, True, True, True, False, True, True, False, True, True, True, False, False, True, True, True, True, True, True, True, True, True, False, True, True, False, True, True, True, False, False, True, True, True, True, False, False, True, False, False, True, False, True, True, True, True, True, True, True, False], 'TAG': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], 'LEMMA': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], 'POS': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], 'MORPH': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], 'HEAD': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80], 'DEP': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], 'SENT_START': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}}

I cannot see relation extraction annotations :)

walidamamou commented 3 years ago

I have added the raw training, test and dev files for relations. Take a look and let me know if you have any more questions. Thanks for following up.

Ibrokhimsadikov commented 3 years ago

Thanks I will take a look

Ibrokhimsadikov commented 3 years ago

It seems this is what we want. Thank you for your support and time, dear @walidamamou.

karndeepsingh commented 2 years ago

Hello @Ibrokhimsadikov, I need a suggestion on how to prepare a dataset for the relation extraction task in spaCy format before converting it to spaCy's binary form. I am working on a relation extraction use case and want to annotate data for it. Any suggestion from your side on how to speed up the annotation process would be great.

Thanks

Ibrokhimsadikov commented 2 years ago

Hi @karndeepsingh, I am having the same issue

karndeepsingh commented 2 years ago

> Hi @karndeepsingh, I am having the same issue

Hi @Ibrokhimsadikov, can you guide me on how to extract the information from a .spacy binary file?

Thanks
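One possible way to approach this question is sketched below. The dictionary dump earlier in the thread contains only `cats`, `entities`, and `links`, which suggests that custom extension attributes are not part of that dictionary view. If the relations were stored in a custom `Doc` attribute (assumed here to be named `rel`, mapping a pair of entity start tokens to a `{label: value}` dict, as in spaCy's relation extraction tutorial project), they can be read from the deserialized docs directly. The helper `extract_relations` and the toy document are hypothetical and only illustrate the round trip in memory.

```python
from spacy.tokens import Doc, DocBin, Span
from spacy.vocab import Vocab

# Assumption: the extension must be registered under the same name
# that was used when the binary file was created.
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})


def extract_relations(doc_bin, vocab):
    """Return the text and relations of every doc in a DocBin (hypothetical helper)."""
    results = []
    for doc in doc_bin.get_docs(vocab):
        ent_text = {ent.start: ent.text for ent in doc.ents}
        relations = []
        for (head, child), labels in doc._.rel.items():
            for label, value in labels.items():
                if value:  # skip labels stored with a 0.0 value
                    relations.append(
                        {
                            "head": ent_text.get(head, doc[head].text),
                            "child": ent_text.get(child, doc[child].text),
                            "label": label,
                        }
                    )
        results.append({"text": doc.text, "relations": relations})
    return results


# Toy in-memory example standing in for a real .spacy file.
vocab = Vocab()
doc = Doc(vocab, words=["2", "years", "of", "project", "management"])
doc.ents = [Span(doc, 0, 2, label="EXPERIENCE"), Span(doc, 3, 5, label="SKILLS")]
doc._.rel = {(0, 3): {"EXPERIENCE_IN": 1.0}}
doc_bin = DocBin(store_user_data=True, docs=[doc])

# Serialize and deserialize, then read the relations back.
restored = DocBin(store_user_data=True).from_bytes(doc_bin.to_bytes())
print(extract_relations(restored, vocab))
```

For a file on disk, the same helper could be called with `DocBin(store_user_data=True).from_disk("./relations_test.spacy")` and `nlp.vocab`, assuming the file was written with user data stored.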