1) polish-summaries-corpus.py - fill source field with TODO label (done in PR#23)
2) speakleash-categorization.py - fix duplicated information in the output (done in PR#23)
3) speakleash_forums_questions.py - remove duplicated text in the source_name field (done in PR#23)
4) speakleash_forums_questions.py - remove ordinal numbers from input fields. Examples:
"input": "3) czy starasz się o dziecko?",
"input": "4) czy planujesz jakieś zmiany?",
5) plwiki_random_word_pos.py - fix the input field error in the script (done in PR#35)
6) polish-news-summarization.py - modify script to remove instructions with None in the input (done in PR#28)
7) ipipan_polqa_questions.py - add deduplication and fix multiple outputs (done in PR#30)
8) poquad_text_extraction.py - add deduplication (done in PR#32)
9) human_annotators_common_errors.py - deduplicate examples by answers, format answers for further processing (done in PR#33)
10) human_expert_gec_dataset.py - deduplicate examples by answers, format answers for further processing (done in PR#33)
-) Examine other scripts for fixes
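
For item 4, which is still open, a minimal sketch of one way to strip the leading ordinal from an input field. The names `ORDINAL_PREFIX` and `strip_ordinal` are hypothetical and do not come from speakleash_forums_questions.py. The pattern assumes ordinals always look like the examples above: digits followed by ")" at the start of the string.

```python
import re

# Assumption: an ordinal is one or more digits followed by ")" at the very
# start of the text, as in the two examples from item 4.
ORDINAL_PREFIX = re.compile(r"^\s*\d+\)\s*")


def strip_ordinal(text: str) -> str:
    """Remove a single leading ordinal such as '3) ' from an input field."""
    return ORDINAL_PREFIX.sub("", text, count=1)


if __name__ == "__main__":
    samples = [
        "3) czy starasz się o dziecko?",
        "4) czy planujesz jakieś zmiany?",
    ]
    for sample in samples:
        print(strip_ordinal(sample))
```

Only the ")" form is matched on purpose, so inputs that start with a number followed by a period (for example a decimal value) are left untouched. If the forum data also contains ordinals such as "3.", the pattern would need to be widened with care.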