implement assess_sentence_syntax.py II

kramars-realspeak / fm-gai-lottie-multiple-choice-v1

fm-gai-lottie-multiple-choice-v1 is a software tool designed to automate the creation of multiple-choice activities, enabling quick access to diverse, engaging exercises and novel learning content based on a small sample of target vocabulary.

Issue #9 (Open), opened by Peter96K 1 month ago

Peter96K commented 1 month ago

Extend the text analysis capabilities to include syntax analysis. Consider different approaches and focus on the one adding the most value to the project.

Peter96K commented 1 month ago

This is a joint task that will include collaboration with @Peter96K.

Peter96K commented 1 month ago

Implement evaluation logic that terminates the object's build process if the object's sentence isn't compliant with its cefr_level.
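
A minimal sketch of such a gate. It assumes some detector already returns the CEFR level of the hardest grammar structure in a sentence. `CEFR_ORDER`, `CefrComplianceError`, `assert_cefr_compliant`, `build_activity` and the `detect_level` callable are hypothetical names for illustration, not existing project code:

```python
# Hypothetical sketch: none of these names exist in the project yet.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


class CefrComplianceError(ValueError):
    """Raised to abort building an activity whose sentence exceeds its cefr_level."""


def assert_cefr_compliant(sentence, cefr_level, detect_level):
    # detect_level is assumed to return the CEFR level of the hardest grammar
    # structure found in the sentence (e.g. built on TextAnalyzer checks).
    found = detect_level(sentence)
    if CEFR_ORDER.index(found) > CEFR_ORDER.index(cefr_level):
        raise CefrComplianceError(
            f"{sentence!r} needs {found} grammar but the target is {cefr_level}"
        )


def build_activity(sentence, cefr_level, detect_level):
    # Validate first, so a non-compliant sentence stops the build before
    # any other part of the object is generated.
    assert_cefr_compliant(sentence, cefr_level, detect_level)
    return {"sentence": sentence, "cefr_level": cefr_level}
```

Raising an exception rather than returning None means the build stops at the check. The caller can then catch `CefrComplianceError` and try another sentence.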

CPTNFreedom commented 1 month ago

https://www.examenglish.com/CEFR/cefr_grammar.htm This is a link to a CEFR grammar chart. We could possibly use it to create our own grammar suitability analysis if we cannot find an appropriate program.

Peter96K commented 1 month ago

Note: handle cases where more than one correct answer applies, e.g. 'a ball and a hat' / 'a hat and a ball', either at the level of TextAnalyzer() or in the kedtechLa playground.

CPTNFreedom commented 1 month ago

Unless we can make it so there are 2 correct answers in those sentences, we could make sure 'and' is never used by itself, which would force only one correct sentence.
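
One possible alternative to restricting 'and' is to compare the coordinated items as an unordered multiset, so both orders are accepted. This sketch assumes answers are flat coordinations like "a ball and a hat". It is plain string handling rather than anything in TextAnalyzer(), and `conjuncts` and `is_correct` are hypothetical names:

```python
from collections import Counter


def conjuncts(phrase):
    # Naive split of a flat coordination such as "a ball and a hat".
    # Assumption: no commas, nested lists, or "and" inside an item.
    return Counter(part.strip().lower() for part in phrase.split(" and "))


def is_correct(answer, key):
    # Compare the coordinated items as a multiset, so any order is accepted.
    return conjuncts(answer) == conjuncts(key)
```

With this, `is_correct("a hat and a ball", "a ball and a hat")` returns `True`. Comma lists ("a ball, a hat and a cup") or items that themselves contain "and" would need a smarter split, for example one based on spaCy's `cc`/`conj` dependency labels.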

CPTNFreedom commented 1 month ago

I had a look at CEFRpy's modules, and in my (very untrained) reading of them, they don't seem to deal with sentence-level analysis. They do have some part-of-speech identification, which could be useful.

I was reading about NLTK, which has a sentence-parsing module that might be useful, but I'm not sure. I'll attach it for you to look at: https://www.nltk.org/api/nltk.parse.html#module-nltk.parse

Peter96K commented 1 month ago

Reviewed the NLTK main page with @CPTNFreedom. The team will decide on the path forward, as well as which courses to purchase, at the upcoming post-mortem on the 1st of October.

Peter96K commented 3 weeks ago

Extended the TextAnalyzer class using spaCy as a proof of concept: instantiated the English model, tokenized a sentence, and retrieved the pos attribute from each token object.
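
A minimal sketch of what such a proof of concept might look like, assuming spaCy v3 with the `en_core_web_sm` English pipeline installed. The real TextAnalyzer class isn't shown in this thread, so the constructor and the `pos_tags` method are hypothetical. Passing `nlp` in also makes the class testable without loading a model:

```python
class TextAnalyzer:
    """Minimal spaCy-backed sketch; the project's real TextAnalyzer may differ."""

    def __init__(self, nlp=None):
        if nlp is None:
            # Requires: pip install spacy && python -m spacy download en_core_web_sm
            import spacy
            nlp = spacy.load("en_core_web_sm")
        self.nlp = nlp  # any callable returning an iterable of spaCy-like tokens

    def pos_tags(self, sentence):
        doc = self.nlp(sentence)  # tokenizes and tags the sentence
        # pos_ is the readable coarse tag ("VERB"); pos is its integer ID
        return [(token.text, token.pos_) for token in doc]
```

With a model installed, `TextAnalyzer().pos_tags("I like playing football.")` returns one (text, coarse POS) pair per token.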

Peter96K commented 3 weeks ago

Next step: conclude this task within the current sprint by identifying the most appropriate spaCy methods and concepts. Once finalized, transfer this task into the mc project for further development.

CPTNFreedom commented 3 weeks ago

[Screenshot 2024-10-08 at 15 26 27] This uses token.text, token.pos, and token.dep at the same time. It presents the information fairly clearly, and I think there's definitely something we can do with all of these. Dependency features may or may not be useful now, but I think they could be important for some grammatical parameters later.

CPTNFreedom commented 3 weeks ago

[Screenshot 2024-10-08 at 15 30 32] Added the use of token.morph. Morph is much more complicated and will take some time before I can read it fluently. But it gives us much more linguistic information about each word's role in the sentence, so it is probably of some use for grammatical/syntactic rules.
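
A small sketch of printing all four attributes as one table. It assumes `doc` comes from an English spaCy pipeline, e.g. `spacy.load("en_core_web_sm")("She has eaten lunch.")`. The underscore variants (`pos_`, `dep_`) hold readable strings, whereas `token.pos` and `token.dep` are integer IDs. `token_table` is a hypothetical helper name:

```python
def token_table(doc):
    """Format text, coarse POS, dependency label and morphology, one token per row."""
    rows = [f"{'TEXT':<10}{'POS':<7}{'DEP':<10}MORPH"]
    for token in doc:
        # str(token.morph) gives spaCy's "Feature=Value|..." string
        rows.append(f"{token.text:<10}{token.pos_:<7}{token.dep_:<10}{token.morph}")
    return "\n".join(rows)
```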

CPTNFreedom commented 3 weeks ago

[Screenshot 2024-10-09 at 13 22 12] This might be a better example of what .morph can do for us. It clearly shows the tense and aspect of the sentence. Interestingly, it isn't able to identify the future tense, because it looks at each word individually (and English doesn't mark the future tense on individual words). But it is able to identify auxiliary verbs and participles, so I think we should be able to identify grammatical structures quite easily with it.
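
Since no token carries a future-tense feature, one workaround is to look for the modal "will" followed by a bare infinitive. This sketch assumes spaCy's Penn Treebank fine-grained tags (`token.tag_`: `MD` for modals, `VB` for the base form, `RB` for "not"/"n't"). It also assumes the lemmatizer maps "'ll" and the "wo" of "won't" to "will". `has_will_future` is a hypothetical name:

```python
def has_will_future(doc):
    """Detect "will" + bare infinitive, since English marks no future on the verb itself."""
    tokens = list(doc)
    for i, tok in enumerate(tokens):
        # Modal "will" (tag MD). Assumption: "'ll" and "wo" (won't) lemmatize to "will".
        if tok.tag_ == "MD" and tok.lemma_.lower() == "will":
            for nxt in tokens[i + 1:]:
                if nxt.tag_ == "VB":  # bare infinitive: "go", "have", "be"
                    return True
                if nxt.tag_ not in ("RB", "PRP"):  # skip "not"/"n't" or a pronoun subject
                    break
    return False
```

Questions with a noun subject ("Will the bus come?") are not caught, because only adverbs and pronouns are skipped between "will" and the verb.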

Peter96K commented 3 weeks ago

Sounds like you're on the right track; these findings will be useful for completing this GitHub issue.

Peter96K commented 1 week ago

9 - simple script: input a sentence, output a specific Text.morph item.
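
A sketch of the core of such a script. It assumes spaCy v3, where `token.morph.get(field)` returns a list of values (empty if the token lacks that feature). On spaCy tokens, the morphology attribute is `token.morph`. `morph_feature` is a hypothetical name:

```python
def morph_feature(doc, feature):
    """Return (token text, value) for every token that carries `feature`, e.g. "Tense"."""
    results = []
    for token in doc:
        # spaCy's MorphAnalysis.get(field) returns a list of values, [] if absent
        values = token.morph.get(feature)
        if values:
            results.append((token.text, ",".join(values)))
    return results
```

For example, `morph_feature(nlp("She has eaten lunch."), "Tense")` lists every token with a Tense value. Passing "Aspect" or "VerbForm" works the same way.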

CPTNFreedom commented 1 week ago

So I was able to do some coding with ChatGPT and got some interesting results. [Screenshot 2024-10-22 at 19 15 31] This first one is an example of spaCy detecting the present perfect. I had to simplify it a little, as ChatGPT was keen to use child and grandchild tokens (dependency tokens), which, although it sounded great, could not identify the present perfect (PP) well. This simpler version seems a bit more reliable.
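
The screenshot isn't reproduced here, so this is a separate, simple non-dependency sketch rather than the function in the screenshot. It assumes spaCy's fine-grained tags (`VBP`/`VBZ` for present "have"/"has", `VBN` for past participles) and coarse `pos_ == "AUX"` for auxiliaries. `has_present_perfect` is a hypothetical name:

```python
def has_present_perfect(doc):
    """Detect auxiliary have/has followed by a past participle (VBN)."""
    tokens = list(doc)
    for i, tok in enumerate(tokens):
        # Present-tense auxiliary "have"/"has" (VBP/VBZ); "had" (VBD) would be past perfect.
        if tok.lemma_.lower() == "have" and tok.pos_ == "AUX" and tok.tag_ in ("VBP", "VBZ"):
            for nxt in tokens[i + 1:]:
                if nxt.tag_ == "VBN":  # "eaten", "been"
                    return True
                if nxt.tag_ not in ("RB", "PRP"):  # allow "not", "already", "you" in questions
                    break
    return False
```

The `pos_ == "AUX"` check keeps possessive "I have a car" from matching. The present perfect continuous ("has been waiting") also matches, because "been" is tagged VBN.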

The same happened when I tried to detect the future perfect continuous (FPC). Some of the code was really long (and pretty confusing) and still would not detect FPC reliably. [Screenshot 2024-10-22 at 19 17 07]

In the end, I was able to simplify the code a lot more and ended up with this: [Screenshot 2024-10-22 at 19 23 58] It is, I guess, a little less perfect than it could be, but it runs faster and actually detects the future perfect continuous.
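
For comparison, here is a compact sketch of the same check. It is not the code in the screenshot. It assumes spaCy's lemmas and Penn Treebank tags (MD, VB, VBN, VBG). `has_future_perfect_continuous` is a hypothetical name:

```python
def has_future_perfect_continuous(doc):
    """Detect will + have + been + -ing verb, e.g. "will have been waiting"."""
    # Drop adverbs/negation (RB: "not", "n't", "probably") so they can sit anywhere in the chain.
    seq = [(t.lemma_.lower(), t.tag_) for t in doc if t.tag_ != "RB"]
    for i in range(len(seq) - 3):
        (l1, t1), (l2, t2), (l3, t3), (_, t4) = seq[i:i + 4]
        if (
            (l1, t1) == ("will", "MD")
            and (l2, t2) == ("have", "VB")
            and (l3, t3) == ("be", "VBN")
            and t4 == "VBG"
        ):
            return True
    return False
```

spaCy's rule-based `Matcher` can express roughly the same sequence declaratively, e.g. `[{"LEMMA": "will"}, {"TAG": "RB", "OP": "*"}, {"LEMMA": "have"}, {"LEMMA": "be", "TAG": "VBN"}, {"TAG": "VBG"}]`. That form may be easier to maintain as more tenses are added.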

Peter96K commented 1 week ago

Great job, it's a good proof of concept. The next step is to cover all the cases in which the sentence is in fact in the desired tense; then we can extend TextAnalyzer to check all the grammar manually. If you have enough time left this week, I would suggest working on that by simply adding another if block, e.g. if (doc[i].lemma_ == 'will not'): ... etc. You could wrap the if block in a function, def check_affirmative(doc), which accepts doc as its argument and hides the process of checking affirmative, negative, etc. That's just a suggestion, though, not necessarily the only or best way to implement this. Let's discuss further on Monday; if you add any updates, I'll take a look.
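
One caveat on the `lemma_ == 'will not'` idea: spaCy tokens are single words, so "will not" spans two tokens and no single lemma equals 'will not'. spaCy also splits "won't" into "wo" and "n't". This sketch relies instead on the `neg` dependency label that spaCy's English pipelines attach to "not"/"n't", wrapped in a helper along the lines suggested above. `is_negative`, `sentence_form` and this `check_affirmative` are hypothetical:

```python
def is_negative(doc):
    # spaCy's English pipelines label "not"/"n't" with the dependency "neg",
    # so negation is found per token instead of matching the string "will not".
    return any(token.dep_ == "neg" for token in doc)


def sentence_form(doc):
    """Classify a sentence as interrogative, negative or affirmative (naive)."""
    tokens = list(doc)
    if tokens and tokens[-1].text == "?":
        return "interrogative"
    return "negative" if is_negative(tokens) else "affirmative"


def check_affirmative(doc):
    return sentence_form(doc) == "affirmative"
```

Negative words such as "never" or "nobody" are not covered by this sketch.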

CPTNFreedom commented 1 week ago

I tried to do a really simple one today but ran into some interesting issues. I tried to write a function to identify the past simple tense, which sounded like the most straightforward thing to do with spaCy. However, it quickly became evident that we needed a way to differentiate the past simple from the perfect aspect. ChatGPT and I tried different approaches, from finding auxiliary verbs in the sentence to looking for "have" or "had" specifically. I'm a little surprised that none of them worked: they all returned 'past simple tense found'. Here are some of the examples:

[Screenshot 2024-10-23 at 15 15 32] [Screenshot 2024-10-23 at 15 22 25]

Maybe one of these just needs a simple tweak to get the right solution. I will try to work on identifying the past/present perfect next week. Maybe if I can get that function working, the solution for the past simple will be obvious.
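
A sketch of one ordering that separates the two, assuming spaCy's tags: rule out the perfect (auxiliary "have" + `VBN`) before looking for a past-tense `VBD` form. The order matters because "had" in "had eaten" is itself tagged `VBD`, so a check that only looks for VBD also fires on the past perfect. The function names are hypothetical. The `dep_ == "ROOT"` test for copular "was/were" assumes spaCy's English dependency scheme, where the copula heads the clause:

```python
def has_perfect_aspect(doc):
    """Auxiliary "have" (has/have/had) followed later by a past participle (VBN)."""
    tokens = list(doc)
    for i, tok in enumerate(tokens):
        if tok.lemma_.lower() == "have" and tok.pos_ == "AUX":
            if any(t.tag_ == "VBN" for t in tokens[i + 1:]):
                return True
    return False


def has_past_simple(doc):
    tokens = list(doc)
    # "had" in "had eaten" is also tagged VBD, so rule out the perfect first.
    if has_perfect_aspect(tokens):
        return False
    for tok in tokens:
        lemma = tok.lemma_.lower()
        if tok.tag_ != "VBD":
            continue
        if tok.pos_ == "VERB":  # "ate", or "had" as a main verb ("I had a car")
            return True
        if tok.pos_ == "AUX" and lemma == "do":  # "did not eat", "Did you eat?"
            return True
        if tok.pos_ == "AUX" and lemma == "be" and tok.dep_ == "ROOT":  # copular "was/were"
            return True
    return False
```

Past continuous ("was eating") is deliberately not counted, since there "was" is an auxiliary rather than the root.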

CPTNFreedom commented 2 days ago

Today I was able to make 3 functions for the A1 level. First, I made a function to detect both comparatives and superlatives. These are often taught together in syllabi and the code for them is very similar, so I've combined them into one function, but they can easily be separated if need be. Here it is:
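
The screenshot with this function wasn't preserved. For reference, here is a comparable sketch (not the original code), assuming spaCy's Penn Treebank tags: `JJR`/`RBR` for comparatives and `JJS`/`RBS` for superlatives. `find_comparison` is a hypothetical name:

```python
def find_comparison(doc):
    """Group comparative and superlative forms using Penn Treebank tags (token.tag_)."""
    found = {"comparative": [], "superlative": []}
    for token in doc:
        if token.tag_ in ("JJR", "RBR"):    # "bigger", "faster", "more" (beautiful)
            found["comparative"].append(token.text)
        elif token.tag_ in ("JJS", "RBS"):  # "biggest", "most" (beautiful)
            found["superlative"].append(token.text)
    return found
```

Returning the two groups separately keeps them easy to split into two functions later. Determiner uses such as "more apples" are likely to be counted as well, since "more" carries a comparative tag there too.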

Similarly, I've combined possessive adjectives and the possessive 's into one function here:
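
The original function isn't shown here either. A comparable sketch assumes spaCy's tags `PRP$` for possessive adjectives and `POS` for the possessive ending, which spaCy's tokenizer splits off ("Tom's" becomes "Tom" + "'s"). `find_possessives` is a hypothetical name:

```python
def find_possessives(doc):
    """Possessive adjectives (tag PRP$) and the possessive 's (tag POS)."""
    tokens = list(doc)
    found = {"possessive_adjective": [], "possessive_s": []}
    for i, token in enumerate(tokens):
        if token.tag_ == "PRP$":  # my, your, his, her, its, our, their
            found["possessive_adjective"].append(token.text)
        elif token.tag_ == "POS" and i > 0:
            # Rejoin the split-off "'s" with the owner noun before it
            found["possessive_s"].append(tokens[i - 1].text + token.text)
    return found
```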

Finally, a function to detect sentences like "I like playing football", with the verb + verb-ing pattern:
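
As with the previous two, the original isn't shown. A comparable sketch assumes spaCy's coarse `pos_` and fine `tag_` attributes. `has_verb_plus_ing` is a hypothetical name:

```python
def has_verb_plus_ing(doc):
    """A lexical verb directly followed by an -ing form, e.g. "like playing"."""
    tokens = list(doc)
    # pos_ "VERB" excludes auxiliaries, so "am playing" (progressive) is not matched
    return any(
        first.pos_ == "VERB" and second.tag_ == "VBG"
        for first, second in zip(tokens, tokens[1:])
    )
```

Adjacency is a simplification: "I saw him running" or "I like really playing" would be missed. Checking the dependency label of the -ing verb could be more robust; spaCy's English models may label it `xcomp`.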