tl-its-umich-edu / annoto-gai

This is the GitHub project for the Annoto GAI work

Addressing Issue 38 #39

Closed takposha closed 4 months ago

takposha commented 4 months ago

This PR is for #38.

This PR is somewhat larger than usual because it integrates a new question generation system that uses only LangChain. It is functionally complete as of now; some improvements to the saved question file and to the checks are still possible and will be made in later PRs to avoid overloading this one.

The config file setup has changed, as not all variables are needed when using the LangChain model. LangChainBot has also been tweaked so that it adjusts itself to the generation model it works with.

The question generator scripts have been split into two scripts to account for each generation type. The BERTopic approach is unchanged overall in how it functions, but some functions and classes have been renamed to ensure that there is a distinction between the BERTopic and LangChain approaches. The LangChain approach follows a similar class structure to BERTopic so that the same loading, saving, and file-writing function calls can be made.

questionGenerator.py now handles the abstraction of the question generation pipeline, so a user still needs only a single function call to extract questions, irrespective of the generation model being used.
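
A minimal sketch of this kind of single-entry-point dispatch, assuming the config carries the generation model as a string. The names `process_captions`, `GENERATORS`, and the two stand-in generator functions are hypothetical and are not taken from the repository:

```python
def generate_with_bertopic(config):
    # Stand-in for the BERTopic pipeline (hypothetical placeholder).
    return [f"BERTopic question for {config['video']}"]


def generate_with_langchain(config):
    # Stand-in for the LangChain-only pipeline (hypothetical placeholder).
    return [f"LangChain question for {config['video']}"]


# Map each supported generation model name to its pipeline function.
GENERATORS = {
    "BERTopic": generate_with_bertopic,
    "LangChain": generate_with_langchain,
}


def process_captions(config):
    """Single entry point: pick the pipeline from the config and run it."""
    model = config["generation_model"]
    try:
        generator = GENERATORS[model]
    except KeyError:
        raise ValueError(f"Unsupported generation model: {model!r}") from None
    return generator(config)
```

With a lookup table like this, supporting another generation model would only require a new entry in `GENERATORS`; the caller's single function call stays the same.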

pushyamig commented 4 months ago

I will review it in the afternoon.

pushyamig commented 4 months ago

I am seeing this error: ModuleNotFoundError: No module named 'langchain_chroma'. Was a new package added to requirements.txt that is not pulled in as part of this PR?

pushyamig commented 4 months ago

I've noticed that the LangChainQuestionData and BERTopicQuestionData classes share some functions and variables. In the future, it might be beneficial to use Python inheritance in this scenario. I think this would be a good opportunity to implement it.

For reference: Inheritance in Python

This is something to think about, not a change I am requesting.

pushyamig commented 4 months ago

I think this PR is good but needs a change, as the script fails with a missing module.

Thank you for illustrating the difference between GAI and a mix of AI/GAI for question generation. You honored this requirement and addressed it promptly upon our request.

takposha commented 4 months ago

The latest commit should address the missing module requirement and allow for a case-insensitive input of the generation model in the .env file. It will get converted to the correct name after being processed.
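
A minimal sketch of how such case-insensitive handling could work, assuming the value is read from the .env file as a raw string. The function name `normalize_generation_model` and the `SUPPORTED_MODELS` table are hypothetical and do not reflect the code in the commit:

```python
# Lower-cased lookup keys mapped to the canonical model names (assumed).
SUPPORTED_MODELS = {
    "bertopic": "BERTopic",
    "langchain": "LangChain",
}


def normalize_generation_model(raw_value):
    """Convert a user-supplied model name to its canonical spelling."""
    # Strip whitespace and any surrounding quotes left over from the .env file.
    key = raw_value.strip().strip("'\"").lower()
    try:
        return SUPPORTED_MODELS[key]
    except KeyError:
        raise ValueError(f"Unknown generation model: {raw_value!r}") from None
```

For example, `normalize_generation_model("bertopic")` and `normalize_generation_model("'BERTOPIC'")` would both yield `"BERTopic"`.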

In terms of inheritance, I had not thought of it at all, and this will be reflected in my next PR as well, where I introduce a new class called QuestionData that acts like a common class for the two approaches in which all the data is stored. I use this one to handle saving text files, printing logs, and so on. Making it with inheritance in mind would have been a better coding approach. We can discuss this in the next PR to see if it would be better to convert the classes to follow this kind of inheritance.
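
A rough sketch of what the inheritance-based layout could look like, for discussion only. The class names come from this thread; every method, attribute, and placeholder body is an assumption and not the repository's actual code:

```python
from abc import ABC, abstractmethod


class QuestionData(ABC):
    """Common base: holds shared state and shared output helpers."""

    def __init__(self, video_name):
        self.video_name = video_name
        self.questions = []

    @abstractmethod
    def generate(self, transcript):
        """Each approach fills self.questions in its own way."""

    def format_questions(self):
        # Shared by both approaches, so it lives in the base class.
        return [f"{i}. {q}" for i, q in enumerate(self.questions, start=1)]


class BERTopicQuestionData(QuestionData):
    def generate(self, transcript):
        # Placeholder for the topic-model-driven pipeline.
        self.questions = [f"[BERTopic] {transcript[:20]}"]


class LangChainQuestionData(QuestionData):
    def generate(self, transcript):
        # Placeholder for the LangChain-only pipeline.
        self.questions = [f"[LangChain] {transcript[:20]}"]
```

Shared behavior such as saving text files or printing logs would sit in `QuestionData`, while each subclass overrides only the generation step.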

pushyamig commented 4 months ago

I see this error from the latest commit:

ERROR: Could not find a version that satisfies the requirement langchain_chromatic==0.1.1 (from versions: none)
ERROR: No matching distribution found for langchain_chromatic==0.1.1

takposha commented 4 months ago

This has been corrected in the latest commit.

pushyamig commented 4 months ago

I see this error now

2024-06-26T10:57:44-0400 INFO [BERTopicQuestionGenerator.py:174] - Question: What is a feature that new quizzes offer that classic quizzes do not?...
Answers:
    1. Hotspot questions
    2. Multiple choice questions
    3. Fill in the blank questions
    4. Ungraded surveys
Correct Answer: 1. Hotspot questions
Reason: Hotspot questions are mentioned as a new type of question that can be created in new quizzes, which ...
2024-06-26T10:57:44-0400 INFO [BERTopicQuestionGenerator.py:166] - Topic: Online Quiz Moderation Features
2024-06-26T10:57:44-0400 INFO [BERTopicQuestionGenerator.py:167] - Insertion Point: 01:19:16
Traceback (most recent call last):
  File "/Users/pushyami/git_tl_projects/annoto-gai/captionsProcessor.py", line 15, in <module>
    main()
  File "/Users/pushyami/git_tl_projects/annoto-gai/captionsProcessor.py", line 11, in main
    processCaptions(config)
  File "/Users/pushyami/git_tl_projects/annoto-gai/questionGenerator.py", line 76, in processCaptions
    questionData.printQuestions()
  File "/Users/pushyami/git_tl_projects/annoto-gai/BERTopicQuestionGenerator.py", line 173, in printQuestions
    reason = f"Reason: {parsedResponse['reason'][:100]+'...'}"
TypeError: unhashable type: 'slice'

My .env file settings:

VIDEO_TO_USE = 'New Quizzes Video'
GENERATION_MODEL = 'BERTopic'
QUESTION_COUNT = 2
OVERWRITE_EXISTING_TRANSCRIPT = 1
OVERWRITE_EXISTING_TOPICMODEL = 1
OVERWRITE_EXISTING_QUESTIONS = 1

takposha commented 4 months ago

That seems to have happened because f-strings in newer versions of Python appear to be more flexible than in older versions. I made a change in the latest commit that should fix it.
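
For reference, the same TypeError can be reproduced on Python 3.9 and 3.10 whenever the value being sliced is a dict rather than a string. Whether that is what happened here is an assumption, since the thread does not show the value of parsedResponse['reason'] for the failing question. The helper `truncate_reason` below is hypothetical and is not the change made in the commit:

```python
def truncate_reason(reason, limit=100):
    """Build the log line, tolerating non-string 'reason' values."""
    # Coerce dicts, lists, etc. to text before slicing; slicing a dict
    # directly is what raises "unhashable type: 'slice'" on 3.9/3.10.
    text = reason if isinstance(reason, str) else str(reason)
    # Unlike the original line, only add the ellipsis when text was cut.
    suffix = "..." if len(text) > limit else ""
    return f"Reason: {text[:limit]}{suffix}"


# Reproduction attempt with an assumed dict-valued 'reason'.
parsed_response = {"reason": {"text": "Hotspot questions are new."}}
try:
    parsed_response["reason"][:100]
    error_name = None
except (TypeError, KeyError) as exc:
    # TypeError on Python 3.11 and earlier; Python 3.12 made slice
    # objects hashable, so the dict lookup fails with KeyError there.
    error_name = type(exc).__name__

safe_line = truncate_reason(parsed_response["reason"])
```

Coercing to a string first keeps the log line printable regardless of what shape the parsed response takes.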

pushyamig commented 4 months ago

I don't know how I got reverted to Python 3.9, so I am now running 3.10. It might be my mistake, and that may be the reason for the issue.