tl-its-umich-edu / annoto-gai

This is the GitHub project for the Annoto GAI work

Addressing Issue #19 for `TranscriptData` Class #22

Closed takposha closed 4 months ago

takposha commented 4 months ago

Fixes #19

This PR reworks the `TranscriptData` class, which loads and processes the caption data, as raised in #19. It aims to give the class a structure similar to the classes used for Topic Modelling and Question Generation, including saving the processed data. Saving the transcript data does not save significant computation yet, but once sentence segmentation is implemented, it will avoid reloading and reprocessing the data from scratch every time.

The changes significantly affect only the `configData.py` and `transcriptLoader.py` scripts. A new `.env` variable adds the option to overwrite and reprocess the transcript data.
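
For readers unfamiliar with the pattern, here is a minimal sketch of the load-or-reprocess idea described above. It is not the code in this PR: the class name `TranscriptCache`, the helper `env_flag`, the variable name `OVERWRITE_TRANSCRIPT_DATA`, the pickle format, and the placeholder processing step are all assumptions made for illustration.

```python
import os
import pickle
from pathlib import Path


def env_flag(name, default=False):
    """Read a boolean option from the environment (e.g. populated from a .env file).

    Hypothetical helper; the real config handling lives in configData.py.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


class TranscriptCache:
    """Hypothetical stand-in for a transcript class that saves its processed data."""

    def __init__(self, save_path, overwrite=False):
        self.save_path = Path(save_path)
        self.overwrite = overwrite
        self.loaded_from_cache = False
        self.data = None

    def process(self, raw_captions):
        # Placeholder processing (an assumption): trim whitespace, drop empty lines.
        return [line.strip() for line in raw_captions if line.strip()]

    def load(self, raw_captions):
        if self.save_path.exists() and not self.overwrite:
            # Saved data exists and overwriting is off: reuse it.
            with self.save_path.open("rb") as f:
                self.data = pickle.load(f)
            self.loaded_from_cache = True
        else:
            # No saved data, or overwrite requested: reprocess and save again.
            self.data = self.process(raw_captions)
            self.save_path.parent.mkdir(parents=True, exist_ok=True)
            with self.save_path.open("wb") as f:
                pickle.dump(self.data, f)
            self.loaded_from_cache = False
        return self.data
```

With this shape, a caller could construct the object as `TranscriptCache(path, overwrite=env_flag("OVERWRITE_TRANSCRIPT_DATA"))`, so that setting the flag forces reprocessing even when a saved file is present.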

pushyamig commented 4 months ago

The PR looks good, and it makes the design pattern for transcripts similar to the one used for Topic Modelling and Question Generation.