virtualmlnet / hackathon-2020

Virtual ML.NET Hackathon 2020

ML.NET Hackathon Idea - Text Analytics for Education/Learning #12

Open asacin opened 3 years ago

asacin commented 3 years ago

Hackathon Idea: Text Analytics for Education/Learning

Please fill out this form to submit an idea for the Virtual ML.NET Hackathon

Digital analysis of a large piece of text (e.g. the Bible, Shakespeare's works, the Constitution) to provide insights into the text and convert it into other forms that people can consume (video, a Q&A chatbot, etc.) in an effort to provide learning for all.

Your name: Antonio Sacin

Who's submitting this idea? If you already have a team, add all members of the team here: Antonio Sacin

Team name: Charax

Already have a team? Tell us what we should call you!

This is currently just me, but I would be happy to work with others interested in this. Team name: Charax. I chose this name because it is mentioned once and only once in the entire Bible.

Brief Description

Description of what you want to achieve and what problem you're trying to solve

I want to be able to analyze a large piece of text and find some contextual information regarding:

Other

Are you looking for team members?

Would you like to have a mentor assigned to your team?

fwaris commented 3 years ago

As far as I know, ML.NET has good text processing capability, but the types of models you are looking for are beyond ML.NET (at least for now; however, I may be wrong on this, as new capabilities are continually being added).

'Deep' text understanding (e.g. Q&A) is the domain of models such as BERT (and its derivatives) and the new GPT-3 model, which has probably 'read' all of the texts you have mentioned.

Note that pre-trained GPT-3 is available as an API, so we can build on top of that. Retraining GPT-3 on new data is best left to large enterprises, as it would require a farm of GPUs.

ML.NET would be suited to simpler text tasks, e.g. text classification and/or text clustering (topic modeling).

asacin commented 3 years ago

Working late today and coding in the hope of finishing before the deadline.

asacin commented 3 years ago

I completed the first version of the code by the deadline last night. The application can take any Bible verse reference, verify that it is in a valid format, retrieve the verse from the data, and call an ML model to check its sentiment. Obviously this is only a first step in analyzing text, but it helped me start shaping the idea and learn where the boundaries of ML.NET are.
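The flow described above (check the reference format, then retrieve the verse) can be sketched roughly as follows. This is only an illustration in Python, not the code from the hackathon entry: `VerseRef`, `parse_reference`, `lookup`, and the `SAMPLE_VERSES` table with its placeholder text are hypothetical names, the accepted formats are an assumption based on the "Deut 3,16" example, and the sentiment model call is left out entirely.

```python
import re
from typing import Dict, NamedTuple, Optional


class VerseRef(NamedTuple):
    """A parsed reference such as 'Deut 3,16' (hypothetical type)."""
    book: str
    chapter: int
    verse: int


# Assumed formats: "Deut 3,16", "Deut 3:16", "1 John 4:8".
_REF_PATTERN = re.compile(
    r"^\s*((?:[1-3]\s*)?[A-Za-z]+)\s+(\d+)\s*[,:]\s*(\d+)\s*$"
)


def parse_reference(text: str) -> Optional[VerseRef]:
    """Return a VerseRef if the text looks like a valid reference, else None."""
    match = _REF_PATTERN.match(text)
    if match is None:
        return None
    # Collapse repeated whitespace and normalize capitalization of the book.
    book = " ".join(match.group(1).split()).title()
    return VerseRef(book, int(match.group(2)), int(match.group(3)))


def lookup(ref: VerseRef, verses: Dict[VerseRef, str]) -> Optional[str]:
    """Retrieve the verse text, or None if the reference is not in the data."""
    return verses.get(ref)


# Hypothetical in-memory data; a real application would load a full translation.
SAMPLE_VERSES: Dict[VerseRef, str] = {
    VerseRef("Deut", 3, 16): "<placeholder verse text>",
}

if __name__ == "__main__":
    ref = parse_reference("Deut 3,16")
    if ref is None:
        print("Not a valid verse reference")
    else:
        # The retrieved text would then be passed to the sentiment model.
        print(ref, lookup(ref, SAMPLE_VERSES))
```

Keeping the format check separate from the lookup means a malformed reference and a well-formed reference that is missing from the data can be reported differently.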

asacin commented 3 years ago

Here is a sample console screen showing Deut 3,16 in the ASV (American Standard Version). It shows that the verse is not a toxic statement! 👍

image