Open seriouslysean opened 8 months ago
Feature Description
Something like this but for JS.
https://github.com/xtekky/gpt4free
It would be nice to abstract the GPT code to be more API agnostic, but for this ticket just adding a separate script to do the same thing we're doing today with a different, free/open API would be nice to see.
Hey, how about using something like this: https://sdk.vercel.ai/docs/guides/providers/hugging-face ? I would be happy to take on this issue if you want.
@mdrokz I'm certainly not against any sort of solution; by all means, I'd love to see you set it up. The trickiest part is validating that the responses I get back match the events listed in the content. I've been curating a prompt over the last couple of weeks to try and get GPT 4 just right, specifically with cases GPT 3.5 (turbo) couldn't handle.
Take these examples:
https://github.com/seriouslysean/monster-hunter-now-events/blob/main/src/utils/chat-gpt.js#L48
- The function, getEventsFromHTML, will parse text from an HTML file and send it to the GPT API, https://github.com/seriouslysean/monster-hunter-now-events/blob/main/fixtures/20231001_news-oct-2023/index.html
- The response should come back with some JSON, something like this, https://github.com/seriouslysean/monster-hunter-now-events/blob/main/src/utils/chat-gpt.js#L48
For this particular example, it deviated from the norm and they added 3 events to the content rather than just one. So the output needs to make sure the JSON response takes that into account.
Eventually I'd like to write some AB tests that validate known, older posts against whatever API we end up using, to verify the content stays correct.
It's been fun, but tricky!
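A rough sketch of what a check against a known post could look like. The event shape (`summary`, `start`, `end`), the helper names `eventKey` and `diffEvents`, and the sample data are all assumptions for illustration, not what the repo uses today.

```javascript
// Hypothetical helper: compares events returned by an AI API against the
// events we already know a fixture post contains.
// The event shape ({ summary, start, end }) is assumed for illustration.
function eventKey(event) {
  return [event.summary.trim().toLowerCase(), event.start, event.end].join('|');
}

function diffEvents(expected, actual) {
  const expectedKeys = new Set(expected.map(eventKey));
  const actualKeys = new Set(actual.map(eventKey));
  return {
    // Known events the API response failed to include
    missing: expected.filter((e) => !actualKeys.has(eventKey(e))),
    // Events in the API response that are not in the known list
    unexpected: actual.filter((e) => !expectedKeys.has(eventKey(e))),
  };
}

// Made-up data: a post with 3 known events, and a response that found only 2.
const known = [
  { summary: 'Event A', start: '2023-10-02', end: '2023-10-08' },
  { summary: 'Event B', start: '2023-10-09', end: '2023-10-15' },
  { summary: 'Event C', start: '2023-10-16', end: '2023-10-22' },
];
const fromApi = known.slice(0, 2);

const result = diffEvents(known, fromApi);
console.log(result.missing.map((e) => e.summary)); // the event the response dropped
```

A test per fixture could then fail whenever `missing` or `unexpected` is non-empty, which would catch the multi-event case described above regardless of which API produced the response.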
If we use the OpenAI API, there is native JSON support now, which the GPT model will validate against. There is an article about it, https://blog.simonfarshid.com/native-json-output-from-gpt-4, and I implemented this in one of my APIs. Do you want me to integrate the Vercel AI SDK or the native JSON output for the OpenAI API?
@mdrokz That article is really interesting! The GPT API still costs money though, so I think for this ticket the free API still makes the most sense for now. If you'd like to pivot though, I can make a new ticket for the supported JSON generation.
Oh cool, I will start work on the Vercel SDK first. By the way, the Vercel AI SDK also supports OpenAI, so we can do both tasks with the Vercel SDK. Can you assign me this issue, please?
@mdrokz I think it might make more sense to do a totally new implementation just for the Vercel models, so keep the OpenAI integration too. Maybe fork the code in the fetchArticle method to use one or the other based on a new ENV.
Something like this:
# Values could be `openai` or `vercel`
AI_INTEGRATION=openai
Then getEventsFromHTML can be forked to use either openai or vercel, which would mean moving it to some sort of ai-utils.js file.
// Import the 2 implementations under different names
import { getEventsFromHTML as getEventsViaOpenAI } from './chat-gpt.js';
import { getEventsFromHTML as getEventsViaVercelSDK } from './vercel-sdk.js';
function getEventsFromHTML() {
  // If using openai, call getEventsViaOpenAI
  // If using vercel, call getEventsViaVercelSDK
}
That's my suggestion anyway, open to other approaches for sure.
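To make the idea a bit more concrete, here is a minimal sketch of the switch. The two implementations are inline stand-ins so the snippet runs on its own; `resolveIntegration`, the `integrations` map, and the default of `openai` when the ENV is unset are assumptions for illustration.

```javascript
// Stand-ins for the real implementations in ./chat-gpt.js and ./vercel-sdk.js,
// defined inline so this sketch runs on its own.
async function getEventsViaOpenAI(html) {
  return { provider: 'openai', events: [] };
}
async function getEventsViaVercelSDK(html) {
  return { provider: 'vercel', events: [] };
}

const integrations = {
  openai: getEventsViaOpenAI,
  vercel: getEventsViaVercelSDK,
};

// Picks the implementation named by AI_INTEGRATION. Defaulting to openai is an
// assumption; it keeps today's behavior when the ENV is unset.
function resolveIntegration(env = process.env) {
  const name = (env.AI_INTEGRATION || 'openai').toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(integrations, name)) {
    throw new Error(`Unknown AI_INTEGRATION "${name}", expected openai or vercel`);
  }
  return integrations[name];
}

// Single entry point the rest of the code would call.
async function getEventsFromHTML(html, env = process.env) {
  return resolveIntegration(env)(html);
}
```

Failing loudly on an unknown value avoids silently falling back to the paid API because of a typo in the ENV.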
Right, makes sense. I will get started on it. Thanks!
Hey @seriouslysean, so I explored the free models on Hugging Face, but in testing the prompt doesn't work on those models; it doesn't give suitable results. As an alternative, I tried using an NER (Named Entity Recognition) model to extract habitats and monsters, and I extract the date and time through regex. This gives good results, but if monsters aren't found, it picks up unrelated words.
I tried searching for other free services but couldn't seem to find any. The best result I got was with the NER model. Do you want me to create a PR?
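To illustrate the regex half of that approach, here is a minimal sketch. The date format ("October 2, 2023"), the time format ("9:00 a.m."), the helper name `extractDatesAndTimes`, and the sample sentence are assumptions for illustration, not the exact patterns used in the testing described above.

```javascript
// Assumed formats: "October 2, 2023" for dates and "9:00 a.m." / "11:59 PM" for times.
const MONTHS =
  'January|February|March|April|May|June|July|August|September|October|November|December';
const DATE_RE = new RegExp(`\\b(?:${MONTHS})\\s+\\d{1,2},\\s+\\d{4}\\b`, 'g');
const TIME_RE = /\b\d{1,2}:\d{2}\s*(?:a\.m\.|p\.m\.|am|pm)/gi;

// Returns every date and time string found in the text, in order of appearance.
function extractDatesAndTimes(text) {
  return {
    dates: text.match(DATE_RE) || [],
    times: text.match(TIME_RE) || [],
  };
}

// Made-up sentence in the assumed format.
const sample =
  'The event runs from October 2, 2023 at 9:00 a.m. to October 8, 2023 at 11:59 p.m. (local time).';
console.log(extractDatesAndTimes(sample));
```

This only finds the strings; pairing each date with the right event is the part that still depends on the NER output or an LLM.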
I thought that might be the case. I don't think any of those solutions will work due to the unknown nature of the content; the posts don't have any sort of rhyme or reason.
If you can get the NER option very close, it might be worth it. Alternatively, in another PR I just talked about adding a flag to switch between GPT models, if that's something you'd be interested in.
I really appreciate all the digging; it's a complicated problem to solve, haha.
I'd love it if Capcom or Niantic just created an API of events for MHN, lol.
The best I got with the NER model was capturing monsters and habitats, plus dates through regex. If the event has monsters and habitats, it will get them, and the regex works really well for capturing dates and times. But I saw there is dedupeJsonEvents to merge events, and I don't think it will be possible to do that without an LLM. For now, I think GPT would work best. I would be happy to take on the task to switch between GPT models, but I don't see any issue for it. Thanks
I tried almost every viable model, but nothing comes close to GPT. There is another alternative through Llama 2, but for that we would have to self-host a Hugging Face Space (which is actually free if you want to go that route). The Llama 2 model gave me good results, but the Hugging Face API doesn't give access to Llama 2 model inference.
Also, yeah, it would be awesome if Capcom or Niantic gave us an API, haha.
@mdrokz I got it set up for you, https://github.com/seriouslysean/monster-hunter-now-events/issues/41.