noco-ai / spellbook-docker

AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models
https://github.com/noco-ai/spellbook-docker/wiki
Open Software License 3.0

Nice. #6

Open arthurwolf opened 8 months ago

arthurwolf commented 8 months ago

Really cool project.

I'm working on something similar (structurally at least), a manga-to-anime pipeline. It involves a lot of different steps/models, similar to this project.

I'll be looking more closely at your project, in particular at how it's organized; thanks a lot for sharing. I'd be curious whether you have any insights on how you'd approach manga reading if you had to.

Cheers!

[Attachments: masked, panel, prompt.json, prompt.txt, reading.json, response.txt, result.json, 6253, 6254]

arthurwolf commented 8 months ago

[Attached images: page-3461-ids through page-3465-ids]

arthurwolf commented 8 months ago

https://github.com/noco-ai/spellbook-docker/assets/108821/194a7878-4f30-4744-a9a8-f69f4a4f9591

[Attachments: result.json, reading.json, response.txt, prompt.txt, prompt.json, panel, masked]

arthurwolf commented 8 months ago

[Attachments: result.json, reading.json, response.txt, prompt.txt, prompt.json, panel, masked]

noco-ai commented 8 months ago

Hello! Your approach looks good to me, and it sounds like your hard work is paying off. If I were working on this particular project, I would experiment with fine-tuning llava once you have a solid dataset, to see if it gives better results than OpenAI's models. I have yet to see anyone share a fine-tune of llava for a specific task, so I am curious how well it would work. If you are posting your progress on the project anywhere, please share the link, as I am interested to see it in action once you have it all working.
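
To make the dataset suggestion concrete, here is a minimal sketch of what one training record could look like, assuming the public LLaVA instruction-tuning conversation format (the id, image path, and prompt text are made-up placeholders, not from your pipeline):

```python
import json

# Hypothetical training record in the LLaVA instruction-tuning conversation
# format: an image path plus a human/gpt exchange with an <image> token.
record = {
    "id": "page-3461-panel-2",                # made-up identifier
    "image": "panels/page-3461-panel-2.png",  # made-up path to a panel crop
    "conversations": [
        {"from": "human",
         "value": "<image>\nTranscribe the speech bubbles in this panel in reading order."},
        {"from": "gpt",
         "value": "1. ...\n2. ..."},
    ],
}

# The LLaVA fine-tuning scripts expect a JSON list of such records.
with open("manga_finetune.json", "w") as f:
    json.dump([record], f, indent=2, ensure_ascii=False)
```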

arthurwolf commented 8 months ago

Thanks for the feedback.

I'll soon have about two comic books' worth of data, which I think would be enough to start fine-tuning llava, but I have two issues there: 1. this is all very new, and there are no "easy guides" to fine-tuning most things, llava even less; it's all very cryptic and assumes a high technical level, and 2. my assumption for llava is that even fine-tuning would require far more compute than I can afford.
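
On the compute side, one option I may look at is parameter-efficient fine-tuning (LoRA), which trains only small adapter matrices instead of the full model. A rough sketch with the Hugging Face transformers and peft libraries, assuming the llava-hf 7B checkpoint and untested hyperparameters:

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, not a recommendation

# Load the base model in half precision to cut memory use.
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Attach LoRA adapters to the attention projections so only a small
# fraction of the weights are trained.
lora_config = LoraConfig(
    r=16,                      # untested hyperparameters
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights end up trainable
```

From there the adapters could be trained with a standard Trainer loop on data in the format sketched above, and loaded on top of the base model at inference time. No idea yet if that fits my budget, but it looks more realistic than a full fine-tune.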

I've tried an alternative to this: getting my data into the llava training dataset for the next llava version. I've opened GitHub issues and sent some emails, but so far no answer. I hope I can make it happen; I think it would benefit not just me but the model itself too.

As for posting progress, I'm considering starting a YouTube channel with some updates; I'll post about it here if/when that happens.

Cheers, and thanks again.