dan-homebrew opened this issue 1 week ago
@louis-jan @urmauur @namchuai @nguyenhoangthuan99 I would like to explore doing the Ichijo demo as a fork of Jan, that is run server-side:
For features, see the OG post - in this post, I focus on changes to Jan
threads > messages, to collect an RL-type dataset.

Everything we do for this demo should add towards Jan being able to support Voice Mode, though - this should not be a separate repo.
cc @0xSage @tikikun @bachvudinh
Comment moved to https://github.com/homebrewltd/internal/issues/36
@louis-jan @urmauur @nguyenhoangthuan99 reference codebase https://github.com/tikikun/public_demo_llama3s
Will we remove any irrelevant parts that could cause side effects? E.g. CI Pipelines that push release artifacts to Jan's S3.
To me, the fork should only add features on top of Jan if it's intended to be a short-lived repository. Otherwise, keeping the two in sync or merging them back would be a nightmare.
It's a server-side demo
- going back to the previous Jan web server demo. How does the current architecture work?
Conversational extension is the only one involved for now. You can disable the others, as the demo works with remote endpoints.
Should the extension use a DB or the filesystem?
To me, multi-user support works better with a relational DB; the filesystem would take more effort. An auto-generated, DB-backed endpoints system would be great. Audio files will be stored in a single location (is that a bad idea?)
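To make the DB-vs-filesystem trade-off concrete, here is a minimal sketch of what a relational schema for the multi-user demo might look like, using SQLite. The table and column names are hypothetical, not Jan's actual schema; the point is that audio bytes live in one location on disk while the DB stores only the path.

```python
import sqlite3

# Hypothetical schema sketch for the multi-user demo backend.
# Table and column names are illustrative, not Jan's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE threads  (id INTEGER PRIMARY KEY,
                       user_id INTEGER NOT NULL REFERENCES users(id));
CREATE TABLE messages (id INTEGER PRIMARY KEY,
                       thread_id INTEGER NOT NULL REFERENCES threads(id),
                       role TEXT NOT NULL,      -- "user" / "assistant"
                       content TEXT,
                       audio_path TEXT,         -- file in the single audio store
                       rating INTEGER);         -- content rating, nullable
""")
# Audio bytes stay on disk in one place; the DB only stores the path.
conn.execute("INSERT INTO users (name) VALUES ('demo-user')")
conn.execute("INSERT INTO threads (user_id) VALUES (1)")
conn.execute(
    "INSERT INTO messages (thread_id, role, content, audio_path) "
    "VALUES (1, 'user', 'hello', 'audio/0001.wav')"
)
row = conn.execute("SELECT role, audio_path FROM messages").fetchone()
print(row)  # ('user', 'audio/0001.wav')
```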
How does the content rating system work?
It would be best as part of the message object. It could go through the existing Message Update endpoint, so there's no need to introduce a new one?
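As a sketch of that idea: the rating rides along on the message object through the existing update path, rather than getting its own route. The field names below (`metadata`, `rating`) are assumptions for illustration, not Jan's actual API.

```python
import json

def build_message_update(message_id: str, rating: int) -> dict:
    """Build a hypothetical Message Update payload carrying a content rating.

    Field names are illustrative - the real message schema may differ.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return {
        "id": message_id,
        "metadata": {"rating": rating},  # piggy-backs on the message object
    }

payload = build_message_update("msg_123", 4)
print(json.dumps(payload))
```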
@louis-jan I realize I may have miscommunicated by asking for Ichigo to be a fork of Jan.
I would like to clarify my position: Jan should support Ichigo as a model
I would like to use Ichigo to drive improvements at Jan and Cortex:
- Jan
- Cortex:
  - `/audio/completions`
  - `/audio/speech`
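To make the proposed Cortex endpoints concrete, here is a hedged sketch of what their request bodies might look like. The field names mirror OpenAI's `/v1/audio/speech` conventions; whether Cortex adopts the same fields is an open question, not a confirmed API.

```python
# Sketch of OpenAI-style request bodies for the proposed Cortex audio
# endpoints. All field names are assumptions, not a confirmed Cortex API.
speech_request = {
    "endpoint": "/audio/speech",
    "body": {"model": "ichigo", "input": "Hello from Ichigo", "voice": "default"},
}
completion_request = {
    "endpoint": "/audio/completions",
    # audio in, text/audio out - the field below is purely illustrative
    "body": {"model": "ichigo", "audio": "<base64-encoded wav>"},
}
for req in (speech_request, completion_request):
    print(req["endpoint"], sorted(req["body"].keys()))
```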
Or:
Decision: Keep ichigo.homebrew.ltd demo separate
@nguyenhoangthuan99 , I'm excited to see the progress you're making on the backend for the demo! I have a few questions to help me understand your design choices:
- When you say "Fish is faster," could you give me a rough estimate of the speedup we can expect? I'd love to understand the trade-offs with quality degradation.
- I noticed that Ichigo uses Whisper semantic tokens, but Fish is incompatible with a future version of Ichigo that outputs semantic tokens directly. Can you help me understand why this isn't a concern for you?
- I was a bit concerned by the demo this morning - the performance of Fish seemed a bit stilted. Are there any plans to improve this aspect, or is it not a priority?
I just finished testing both WhisperSpeech and Fish Speech, and here are the results.

I tested with this prompt:
> In the realm of advanced technology, the evolution of artificial intelligence stands as a monumental achievement. This dynamic field, constantly pushing the boundaries of what machines can do, has seen rapid growth and innovation. From deciphering complex data patterns to driving cars autonomously, AI's applications are vast and diverse.
| | WhisperSpeech | Fish Speech |
|---|---|---|
| VRAM | 9 GB | 2 GB |
| Time spent | 22 s | 4 s |