Closed: DeeMcCart closed this issue 1 week ago.
Choices:
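Number of epochs: one epoch is one full pass over the training dataset, so this sets how many times the model iterates over the data during training.
Adaptor size: controls the number of trainable parameters in the tuning adaptor. More complex tasks may benefit from larger adaptor sizes.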
Tuning dataset: e.g. JSONL data input, structured as key-value pairs.
Model validation: e.g. classification, summarisation, extractive AI, chat.
For good examples of existing datasets, see Hugging Face Datasets. A report on Adapt-LLM Finance-chat exists.
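To make the epoch and adaptor-size choices more concrete, here is a minimal Python sketch. An epoch is one full pass over the training data, so the number of optimiser steps grows linearly with the epoch count. For adaptor size, the sketch assumes a LoRA-style low-rank adaptor where the "size" is the rank r (an assumption; a given tuning service may define it differently). Such an adaptor adds r × (d_out + d_in) trainable parameters to each d_out × d_in weight matrix it adapts. The function names and example numbers are hypothetical.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """One epoch = one full pass over the training set, split into batches."""
    return math.ceil(num_examples / batch_size)


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimiser steps when the whole dataset is seen `epochs` times."""
    return epochs * steps_per_epoch(num_examples, batch_size)


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added to one d_out x d_in weight matrix by a
    LoRA-style adaptor of the given rank (assumption: adaptor size == rank).
    The adaptor is two matrices: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 examples, batch size 8, 3 epochs.
    print(total_training_steps(1000, 8, 3))
    # A larger adaptor (higher rank) means more trainable parameters.
    for rank in (1, 4, 8, 16):
        print(rank, lora_trainable_params(4096, 4096, rank))
```

With 1,000 examples and a batch size of 8, each epoch is 125 steps, so 3 epochs give 375 steps; doubling the rank doubles the adaptor's trainable parameters, which is one reason larger adaptors tend to need more data and training time.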
Needs a data pipeline:
Are we going to scrape data from the internet?
Are we going to get structured data from CSVs etc.?
A class to do all the data preparation in one go.
Each JSONL file gets a name and a version, named for what the model is trained on, e.g. mortgage calculator.
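One way the "one class for all the data preparation" idea could look is sketched here in Python, using only the standard library. `DatasetPipeline`, its methods, and the `input_text`/`output_text` keys are hypothetical names chosen for illustration, not the schema required by any particular tuning service; the target platform's documentation defines its expected JSONL format. The sketch reads CSV text, keeps rows that have both a prompt and an answer, and writes key-value records to a file named `<name>_v<version>.jsonl`.

```python
import csv
import io
import json
from pathlib import Path


class DatasetPipeline:
    """Hypothetical one-stop data-preparation class: CSV in, versioned JSONL out.

    The output keys `input_text` / `output_text` are illustrative assumptions.
    """

    def __init__(self, name: str, version: int, input_col: str, output_col: str):
        self.name = name
        self.version = version
        self.input_col = input_col
        self.output_col = output_col

    @property
    def filename(self) -> str:
        # Each JSONL file gets a name and a version, e.g. mortgage_calculator_v1.jsonl
        return f"{self.name}_v{self.version}.jsonl"

    def load_csv(self, text: str) -> list[dict]:
        # Parse CSV text into one dict per row, keyed by the header names.
        return list(csv.DictReader(io.StringIO(text)))

    def to_records(self, rows: list[dict]) -> list[dict]:
        records = []
        for row in rows:
            prompt = (row.get(self.input_col) or "").strip()
            answer = (row.get(self.output_col) or "").strip()
            if prompt and answer:  # skip rows with a missing prompt or answer
                records.append({"input_text": prompt, "output_text": answer})
        return records

    def write_jsonl(self, records: list[dict], out_dir: Path) -> Path:
        out_dir.mkdir(parents=True, exist_ok=True)
        path = out_dir / self.filename
        with path.open("w", encoding="utf-8") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")  # one JSON object per line
        return path

    def run(self, csv_text: str, out_dir: Path) -> Path:
        # Load, convert and write in one go.
        return self.write_jsonl(self.to_records(self.load_csv(csv_text)), out_dir)


if __name__ == "__main__":
    import tempfile

    sample = "question,answer\nWhat is LTV?,Loan-to-value ratio.\nWhat is APR?,\n"
    with tempfile.TemporaryDirectory() as tmp:
        pipeline = DatasetPipeline("mortgage_calculator", 1, "question", "answer")
        path = pipeline.run(sample, Path(tmp))
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```

Keeping the version in the filename means a trained model can be traced back to the exact dataset file it was built from; the second sample row is dropped because its answer is empty.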
What is the difference between JSON and JSONL? In summary, the key difference is in how they handle multiple JSON objects. A regular JSON file is typically a single, self-contained structure, while a JSON Lines (JSONL) file stores one JSON object per line, which makes it easier to stream and process individual objects.
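A short Python illustration of that difference, using the standard `json` module; the two sample records and the `stream_jsonl` helper are made up for the example. The JSON version is one array that has to be parsed as a whole, while the JSONL version has one object per line and can be read lazily, record by record.

```python
import io
import json

# Hypothetical sample records (key-value pairs).
records = [
    {"input_text": "What is LTV?", "output_text": "Loan-to-value ratio."},
    {"input_text": "What is APR?", "output_text": "Annual percentage rate."},
]

# Regular JSON: the whole collection is one document (here a single array),
# so a reader parses the entire text before using any record.
json_text = json.dumps(records, indent=2)

# JSON Lines: one complete JSON object per line, with no enclosing array.
jsonl_text = "".join(json.dumps(r) + "\n" for r in records)


def stream_jsonl(stream):
    """Yield records one at a time from a text stream, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # JSONL can be consumed line by line, e.g. from a large file on disk.
    for rec in stream_jsonl(io.StringIO(jsonl_text)):
        print(rec["input_text"])
```

In practice the `io.StringIO` would be replaced by an open file handle, so only one line needs to be held in memory at a time.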
Note: good data is essential, and cleansing might be needed (e.g. for missing values). Pre-processing is essential (to
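As a hedged example of a cleansing step, the hypothetical `clean_rows` function below (plain Python, no external libraries) trims whitespace, drops rows missing a required field, fills optional fields from defaults, and removes exact duplicates. It returns a small report so the amount of data lost to cleansing is visible. Which fields are required, and whether a missing value should be dropped or filled, are assumptions that would depend on the actual dataset.

```python
from typing import Optional


def is_missing(value) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean_rows(rows: list[dict], required: tuple, defaults: Optional[dict] = None):
    """Return (cleaned_rows, report) after basic cleansing.

    Assumes row values are hashable (e.g. strings or None) so duplicates
    can be detected with a set.
    """
    defaults = defaults or {}
    cleaned: list[dict] = []
    seen: set = set()
    report = {"missing_required": 0, "filled": 0, "duplicates": 0}
    for raw in rows:
        # Trim surrounding whitespace from string values.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
        # Drop rows that lack any required field.
        if any(is_missing(row.get(col)) for col in required):
            report["missing_required"] += 1
            continue
        # Fill missing optional fields with a default value.
        for col, value in defaults.items():
            if is_missing(row.get(col)):
                row[col] = value
                report["filled"] += 1
        # Remove exact duplicates (compared after trimming and filling).
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned, report


if __name__ == "__main__":
    sample = [
        {"question": " What is LTV? ", "answer": "Loan-to-value ratio.", "topic": ""},
        {"question": "What is APR?", "answer": None, "topic": "rates"},
        {"question": "What is LTV?", "answer": "Loan-to-value ratio.", "topic": "general"},
    ]
    rows, report = clean_rows(sample, required=("question", "answer"),
                              defaults={"topic": "general"})
    print(rows)
    print(report)
```

For the sample, one row is kept, one is dropped for a missing answer, one empty topic is filled, and the third row is caught as a duplicate once whitespace is trimmed and the default is applied.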
In the end, the model, dataset integration and training process were worked on over several sessions during the week leading up to 30th August.
End result was:
One foundational model per training cycle (incremental training is not possible).
Using a foundational model rather than RAG (RAG was also tested: it can be used to build knowledge 'from the ground up', while a foundational model builds on existing knowledge).
Moved this issue to 'Done' on 02/09/24 as part of Kanban board cleanup.
EPIC: #10
As an AI developer, I want to determine the functionality/capabilities required to achieve the build AI scope for the project.
Assumptions or Pre-Requisites:
Acceptance Criteria: (Must be completed before task is moved to 'Done')
Tasks
Before changing task status to 'Review' or 'Done' please provide comment (and screenprints if appropriate) as documentary evidence of task completion