Open bwanglzu opened 1 year ago
Here is a list of the things we want to change: Problem: For many non-ML engineers, it is not really clear how Finetuner helps improve search, how it works, and what they need to do with it to improve their search system. Proposed Changes: We should include in the README and on the first documentation page a short explanation, based on a picture, that describes the major steps of using Finetuner, i.e., (1) prepare (and label) your data, (2) submit a fine-tuning job to the cloud, (3) integrate the model into your neural search pipeline.
Problem: The documentation is perceived as very technical and hard to get started with. Proposed Changes: We should create a getting started page (maybe re-use one of the notebooks) which should cover the main points from the walkthrough on one page. The notebooks are already good, but they are missing some essential parts for a good getting started page. For example, the data preparation and the hosting are missing.
Problem: The data preparation section only explains the format. For someone new to metric learning and search, it is hard to understand what kind of data needs to be labeled, how to label data, and why different models need data in different formats. Proposed Changes: Add explanations for this to the data preparation section.
Tickets:
https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7
I would propose dividing this into three (maybe more) tickets because this is a big job.
The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review; or we could do it the other way around, have engineers write it and Team Tech Content can review. I am open to either.
If you can write something and we review it, that would be nice. I think it would be good for "someone out of the loop" to write it, since the engineering team already understands the software well and may not notice when the writer and the reader are not on the same page.
I will link a ticket from the tech content board.
A Getting Started/Quick Start page is also a good idea. I would propose that one or more engineers start on that, with Tech Content involvement and review.
What do you think about our current walkthrough part? Or is this quick start a different section?
I have a more general problem with some of our notebooks, that I want to discuss. A lot of times, you follow the instructions and they just don't work. Or there is such a long processing or setup time that they're impossible to follow.
In general, the notebooks in our documentation are well tested. But indeed, since fine-tuning can be time-consuming, the fit might take a while to finish. Notebooks outside the Finetuner documentation are not guaranteed to be runnable.
fyi @CatStark
What do you think about our current walkthrough part? Or is this quick start a different section?
I think the walkthrough is too fast, and I might offer a fully working example with some data to fine-tune with. I would warn at each stage about what might go wrong. Like: Did you log in to Jina Cloud? Or: This may take some time, depending on cloud load. Or even: Make sure your current Python environment is the one where you actually installed exactly the things we told you to install.
It's not idiot proof. I know, because I'm an idiot. :)
I've been going through other documentation (the main Jina docs actually) and doing the things it shows on the first pages, like a new user. My failure rate has been very, very high.
What do you think about our current walkthrough part? Or is this quick start a different section?
The Walkthrough is not a "Getting Started" since it tries to cover all cases. The "Getting Started" should cover only one very specific example, but in much more detail and in an idiot-proof way, as Scott said.
The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review;
Sounds good. As I wrote, it would be nice to have a very simple flow chart for this which displays something like those 3 steps I mentioned which then are explained along with the example in the getting started section. We could later send this to the design team to make it more beautiful.
Also, we need to add a JAC page to the JAC documentation.
need to add documentation:
CosineSimilarityLoss
LLRD:
LLRD (layer-wise learning rate decay) assigns a different learning rate to each layer of the model backbone. It sets a large learning rate for the top layer and uses a multiplicative decay rate to decrease the learning rate layer by layer from top to bottom. With a large learning rate, the features of the top layers change more and can adapt to new tasks. In contrast, the bottom layers have a small learning rate, so the strong features learned during pre-training are preserved.
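To make the decay rule concrete for the docs, here is a minimal sketch in plain Python of how the per-layer learning rates could be computed. The function name `layerwise_learning_rates` and its parameters (`num_layers`, `top_lr`, `decay`) are hypothetical names chosen for illustration; they are not part of the Finetuner API, and the actual implementation may differ.

```python
def layerwise_learning_rates(num_layers, top_lr, decay):
    """Return one learning rate per layer; index 0 is the bottom layer.

    Hypothetical helper for illustration only (not Finetuner's API).
    The top layer gets `top_lr`; each layer below it gets the rate of
    the layer above multiplied by `decay`.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in the interval (0, 1]")
    # A layer that sits k steps below the top is scaled by decay ** k.
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Example values are assumptions, chosen only to show the pattern.
rates = layerwise_learning_rates(num_layers=4, top_lr=1e-4, decay=0.5)
for depth, lr in enumerate(rates):
    print(f"layer {depth}: lr = {lr:.2e}")
```

In a training framework, rates like these would typically be applied by giving each layer its own optimizer parameter group, so the bottom layers move slowly while the top layers adapt to the new task.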
Outline of docs from meeting w/ @bwanglzu @guenthermi @LMMilliken : https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7?pvs=4
First draft of the getting started: https://colab.research.google.com/drive/1DSvA9x4xi6GL7ulcUjgIYlgvUdQJqrJG?usp=sharing
We are looking for a new dataset
Results from Getting Started Guide: [images comparing pretrained and finetuned results]
getting started image:
JAC images:
As pointed out by @guenthermi, a lot of improvements need to be made to the documentation page, such as motivation, data preparation, tabs, and the README. We'll list them in this issue and improve them over time.
This is also linked to our requirement from the design team, @CatStark.