jina-ai / finetuner

:dart: Task-oriented embedding tuning for BERT, CLIP, etc.
https://finetuner.jina.ai
Apache License 2.0

reorganize documentation #688

Open bwanglzu opened 1 year ago

bwanglzu commented 1 year ago

As pointed out by @guenthermi, a lot of improvements need to be made to the documentation pages, such as the motivation, data preparation, tabs, and README. We'll list them in this issue and work through them over time.

This is also linked to our requirement from the design team, @CatStark.

guenthermi commented 1 year ago

Here is a list of the things we wanted to change:

Problem: For many non-ML engineers, it is not really clear how Finetuner helps improve search, how it works, and what they need to do with it to improve their search system.
Proposed Changes: We should include in the README and on the first documentation page a short explanation, built around a picture, that describes the major steps of using Finetuner: (1) prepare (and label) your data, (2) submit a fine-tuning job to the cloud, (3) integrate the model into your neural search pipeline.

Problem: The documentation is perceived as very technical and hard to get started with.
Proposed Changes: We should create a getting started page (maybe re-using one of the notebooks) that covers the main points from the walkthrough on one page. The notebooks are already good, but they are missing some essential parts of a good getting started page. For example, data preparation and hosting are missing.

Problem: The data preparation section only explains the format. For someone new to metric learning and search, it is hard to understand what kind of data needs to be labeled, how to label it, and why different models need data in different formats.
Proposed Changes: Add explanations of these points to the data preparation section.


Tickets:

https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7

scott-martens commented 1 year ago

I would propose dividing this into three (maybe more) tickets, because this is a big job.

bwanglzu commented 1 year ago

> The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review; or we could do it the other way around, have engineers write it and Team Tech Content can review. I am open to either.

If you can write something and we review it, that would be nice. I think it would be good for "someone out of the loop" to write it, since the engineering team already knows the software well and may not notice when the writer and the reader are not on the same page.

scott-martens commented 1 year ago

I will link a ticket from the tech content board.

bwanglzu commented 1 year ago

> A Getting Started/Quick Start page is also a good idea. I would propose one or more engineers start on that with Tech Content involvement and review.

What do you think about our current walkthrough part? Or is this quick start a different section?

bwanglzu commented 1 year ago

> I have a more general problem with some of our notebooks, that I want to discuss. A lot of times, you follow the instructions and they just don't work. Or there is such a long processing or setup time that they're impossible to follow.

In general, the notebooks in our documentation are well tested. But indeed, since fine-tuning can be time-consuming by nature, the fit might take a while to finish. Finetuner notebooks outside the documentation are not guaranteed to be runnable.

bwanglzu commented 1 year ago

fyi @CatStark

scott-martens commented 1 year ago

> What do you think about our current walkthrough part? Or is this quick start a different section?

I think the walkthrough is too fast, and I might offer a fully working example with some data to fine-tune with. I would warn at each stage about what might go wrong. Like: Did you log in to Jina Cloud? Or: This may take some time, depending on cloud load. Or even: Make sure your current Python environment is the one where you actually installed exactly the things we told you to install.

It's not idiot proof. I know, because I'm an idiot. :)

I've been going through other documentation (the main Jina docs actually) and doing the things it shows on the first pages, like a new user. My failure rate has been very, very high.

scott-martens commented 1 year ago

> I will link a ticket from the tech content board.

https://github.com/jina-ai/team-tech-content/issues/77

guenthermi commented 1 year ago

> What do you think about our current walkthrough part? Or is this quick start a different section?

The Walkthrough is not a "Getting Started" page, since it tries to cover all cases. The "Getting Started" page should cover only one very specific example, but in much more detail, and idiot-proof, as Scott said.

guenthermi commented 1 year ago

> The general education section (What is fine tuning? How does it work? Why?) is something I could start on right away, with an engineering review;

Sounds good. As I wrote, it would be nice to have a very simple flow chart for this, showing something like the 3 steps I mentioned, which are then explained alongside the example in the getting started section. We could later send it to the design team to make it more beautiful.

bwanglzu commented 1 year ago

One more: we need to add a JAC page to the JAC documentation.

bwanglzu commented 1 year ago

We need to add documentation for:

  1. LLRD
  2. CosineSimilarityLoss
  3. new way of constructing a DA from CSV for CosineSimilarityLoss

LLRD:

LLRD (layer-wise learning rate decay) assigns a different learning rate to each layer of the model backbone. It sets a large learning rate for the top layer and applies a multiplicative decay rate to lower the learning rate layer by layer from top to bottom. With a large learning rate, the features of the top layers change more and can adapt to new tasks. The bottom layers, in contrast, have a small learning rate, so the strong features learned during pre-training are preserved.
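To make the decay concrete, here is a minimal sketch in plain Python of how the per-layer learning rates could be computed. It is not Finetuner's implementation: the function name `layerwise_learning_rates` and the parameters `num_layers`, `top_lr`, and `decay` are hypothetical names chosen for illustration, and the values are arbitrary.

```python
from typing import List


def layerwise_learning_rates(num_layers: int, top_lr: float, decay: float) -> List[float]:
    """Return one learning rate per layer (hypothetical helper, not a Finetuner API).

    Index 0 is the bottom layer, index -1 is the top layer.
    The top layer gets `top_lr`; each layer below it gets the rate of the
    layer above multiplied by `decay`.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in the interval (0, 1]")
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Example values are assumptions for illustration only.
rates = layerwise_learning_rates(num_layers=4, top_lr=1e-3, decay=0.5)
for depth, lr in enumerate(rates):
    print(f"layer {depth}: lr={lr:.6f}")
```

With these example values, the rate halves at each step down from the top layer, so the bottom layer trains at one eighth of the top layer's rate. In a framework such as PyTorch, a list like this would typically be fed into the optimizer as one parameter group per layer.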

scott-martens commented 1 year ago

Outline of docs from meeting w/ @bwanglzu @guenthermi @LMMilliken : https://www.notion.so/Outline-for-Finetuner-doc-ae9a657e0b854359b767fe7c26cd9ee7?pvs=4

guenthermi commented 1 year ago

First draft of the getting started: https://colab.research.google.com/drive/1DSvA9x4xi6GL7ulcUjgIYlgvUdQJqrJG?usp=sharing

CatStark commented 1 year ago

We are looking for a new dataset

guenthermi commented 1 year ago

Results from the Getting Started guide: [images: two pairs of pretrained vs. finetuned results]

guenthermi commented 1 year ago

Getting started image: [image]

guenthermi commented 1 year ago

JAC images: [three images]

guenthermi commented 1 year ago

[image]