ybgdgh / L3MVN

Leveraging Large Language Models for Visual Target Navigation
https://sites.google.com/view/l3mvn

Does your framework need to train? #1

Status: Closed (closed by rginjapan 1 year ago)

ybgdgh commented 1 year ago

If you use the zero-shot paradigm, no training is needed. For the feed-forward method, the fine-tuning network does need to be trained.

rginjapan commented 1 year ago

So you mean you did the fine-tuning for your pipeline?

ybgdgh commented 1 year ago

For the feed-forward method, yes. You can find more details in our paper.

rginjapan commented 1 year ago

I will, but do you plan to publish the fine-tuning code in the repo?

ybgdgh commented 1 year ago

You can find the fine-tuning code on this page.

rginjapan commented 1 year ago

Thanks! Could you please give more guidance on how to use (train) the fine-tuning pipeline?

rginjapan commented 1 year ago

BTW, where is "versioned_data"?

Object-Goal-Navigation/
  data/
    scene_datasets/
    matterport_category_mappings.tsv
    object_norm_inv_perplexity.npy
    versioned_data
    objectgoal_hm3d/
      train/
      val/
      val_mini/

rginjapan commented 1 year ago

After reading your paper, I think the zero-shot you mentioned just means that no training is needed thanks to the LLM. You still specified the object categories in evaluation, so it is not task-oriented zero-shot. Am I right?

ybgdgh commented 1 year ago

Thanks for reading our paper. Yes, as I said before, fine-tuning is only for the feed-forward method. You can find versioned_data after downloading the HM3D dataset. Because the target objects are fixed in this task, we use more informative objects to find the relevance. If the target object were open-vocabulary, specifying the object category would not be suitable.
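To check whether the data layout listed earlier in this thread is in place after downloading HM3D, a small script along these lines may help. It is only a sketch: the function name `missing_entries` is hypothetical, and the nesting (everything directly under `data/`, with the three splits inside `objectgoal_hm3d/`) is an assumption read off the listing above, not something confirmed by the maintainers here.

```python
from pathlib import Path

# Entries expected under <root>/data, taken from the listing in this thread.
# Assumption: all of them sit directly under data/, and the three splits
# sit inside objectgoal_hm3d/.
EXPECTED = [
    "scene_datasets",
    "matterport_category_mappings.tsv",
    "object_norm_inv_perplexity.npy",
    "versioned_data",
    "objectgoal_hm3d/train",
    "objectgoal_hm3d/val",
    "objectgoal_hm3d/val_mini",
]


def missing_entries(root):
    """Return the expected entries that do not exist under <root>/data."""
    data_dir = Path(root) / "data"
    # Path.exists() follows symlinks, so a dangling versioned_data link
    # is reported as missing.
    return [rel for rel in EXPECTED if not (data_dir / rel).exists()]


if __name__ == "__main__":
    # Assumed checkout location; adjust to wherever the repo lives.
    missing = missing_entries("Object-Goal-Navigation")
    if missing:
        print("Missing under data/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected entries are present.")
```

If `versioned_data` shows up as missing, the HM3D download most likely has not been placed or linked under `data/` yet.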

rginjapan commented 1 year ago

So what is the definition of zero-shot in your approach?

ybgdgh commented 1 year ago

In this paper, zero-shot means that we don't need to train the semantic policy to learn semantic relevance from the training dataset. If you think selecting categories from the scene dataset does not count as zero-shot, we could also select the object categories based on object relevance in the language model. We only use this as a reference, though, and we still have to choose those object categories from the limited set of semantic labels.
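To make the distinction concrete, here is a toy sketch of choosing where to explore using relevance scores that are looked up rather than learned from navigation data. This is not the L3MVN code. The names `TOY_RELEVANCE`, `relevance`, and `pick_frontier` are hypothetical, and the hard-coded numbers are made up; they stand in for scores that a pre-trained language model would supply.

```python
# Made-up relevance scores standing in for a pre-trained language model.
# In a zero-shot setup these would come from querying the model, not from
# training a policy on navigation episodes.
TOY_RELEVANCE = {
    ("toilet", "sink"): 0.9,
    ("toilet", "towel"): 0.7,
    ("toilet", "sofa"): 0.1,
    ("toilet", "tv"): 0.05,
}


def relevance(target, observed):
    """Score how related an observed object is to the target (0.0 if unknown)."""
    return TOY_RELEVANCE.get((target, observed), 0.0)


def pick_frontier(target, frontiers):
    """Pick the frontier whose nearby objects are most related to the target.

    frontiers maps a frontier id to the object labels seen near it.
    Each frontier is scored by its single most relevant object.
    """
    def score(objects):
        return max((relevance(target, o) for o in objects), default=0.0)

    return max(frontiers, key=lambda f: score(frontiers[f]))


if __name__ == "__main__":
    frontiers = {"A": ["sofa", "tv"], "B": ["sink", "towel"]}
    print(pick_frontier("toilet", frontiers))
```

Nothing in this sketch is fitted to the scene dataset. The only dataset-dependent part is the label vocabulary, which is the limitation the comment above points out.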

rginjapan commented 1 year ago

Is it possible to test on other categories in evaluation? Thanks for your kind response!

rginjapan commented 1 year ago

Sorry to bother you again. After I downloaded the mini_val split of HM3D, there are no train and val folders. Do I also need to download with --uids hm3d_val and hm3d_train? Thanks!

ybgdgh commented 1 year ago

Sure! You can test on other categories, but you need to generate the HM3D evaluation dataset yourself. If you have problems with the HM3D dataset, please check related repos such as habitat-lab and habitat-sim.