Question about the code

leoozy commented 2 months ago

                        # Only evaluate terminating trajectories
                        try:
                            if args.value_function:
                                score = value_function.evaluate_success(
                                    screenshots=last_screenshots[-(args.max_depth+1):] + [obs_img], actions=temp_action_history,
                                    current_url=env.page.url, last_reasoning=a["raw_prediction"],
                                    intent=intent, models=[args.value_function],
                                    intent_images=images if len(images) > 0 else None)
                            else:
                                raise NotImplementedError(f"Value function {args.value_function} not implemented")
                        except Exception as e:
                            print(f"Error in evaluator: {e}")
                            score = 0

                        next_actions = []
                        if score < 1 and should_generate_next_actions:
                            # start_time = time.time()
                            temp_early_stop_flag, _ = early_stop(
                                temp_trajectory, max_steps, early_stop_thresholds
                            )
                            if not temp_early_stop_flag:
                                try:
                                    # Generate possible action candidates for next step.

                                    next_actions = agent.next_action(
                                        temp_trajectory,
                                        intent,
                                        images=images,
                                        meta_data=meta_data,
                                        branching_factor=branching_factor
                                    )
                                except ValueError as e:
                                    # get the error message
                                    print('Failed to generate next actions:', e)

Thank you for your excellent work. I am confused about the next_actions here. The trajectory passed in to generate next_actions is exactly the same as the one used for the evaluated actions above. That is, the next actions would be exactly the same as the actions above.

leoozy commented 2 months ago

Should the input traj be the obs of the exec after the action above?

kohjingyu commented 2 months ago

Thanks, this is a good point!

Our implementation assigns the value of the current state to the next actions. We do this because immediately evaluating the "obs of the exec after the action above" (as you suggested) would be very expensive, since it requires more backtracking calls. This is explained in more detail in Sec. 3.3 and footnote 2 on page 5 of the paper. If we don't care about the cost of backtracking, then I agree that computing the obs AFTER executing the action is a better approach.
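
To make the trade-off concrete, here is a minimal toy sketch of a best-first search in which each candidate action is queued with the value of the state it was proposed from, so a candidate is only executed (and scored) once it is actually popped from the frontier. This is not the repository's code: `best_first_search`, `propose_actions`, `step`, `value_fn`, and the integer "environment" are all hypothetical names made up for illustration.

```python
import heapq
from itertools import count


def best_first_search(start, propose_actions, step, value_fn, max_expansions=10):
    """Toy best-first search with inherited priorities (illustrative only).

    Each candidate action is pushed with the score of the state it was
    proposed from, so no candidate has to be executed just to be ranked.
    """
    tie = count()  # tie-breaker so heapq never has to compare states
    # Frontier entries: (negated priority, tie, parent state, action to apply)
    frontier = [(-1.0, next(tie), start, None)]
    evaluations = 0
    best_state, best_score = start, float("-inf")

    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, parent, action = heapq.heappop(frontier)
        # Execute the action only now, when the candidate is selected.
        state = parent if action is None else step(parent, action)
        score = value_fn(state)
        evaluations += 1
        if score > best_score:
            best_state, best_score = state, score
        if score >= 1.0:  # treat a score of 1 as success and stop
            break
        # Candidates inherit the score of the state they were proposed from.
        for candidate in propose_actions(state):
            heapq.heappush(frontier, (-score, next(tie), state, candidate))

    return best_state, best_score, evaluations


# Hypothetical toy environment: states are integers and the goal is 5.
def toy_propose_actions(state):
    return [1, 2]  # "add 1" or "add 2"


def toy_step(state, action):
    return state + action


def toy_value_fn(state):
    return 1 - abs(5 - state) / 5
```

Calling `best_first_search(0, toy_propose_actions, toy_step, toy_value_fn)` executes and scores only the candidates that reach the front of the queue. Scoring every candidate by its own resulting observation would instead require executing each one as soon as it is proposed, which in a real browser environment means extra backtracking to restore the parent state.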

Hope that helps!

leoozy commented 2 months ago

Thx!

kohjingyu / search-agents

Question about the code #4