Open mixuala opened 6 years ago
What is the URL for the key that does not exist? Perhaps your human-feedback-api webapp doesn't know which bucket it should be looking at.
I'm not an expert in Django yet, so I've just been hacking away. But it seems the problem is a sort-order mismatch between the process that records video segments (`order_by('created_at')`, i.e. ascending) and the way the human-feedback-api webapp displays segments (`order_by('-created_at')`, i.e. descending).

I added the following hack and it seems to fix the problem, though I'm not sure it's the right approach:
```python
# ./rl-teacher/human-feedback-api/human_feedback_api/views.py
def _all_comparisons(experiment_name, comparison_id=None, use_locking=True):
    not_responded = Q(responded_at__isnull=True)
    cutoff_time = timezone.now() - timedelta(minutes=5)
    not_in_progress = Q(shown_to_tasker_at__isnull=True) | Q(shown_to_tasker_at__lte=cutoff_time)
    finished_uploading_media = Q(created_at__lte=timezone.now() - timedelta(seconds=25))  # Give time for upload
    ready = not_responded & not_in_progress & finished_uploading_media
    # Order by created_at ASC, same order as id
    ascending = True
    if ascending:
        # Sort by priority, then put OLDEST comparisons first
        ready = not_responded & finished_uploading_media
        return Comparison.objects.filter(ready, experiment_name=experiment_name).order_by('-priority', 'id')
    else:
        return Comparison.objects.filter(ready, experiment_name=experiment_name).order_by('-priority', '-created_at')
```
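To make the effect of the two orderings concrete, here is a minimal, self-contained sketch in plain Python (the comparison records are made-up values for illustration; only the field names mirror the Django model above). It shows which comparisons each ordering would serve first:

```python
# Hypothetical comparison records: id, priority, and created_at as an
# integer timestamp. These are invented values, not data from a real run.
comparisons = [
    {"id": 1, "priority": 1, "created_at": 100},  # oldest; media long since uploaded
    {"id": 2, "priority": 1, "created_at": 200},
    {"id": 3, "priority": 1, "created_at": 300},  # newest; media may still be uploading
]

# order_by('-priority', 'id'): highest priority first, then OLDEST first.
ascending = sorted(comparisons, key=lambda c: (-c["priority"], c["id"]))

# order_by('-priority', '-created_at'): highest priority first, then NEWEST first.
descending = sorted(comparisons, key=lambda c: (-c["priority"], -c["created_at"]))

print([c["id"] for c in ascending])   # [1, 2, 3]
print([c["id"] for c in descending])  # [3, 2, 1]
```

With the newest-first ordering, a slow machine can end up serving comparisons whose media hasn't finished uploading yet, which would be consistent with the blank screens described below.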
But I'm not exactly clear on how RL with human feedback is supposed to work. I'm running the experiments on an old MacBook Pro, so the available recorded video always lags behind the latest comparison (as shown by what's uploading in the logfile). I give feedback on 3-5 comparisons, then come back 10-20 minutes later for the next batch.

But it seems to me that the most recent comparison/video segments have the benefit of more Q-learning, so rating those comparisons would have a greater learning benefit. If I only provide feedback on a few comparisons every 20 minutes, would I get better results by giving feedback on the most recent ones? Does the learning algorithm still work if I offer sparse feedback, or do I need to provide feedback for every comparison?
If yes, then I suppose it would be better to record and provide feedback on video segments from the most recent experiments first, right?
I got to this point by following the RL-teacher Usage docs. I was able to use the human-feedback-api webapp to provide feedback for the 175 pre-training labels. After that, the agent began to learn based on the pre-training feedback.

But joint training failed. The human-feedback-api webapp displayed only blank screens. When I checked the URL for the videos in a separate tab, I got an XML error message that said `The specified key does not exist`. At the same time, the teacher.py script continued to generate video samples and upload them to Google Cloud, and I can manually confirm that the media files exist in Google Cloud.

I waited many minutes, refreshed the webapp, and even clicked "can't tell" a few times, but the video never reappeared after the (successful) pre-training.
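For what it's worth, `The specified key does not exist` is the message Google Cloud Storage returns in its XML error body when an object is missing (error code `NoSuchKey`). A small sketch that parses such a response to distinguish a missing key from other failures; the exact XML shape here is assumed from the error seen in the browser tab:

```python
import xml.etree.ElementTree as ET

# Example GCS XML error body (shape assumed, not captured from a real response).
error_xml = """<?xml version='1.0' encoding='UTF-8'?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
</Error>"""

def gcs_error_code(body):
    """Return the text of the <Code> element in a GCS XML error body, or None."""
    root = ET.fromstring(body)
    code = root.find("Code")
    return code.text if code is not None else None

print(gcs_error_code(error_xml))  # NoSuchKey
```

If the webapp's video URLs return `NoSuchKey` while `gsutil` shows the files present, the URL being requested (bucket or object path) likely differs from where teacher.py actually uploaded, or the request raced ahead of the upload, which the ordering hack above works around.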