stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Round 2 #164

Closed kidzik closed 5 years ago

kidzik commented 6 years ago

The reward in the second round:

Other updates:

AdamStelmaszczyk commented 6 years ago

@kidzik What are the rules regarding advancing to Round 2?

mattiasljungstrom commented 6 years ago

When are you planning to switch to branch v2.1? How long will the episodes be in round 2?

kidzik commented 6 years ago

@mattiasljungstrom it'll be around 10 seconds (I've just updated the initial post).

@yobobobo we don't have the exact formula yet; we are tuning the scaling factor for activations. The round itself will be short, but the reward function will be posted this week (almost two months before the end).

@AdamStelmaszczyk we plan to keep it similar to last year (last year we had ~50 teams).

kidzik commented 6 years ago

Here's a rough formula for the requested velocity vector (2D projection) in Round 2: https://github.com/stanfordnmbl/osim-rl/blob/master/examples/Round%202%20reward.ipynb We will also subtract the sum of squared activations, scaled by a factor of 0.01. (The notebook only visualizes the behavior of the reward; it will, of course, be implemented in the repo.)
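
Roughly, the per-step penalty will look something like the sketch below (this is not the final implementation; the exact constants and the generator of the requested velocity are still being tuned):

import numpy as np

def round2_penalty_sketch(state_desc, activations, w_act=0.01):
    # Mismatch between the pelvis velocity and the requested velocity,
    # projected on the horizontal plane (x and z components).
    penalty = (state_desc["body_vel"]["pelvis"][0] - state_desc["target_vel"][0])**2
    penalty += (state_desc["body_vel"]["pelvis"][2] - state_desc["target_vel"][2])**2
    # Effort term: sum of squared muscle activations, scaled by the factor above.
    penalty += w_act * float(np.sum(np.square(activations)))
    return penalty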

We've received the Google Cloud credits and will send them out tomorrow. We will also release the v2.1 branch.

kidzik commented 6 years ago

Fixed, thanks!

spiglerg commented 6 years ago

Will the velocity be clipped to prevent it from going negative?

kidzik commented 6 years ago

We plan to leave the possibility of negative velocity (which happens with very low probability).

kidzik commented 6 years ago

Version 2.1 is now merged with the master branch and deployed on the grader. The reward for the second round might still change. We extended the first round until October 20th.

zhengfeiwang commented 6 years ago

https://github.com/stanfordnmbl/osim-rl/blob/master/osim/env/osim.py#L527 In version 2.1 there is no project argument for reset(), so we cannot receive the dict-type state_desc at the very beginning. Is that a bug that needs to be fixed?
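
For now I work around it with something like this (just a sketch; I'm assuming get_state_desc() still returns the full dictionary right after reset()):

from osim.env import ProstheticsEnv

env = ProstheticsEnv(visualize=False)
env.reset()                        # in v2.1 this returns only the projected list
state_desc = env.get_state_desc()  # fetch the dict-type description manually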

beforethefirst commented 6 years ago

I think the target velocity should be given in advance so that the agent knows the goal and can act proactively. So the target velocity should come from 'prev_state_desc' instead of 'state_desc' when computing the penalty. Could you check the code in reward_round2() and confirm whether this is intentional or a bug?

penalty += (state_desc["body_vel"]["pelvis"][0] - state_desc["target_vel"][0])**2
penalty += (state_desc["body_vel"]["pelvis"][2] - state_desc["target_vel"][2])**2
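
In other words, what I have in mind is roughly this (just a sketch of the suggestion, mirroring the two lines above with prev_state_desc swapped in; not tested):

# use the target velocity the agent actually saw when it chose the action
penalty += (state_desc["body_vel"]["pelvis"][0] - prev_state_desc["target_vel"][0])**2
penalty += (state_desc["body_vel"]["pelvis"][2] - prev_state_desc["target_vel"][2])**2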

phunghx commented 6 years ago

Why is "target_vel" not included in the observation state when using the submission server? Is difficulty=0 used for Round 2?

beforethefirst commented 6 years ago

@phunghx: If you set the difficulty to 0, the target velocity is a constant, so it may not need to be specified. For Round 2, you should set the difficulty to 1; then you will have "target_vel" in the observation dictionary.
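
For example (a rough sketch; I'm assuming the change_model() call described in the repo documentation, so adjust it to whatever the v2.1 API actually exposes):

from osim.env import ProstheticsEnv

env = ProstheticsEnv(visualize=False)
# difficulty=1 enables the varying requested velocity used in Round 2
env.change_model(model='3D', prosthetic=True, difficulty=1, seed=0)
observation = env.reset()
print(env.get_state_desc()["target_vel"])  # available once difficulty is 1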

phunghx commented 6 years ago

@beforethefirst Sure, on my computer I set the difficulty to 1 and get "target_vel" in the observation state. But when I submit to the server for grading, I do not get "target_vel". So I guess there is some problem with the server code, or they also use difficulty=0. @beforethefirst Have you tried submitting your model to the server?

kidzik commented 6 years ago

@phunghx right now the reward on the server is the same as it was before (difficulty = 0 in the new repo). We will run difficulty = 1 in the second round.

@beforethefirst good catch. Since it only affects ~3 steps out of 1000 and does not favor any tricks, we may ignore this. However, if we get other suggestions on the reward for the second round, we may incorporate them all together.

phunghx commented 6 years ago

@kidzik Thank you very much

beforethefirst commented 6 years ago

@kidzik Currently I am trying to train an agent to learn smooth transitions, so it needs to know the target velocity when it takes an action. It might not be that critical, but could you update the code earlier? Or could you make sure that it will be fixed when Round 2 begins?

JohnnyRisk commented 6 years ago

@kidzik Regarding the Google Cloud credits, is there still time to get a positive score and be eligible for them? To qualify, do we simply have to place before the end of Round 1?

kidzik commented 6 years ago

@JohnnyRisk we will be able to award more credits (at least 50 x $250). We will give details as soon as we know that everything went well with the first batch.

@beforethefirst We will discuss that with the team. We don't want to change the reward too often to avoid miscommunication, so we are hesitant to implement small changes.

joneswong commented 6 years ago

The observation is now 160-dimensional. Since my placeholder is <None, 158>, how should I modify the observation returned by the client if I still want to submit my current model? Would obs[:158] work? Thanks

zhengfeiwang commented 6 years ago

@joneswong The two new dimensions should appear at indices 156 and 159; I think you can remove them manually.
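
For example (assuming the two extra entries really are at indices 156 and 159; please double-check against your own observations):

# drop the two new entries so the observation matches the old 158-dim layout
obs_158 = [v for i, v in enumerate(observation) if i not in (156, 159)]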

kirk86 commented 6 years ago

@wangzhengfei0730 thanks for that, it was giving me a headache, I had the same problem as @joneswong.

mattiasljungstrom commented 6 years ago

@kidzik The way the reward function is set up for Round 2, the model doesn't need to turn in the movement direction; it only needs to move sideways.

Is this what you intended? I had expected that the model should turn in the movement direction.

kidzik commented 6 years ago

@mattiasljungstrom good point! We hypothesize that moving sideways will be suboptimal due to energy constraints. Hopefully, at some point solutions to the challenge will serve as a tool for hypothesis testing (or at least screening for hypotheses).

mattiasljungstrom commented 6 years ago

@kidzik Interesting, thanks for the reply!

kidzik commented 6 years ago

Some updates regarding qualifications to the second round:

kretovmk commented 6 years ago

Hi! I have a question about the observation vector. Right now it is 160-dimensional regardless of the difficulty level. If I want to prepare for Round 2, I would like to have the usual observation vector (maybe 162-dimensional) that includes target_vel. So should I now manually, at every step of the env, call get_state_desc() and concatenate the 160-dim observation vector with the 2-dim target_vel vector?

Or am I missing something obvious here? To me it seems that the tasks in Round 2 and Round 1 are quite different, so it doesn't make much sense to concentrate on the Round 1 task; I would rather invest more time in Round 2.
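
In other words, something like this at every step (a rough sketch of what I mean; I'm assuming state_desc exposes "target_vel" when difficulty is 1, and that only the x and z components matter):

def augment_observation(env, projected_obs):
    # append the requested velocity (x and z components) to the projected observation
    target_vel = env.get_state_desc()["target_vel"]
    return list(projected_obs) + [target_vel[0], target_vel[2]]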

spiglerg commented 6 years ago

Do submissions to the live Round 2 leaderboard count towards the submissions limit? Or will only the full docker submissions count for that?

kidzik commented 6 years ago

@spiglerg they will not count towards the limit. The current Round 2 is preliminary -- the final results will be based on docker submissions.

@dd210 That's right, Round 1 and Round 2 differ a bit due to the target vector. However, you can think of Round 1 as Round 2 with the target vector (3,0,0). It's not exactly that, since the z dimension is ignored in Round 1, but a good score in Round 2 should give you a decent score in Round 1 with this approach. Regarding modification of the observation vector, that is probably a good way to go.

kidzik commented 6 years ago

Last year @spMohanty prepared a tutorial: https://www.youtube.com/watch?v=2e495B8kmUk This year it will be similar (or even easier).

spiglerg commented 6 years ago

Awesome, thanks!

mattiasljungstrom commented 6 years ago

Are Round 2 submissions still starting on the 21st? It would be nice to get the details on the docker setup a bit in advance. Thanks!

spMohanty commented 6 years ago

@mattiasljungstrom : Yes, Round 2 will start on October 21st. I am travelling this week, but I will release the new submission instructions as soon as I can.

huschen commented 6 years ago

Is the exact deadline for round-1 30.09.2018 11:59 PM UTC?

kidzik commented 6 years ago

@yobobobo The preliminary Round 2 will end on October 31st; then we will accept docker submissions for one week, and the final evaluation will be done on the docker containers.

@huschen Please submit to Round 2. Given that we have fewer than 50 people in the second round now, we may accept all submissions.

huschen commented 6 years ago

@kidzik Thank you very much. Just to be clear: all participants are qualified for the second round, and it is the score of the second round that will qualify people (in case of capacity limitations) for the final docker submission?

kidzik commented 6 years ago

@yobobobo Right, we have extended the deadline a little since then.

@huschen We will make an official statement about the requirements for the second round on the website shortly. Given that there are not too many people in the current Round 2, the requirements won't be very constraining. So please go ahead and submit to Round 2.

joneswong commented 6 years ago

None of the homepages have been updated and I am very confused.

Before October 21, may I train models with the Round 2 env? If so, there is no example on the homepage. Is it just env = Prosthetics(True, difficulty=1)?

For submission there is no example either. Some submissions have already appeared on the Round 2 leaderboard. How do I configure client.env_create(crowdai_token, env_id='ProstheticsEnv') to submit to Round 2?

Could anyone help me? Thanks!

kidzik commented 6 years ago

@joneswong Yes, you can train models for Round 2; the evaluation won't change. The final submission for the second round will be done through docker containers. We are working with Mohanty on the details (actually, for now it's mainly his work). It will look like last year's setup (https://www.youtube.com/watch?v=2e495B8kmUk). The Round 2 that is up and running now is just a test one for checking the models, etc. We chose docker as the final evaluation environment so that the random seeds are not exposed.

joneswong commented 6 years ago

@kidzik thanks for your reply!

the evaluation won't change

I found that there are many scores around ~9000 on the Round 2 leaderboard. How do I submit to Round 2? (i.e., how do I configure the crowdAI client? I cannot find any example.)

joneswong commented 6 years ago

@joneswong The two new dimensions should appear at indices 156 and 159; I think you can remove them manually.

@wangzhengfei0730 hi zhengfei, could you show me how to submit to Round 2? I didn't find any example on the homepage. Besides, when I set difficulty = 1, the target_vel is constantly [1.25, 0.0, 0.0]. Could you also tell me how to use the Round 2 env? I am very confused by the current situation... actually, I have been tuning my agent with the Round 1 env up to now.

kidzik commented 6 years ago

@joneswong you need to submit to the right server, as you can see here: https://github.com/stanfordnmbl/osim-rl/commit/3ceadccc2f9104c9012281a482cfff5203f703bd
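
The submission loop itself stays the same as in Round 1; only the grader address changes. A rough sketch (the remote_base below is a placeholder; take the real Round 2 address from the commit above, and my_controller stands in for your own policy):

from osim.http.client import Client

remote_base = "<Round 2 grader URL from the commit above>"  # placeholder
crowdai_token = "<your crowdAI token>"                      # placeholder

def my_controller(observation):
    # placeholder policy (assuming 19 muscle actuators in the prosthetic model)
    return [0.0] * 19

client = Client(remote_base)
observation = client.env_create(crowdai_token, env_id="ProstheticsEnv")

while True:
    [observation, reward, done, info] = client.env_step(my_controller(observation), True)
    if done:
        observation = client.env_reset()
        if not observation:
            break

client.submit()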

zhengfeiwang commented 6 years ago

@joneswong When set the difficuly as 1, the target_vel will change every 300 steps. I think your constantly target_vel caused by random seed. As for submission, agree with @kidzik that you should check your port for round 2 if you want to submit to the leaderboard of Round 2. Instead, if you mean submit for docker submission, we'd better look forward to official instruction.

joneswong commented 6 years ago

@kidzik @wangzhengfei0730 thanks for your help! I have completed training and submission by reading the code and following your explanation. Thanks again.

@kidzik Frankly speaking, I am very disappointed with the organization of this competition. The following announcement:

Round 2 (test run): 1.10.2018 - 28.10.2018
Round 2: 29.10.2018 - 4.11.2018

appeared just around three days ago. I kept checking https://www.crowdai.org/challenges/nips-2018-ai-for-prosthetics-challenge every 2-3 days and it always showed:

Round 2: 21.10.2018 - 28.10.2018 (tentative)

Besides, the qualification conditions were not declared officially (maybe only in this thread). I was afraid that there would be more than 50 teams above 1000+, so I kept tuning my agent for Round 1 until it reached a high ranking. I only switched to Round 2 yesterday (T_T).

We sincerely suggest postponing the Round 2 deadline. Could you help us communicate this to crowdAI? It would be friendlier to competitors like us who did not participate in the NIPS 2017 competition. As you can see, most of the experienced competitors are more familiar with this channel (i.e., a GitHub issue).

Anyway, thanks for your efforts! This simulator is awesome and brings us a lot of fun. We will keep building our RL research on osim.

kidzik commented 6 years ago

Thank you @joneswong for your comment. We moved the deadlines by one week (both Round 1 and Round 2), and that is the only change. We added the current test version of Round 2 only for convenience; sorry for the miscommunication. NIPS is already very soon and we need some time to arrange the logistics (visa invitation letters, etc.), so it might be difficult to push it further.

joneswong commented 6 years ago

@kidzik I see. For now, how does one qualify for the docker submission? Top-K on the Round 2 leaderboard, or a score of 9700+? Also, is there any place to register for the docker submission? It seems that crowdAI has no means to communicate with me. Thanks.

kidzik commented 6 years ago

We qualify the top 50 solutions from Round 1; we've updated the website to reflect that. The current Round 2 leaderboard is only a test and is not taken into account for qualification. @spMohanty is preparing the docker submission infrastructure and we will provide instructions as soon as it is ready (it will be similar to last year's system: https://www.youtube.com/watch?v=2e495B8kmUk).

huschen commented 6 years ago

@kidzik, what is the exact end time for round 1? 29 Oct 10:00 UTC as stated on crowdAI?

kidzik commented 6 years ago

Yes, the time on crowdAI is official. We were aiming for midnight, October 28, Hawaii time.

huschen commented 6 years ago

Thanks a lot for confirming it:-)

joneswong commented 6 years ago

@kidzik Hi, as the deadline is approaching, what are the conditions for qualifying for the docker submission?