YupuLu opened this issue 3 months ago
Hi Yupu,
Also if you're training a panda robot, use these parameters: --robot_name=panda --nb_nodes=12 --coeff_fn_internal_size=1024 --coeff_fn_config=3 --dim_latent_space=7 --batch_size=512 --learning_rate=0.00005 --gradient_clip_val=1 --dataset_tags non-self-colliding
Good luck! let me know if you have any other issues.
And just to be clear, there are pretrained models you can use, for example: python scripts/evaluate.py --testset_size=500 --model_name=panda__full__lp191_5.25m
All models are listed here: https://github.com/jstmn/ikflow/blob/master/ikflow/model_descriptions.yaml
Thank you for your reply! It really helps clarify my confusion. I will try the installation and test whether everything works.
Still, I am wondering if the installation requirements can be loosened, such as the Python version (only 3.8) and pytorch (2.3). Will it work with pytorch 2.0 or Python 3.9? If so, it would be easier to integrate with many other projects.
Also, I am somewhat new to robotic manipulators. If I want to use a learned model with jrl in another application (for instance, with pybullet) for the same robot, like the Franka Panda, is there anything I should be aware of? Thanks in advance!
I managed to make it work with Python 3.9 and pytorch 2.0.1. I'm still not sure what will happen and will report anything of value.
Hi @YupuLu ,
Great, sounds like you got it working. Right now I only allow Python 3.8 because it would be extra work to ensure the code works on other Python versions; I would guess it works fine on later versions too. I think pytorch just needs to be >= 2.0, because that's when setting the default dtype and device was introduced.
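As a side note on the torch >= 2.0 point above, the global default setters look like this. A minimal sketch; it uses the CPU so it runs anywhere, but torch.set_default_device requires torch >= 2.0:

```python
import torch

# Global defaults for all newly created tensors.
torch.set_default_dtype(torch.float32)
torch.set_default_device("cpu")  # e.g. "cuda:0" on a GPU machine

x = torch.zeros(3)  # created with the defaults above
print(x.dtype, x.device.type)
```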
Did you do it by editing pyproject.toml? If so, can you post it in this thread so others can see?
"If I want to utilize a learning model with jrl to other application (for instance, using pybullet) for the same robot like franka panda, is there anything that I should be aware of?"
The thing you need to check is whether the urdfs are the same. To ensure they are, you can use the urdf used by IKFlow, which is stored at ~/.cache/jrl/temp_urdfs/. Otherwise, you'll need to verify that the pybullet and ikflow urdfs are identical.
Once you get ikflow working with pybullet, can you share the steps required in this thread? I'm curious myself, and it will be helpful for others.
Hi Jeremy @jstmn,
Edit 1: I checked the package versions and the version of torch was still 2.4.0...
Edit 2: I tested multiple times and found a complicated way to install torch==2.0.1. I have no idea why '--no-update' did not work when I ran poetry lock --no-update; poetry kept updating torch to 2.4.0, so I just commented out all the lines related to torch.
I am still quite unfamiliar with poetry, so I am not sure exactly what I did or why it worked. But here are my installation steps:
1. In jrl's pyproject.toml, set python = "^3.8.0" and comment out the line torch = "2.3".
2. Run poetry install to install jrl (run poetry lock --no-update first).
3. In ikflow's pyproject.toml, set python = "^3.8.0" and comment out the lines FrEIA = "0.2", jrl = ..., and pytorch-lightning = "1.8.6".
4. Run poetry install to install ikflow (run poetry lock --no-update first).
5. Run pip install FrEIA==0.2 pytorch-lightning==1.8.6 torch==2.0.1+cu117.
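For anyone following along, the pyproject.toml edit described above for jrl might look roughly like this (a sketch only; the exact file layout and surrounding entries are assumed):

```toml
[tool.poetry.dependencies]
python = "^3.8.0"
# torch = "2.3"   # commented out; torch==2.0.1+cu117 is installed with pip afterwards
```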
"Did you do it by editing pyproject.toml? If so, can you post it in this thread so others can see?"
I am developing my project and will test to see if everything works fine or not.
Thank you for your suggestions. I haven't tried such things before and it may take time for me to finish the verification. Wish me good luck :)
"The thing you need to check is whether the urdfs are the same. To ensure they are, you can use the urdf used by IKFlow, which is stored at ~/.cache/jrl/temp_urdfs/. Otherwise, you'll need to verify that the pybullet and ikflow urdfs are identical."
Hi Jeremy @jstmn ,
I noticed that the data loading is not totally consistent. During training, some resources related to the robot model are always loaded to "cuda:0". The problem can be reproduced when I call get_robot('panda') with DEVICE='cuda:3' in jrl/config.py. The relevant nvidia-smi output:
| 0 N/A N/A 2527243 C python 510MiB |
| 0 N/A N/A 2527450 C python 510MiB |
| 3 N/A N/A 2527243 C python 3456MiB |
| 3 N/A N/A 2527450 C python 3456MiB |
Sounds like DEVICE from jrl/config.py isn't being used everywhere. Which variables specifically have the wrong cuda device?
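One pure-Python way a module-level DEVICE setting can silently fail to propagate is through default arguments, which Python evaluates at function-definition time. This is an illustrative sketch of that failure mode (names are made up, not jrl's actual code):

```python
# Illustrative only: how a module-level device setting can stop propagating.
DEVICE = "cuda:0"

def make_tensor_bad(device=DEVICE):
    # The default value was captured when the function was defined.
    return device

def make_tensor_good(device=None):
    # Read the global at call time instead.
    return DEVICE if device is None else device

DEVICE = "cuda:3"  # e.g. the user edits config.py / reassigns the setting
print(make_tensor_bad())   # still "cuda:0"
print(make_tensor_good())  # "cuda:3"
```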
Well, I did a simple test just now; here is the script I used, with device='cuda:3':
from jrl.robots import get_robot  # imported but never called; the import alone triggers the allocation
import time

if __name__ == "__main__":
    time.sleep(1000)  # keep the process alive so nvidia-smi can be checked
As long as the first line was present, the problem happened. Even when I commented out everything in jrl/robots.py except the function get_robot(), the 510 MiB of memory on cuda:0 was still occupied. So I suppose the fault is not related to variables in the jrl project but has something to do with the installation?
BTW, would you mind providing the negative log likelihood curve during training for reference, like in the post you mentioned before?
What's the actual error you're getting? Can you include the stack trace?
Sure, here's the curve:
"What's the actual error you're getting? Can you include the stack trace?"
Actually, there was no error. More simply: I started Python from a terminal, ran import jrl, and monitored with nvidia-smi in another session; there was a 510 MiB usage on gpu0. But I am confused about why importing alone leads to GPU usage.
I noticed output like "Warp 0.10.1 initialized.....CUDA Toolkit: ...Devices:...Kernel cache:..." when importing jrl. It seems to me that this step takes up the GPU memory, so I suppose it has nothing to do with the package itself?
It could be from the forward-kinematics cache operation done here: https://github.com/jstmn/jrl/blob/master/jrl/robot.py#L236
The "Warp 0.10.1 initialized.....CUDA Toolkit: ...Devices:...Kernel cache:..." output happens whenever you call import warp, so that's probably not it.
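If the allocation does come from work done at import time (whether warp's initialization or the forward-kinematics cache build), one standard mitigation is lazy initialization, so that importing the package stays cheap and the cost is paid on first use. A pure-Python sketch with assumed names, not jrl's actual code:

```python
# Sketch (assumed names): defer heavy, device-touching setup from
# module import time to the first call.
_fk_cache = None

def get_fk_cache():
    """Build the cache on first call instead of at import time."""
    global _fk_cache
    if _fk_cache is None:
        _fk_cache = {"initialized": True}  # stand-in for GPU-touching setup
    return _fk_cache

# Importing the module no longer allocates anything; the cost is paid here:
print(get_fk_cache()["initialized"])
```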
Hi Jeremy @jstmn,
About the jrl library: if I want to add new robots to it, for example the UR3, what are the correct steps? Here is my solution based on my understanding:
1. Add the new robot in robots.py
2. Run calculate_capsule_approximation.py
Are these steps enough? Should I also use calculate_ignorable_link_collision_pairs.py and calculate_rotational_repeatability.py?
Yep! That looks like it.
"Should I use calculate_ignorable_link_collision_pairs.py and calculate_rotational_repeatability.py?"
Yes, run calculate_ignorable_link_collision_pairs.py and save the output to the top of robots.py, like here:
RIZON4_ALWAYS_COLLIDING_LINKS = []
RIZON4_NEVER_COLLIDING_LINKS = [...]

# in __init__:
ignored_collision_pairs = RIZON4_NEVER_COLLIDING_LINKS + RIZON4_ALWAYS_COLLIDING_LINKS
Robot.__init__(
    self,
    Rizon4.name,
    urdf_filepath,
    active_joints,
    base_link,
    end_effector_link_name,
    ignored_collision_pairs,
    collision_capsules_by_link,
    verbose=verbose,
    additional_link_name=None,
)
No need to run calculate_rotational_repeatability.py; just use ROTATIONAL_REPEATABILITY_DEG = 0.1.
Also, can you do me a favor and open a new issue for this with this same question? Easier for others to find this info in the future.
Thanks
Sure, I opened the issue in the jrl project.
BTW, I have some modifications to the jrl code: fixing bugs in calculate_capsule_approximation.py, adding the UR3 model, and completing collision_capsules_by_link for the iiwa robots. Can I open a pull request for your reference?
Thanks. "Can I add a pull request for your reference?" yep! thanks
Hi Jeremy,
Thank you for providing the official implementation code! Some confusing points came up while applying the code, and I am wondering if you can help me.
I would greatly appreciate hearing from you soon.
Best regards, Yupu