spoonsso / dannce

MIT License

Support for marmoset data. Is a corresponding pretrained_weight needed? #68

Closed Spartan859 closed 3 years ago

Spartan859 commented 3 years ago

I noticed that when finetuning from the rat MAX pretrained weights (3 cams) to fit a marmoset dataset, val_loss stays high (around 47).

So, will I need marmoset-based pretrained weights, or to train from scratch?

Since we currently lack hand-labeled data, we can't produce the 10k+ labeled frames needed to train dannce from scratch.

Is there any way around this?

spoonsso commented 3 years ago

For our marmoset analyses we finetune the rat MAX weights. You can try finetuning from our marmoset weights (link here), but in our experience with mice, further finetuning an already-finetuned network doesn't always work as well as going from the rat pre-train.

You say you lack hand-labeled data, but you must have some you are using for finetuning, right? How many frames are you using in total? Can you send me your config file so that I can take a look at the other settings you are using?

Are you confident that your COM traces are clean? Plotting the COM is a start. But to really test it, you can visualize whether it is clean enough to allow you to correctly capture the animal inside the 3D volume by creating a new directory (my_new_dir) somewhere, and then setting debug_volume_tifdir: path_to_my_new_dir inside the io.yaml file. Then run dannce-train. Rather than training, it will instead save all of your training input volumes into .tif files that you can visualize with ImageJ and make sure the animal is completely captured inside.
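For reference, the io.yaml addition described above might look like this (the path is a placeholder — point it at whatever empty directory you created):

```yaml
# io.yaml -- with this set, dannce-train dumps the training input
# volumes to .tif files instead of training
debug_volume_tifdir: path_to_my_new_dir
```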

Another thing to keep an eye out for is errors in calibration and frame synchronization across cameras (we still run into this problem when setting up new systems). If you use Label3D, you can check for these issues by labeling a few body parts and making sure the triangulated+reprojected points (press 't' in Label3D after labeling the points) don't deviate from where you labeled them.
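If you'd rather script this reprojection sanity check outside Label3D, a minimal numpy sketch is below. It assumes a plain pinhole model with no lens distortion, and all camera numbers are made up for illustration — substitute your own calibration:

```python
import numpy as np

def reproject(pt3d, R, t, K):
    """Pinhole projection of one world point to pixel coordinates
    (no lens distortion -- illustrative only)."""
    cam = R @ pt3d + t          # world -> camera coordinates
    uv = K @ (cam / cam[2])     # perspective divide, then intrinsics
    return uv[:2]

def reprojection_error(pt3d, pt2d_labeled, R, t, K):
    """Pixel distance between a reprojected 3D point and its 2D label."""
    return float(np.linalg.norm(reproject(pt3d, R, t, K) - pt2d_labeled))

# Toy camera: sitting at the origin with no rotation (made-up numbers)
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

pt3d = np.array([0.1, 0.05, 1.0])   # a "triangulated" landmark
pt2d = np.array([740.0, 410.0])     # where it was hand-labeled

print(reprojection_error(pt3d, pt2d, R, t, K))  # 0.0 -> label and reprojection agree
```

A per-camera error of more than a few pixels on real data would point at the calibration or synchronization issues described above.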

Spartan859 commented 3 years ago

Actually I'm a student doing research in Dr. Cirong Liu's lab at the Institute of Neuroscience, Shanghai. The config files were attached to an email he sent yesterday evening, at about 7:46.

Thanks for your advice! We will debug the COM traces ASAP.

The calibrations are not that precise, but generally good, with a maximum error of about 1.5 cm.

The frames are synchronized with a maximum error of 20 ms. Since the acquisition frame rate is 30 fps, I consider this acceptable.
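To spell out the arithmetic behind that judgment (just a sanity check, not part of any pipeline):

```python
fps = 30
frame_period_ms = 1000 / fps        # time between consecutive frames
max_sync_error_ms = 20              # measured worst-case camera skew

print(round(frame_period_ms, 1))            # 33.3
print(max_sync_error_ms < frame_period_ms)  # True: skew stays under one frame
```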

I'll explain the 'new_n_channels_out' problem further in an email.

Spartan859 commented 3 years ago

Hello, we tried debugging the COM trace volumes, and the debug volumes look strange. One of the three cameras outputs normal pictures, while the other two give pictures that look as if they have been stretched, like the following ones. 0_15667_cam0

0_15667_cam1

0_15667_cam2

And as I change the vol_size, the extent to which they are stretched changes, while the third camera always outputs a fine picture.

Can you help explain this? Could it somehow be caused by wrong calibration?

By the way, when I use View3D to view the COM labels, they all seem fine. Does this mean that the COM's 3D coordinates are correct?

spoonsso commented 3 years ago

The volumes can look strange due to normal effects of lens distortion and the angle of view and the place in the image you are sampling from. But these do look a tad weird. If you scroll through the different slices in the stretched views, do you ever see something that looks like the animal? Also, for camera 3 -- are those corners the top of the arena and is the brown rectangle the arena floor? If so, how big is the arena? (edit: you could actually see the top & bottom of the arena in the volume if using a top-down view)

re: View3D. By com labels, do you mean the predicted COMs (the output from com-predict)? If so, then yes, they would be correct.

Spartan859 commented 3 years ago

For the com labels, I mean the predicted ones.

Sorry I didn't note that: I increased the 600 mm box because in 600 mm mode I only saw the tail of the marmoset. However, if the COMs are right, there's no way a 600 mm box would fail to cover the whole animal, so I still consider it strange.

0_29001_cam0

0_29001_cam1

0_29001_cam2

The first two cameras still give incorrect pictures.

Spartan859 commented 3 years ago

https://drive.google.com/file/d/11FM-ovqG5ksgY6Rkmuy5SdVIQSVsSEpt/view?usp=sharing

Spartan859 commented 3 years ago

The volumes are available from this link.

Spartan859 commented 3 years ago

image And this is what I see when I view my labels.

Is there any possibility that Label3D reprojects the 3D points well, while dannce-train gives a wrong reprojection? Do they use different conventions?

Spartan859 commented 3 years ago

We checked the calibrations and confirmed that the extrinsics are correct, using the unit "mm". However, we can't confirm the intrinsics. I tried using MATLAB's calibration app to produce intrinsics; it gives values that differ a little from the ones produced by multi-camera-calibration (developed by your team), but I consider them generally OK.

I'll summarize my questions here: Q1: Does Label3D show undistorted frames (using the camera intrinsics), or display the original frames from the videos?

Q2: How does dannce-train crop frames according to the volumes and com3d? Does it reproject the 3D labels into the 2D images using the camera params, and then crop a quadrilateral area of the image centered on the label? If so, are there any other problems that could cause the frames to be wrongly cropped?
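To make Q2 concrete, here is my mental model of the cropping geometry as a numpy sketch — this is my guess, not DANNCE's actual code, and the camera numbers are made up. It projects the corners of a cube of side vol_size centered on the 3D COM into a camera, showing how the sampled image footprint grows with vol_size:

```python
import numpy as np

def project(pts3d, R, t, K):
    """Project Nx3 world points into pixel coordinates (pinhole, no distortion)."""
    cam = pts3d @ R.T + t               # world -> camera frame
    uv = (cam / cam[:, 2:3]) @ K.T      # perspective divide, then intrinsics
    return uv[:, :2]

def volume_corners(com3d, vol_size):
    """The 8 corners of a cube of side vol_size centered on the 3D COM."""
    h = vol_size / 2.0
    offsets = np.array([[sx, sy, sz] for sx in (-h, h)
                                     for sy in (-h, h)
                                     for sz in (-h, h)])
    return com3d + offsets

# Illustrative camera at the origin, looking down +z; COM 1 m away (units: mm)
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
com3d = np.array([0.0, 0.0, 1000.0])

for vol in (100, 600):
    uv = project(volume_corners(com3d, vol), R, t, K)
    width_px = uv[:, 0].max() - uv[:, 0].min()
    print(vol, round(width_px, 1))   # larger vol_size -> much larger footprint
```

Under these toy numbers the footprint width goes from about 105 px at vol_size 100 to about 857 px at vol_size 600, so any calibration error is magnified in the larger crop.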

Spartan859 commented 3 years ago

/home/xuchun/dannce/dannce/engine/generator.py:487: UserWarning: Note: ignoring dimension mismatch in 3D labels
warnings.warn(msg)

Also, does this warning indicate a critical error?

Spartan859 commented 3 years ago

Q3: I notice that when I reduce vol_size, the results look better. When I set vol_size to 100 (of course the marmoset isn't fully covered), the output images are all centered on the animal's COM, and the main body fills most of the picture. When I increase it to 300, the center of the picture begins to shift, and at 600 the pictures look like the ones in the file I sent. How could this happen? If the COM label is wrong, why does decreasing vol_size bring the marmoset into view instead of moving it completely out? Thanks for helping!

The debug vols of vol_size:100 are as follows.

0_16301_cam0 0_16301_cam1 0_16301_cam2

spoonsso commented 3 years ago

I just checked one set of your volumes with ImageJ's 3D viewer, and there is reasonable convergence of matching body features in 3D space, so I think they are probably fine.

I suggest trying a few things:

1) Based on the config files you sent me, it looks like you are finetuning in "AVG" mode. For our marmoset analyses, we finetuned in "MAX" mode. Also, for our marmoset analyses, we finetuned from a MAX network pretrained on Rat 7M -- are you using the pretrained 3 cam MAX weights that I linked to?

2) As I explained in my e-mail, we have never tried training DANNCE using just a single landmark, and we believe using multiple landmarks is an important source of information during training. If you don't want to label the full pose right now, why don't you try combining your single head labels with our labeled marmoset data that I sent you?

spoonsso commented 3 years ago

Also, I do agree the volume is a bit big for your animal, especially if you are just trying to get the head (and not also the tip of the tail). Another thing to try would be reducing your vol_size to 400 mm.

Spartan859 commented 3 years ago

> based on the config files you sent me, it looks like you are finetuning in "AVG" mode. For our marmoset analyses, we finetuned in "MAX" mode. Also, for our marmoset analyses, we finetuned from a MAX network pretrained on Rat 7M -- are you using the pretrained 3 cam MAX weights that I linked to?

A1: Yes, we use AVG mode, and we do use the pretrained 3 cam MAX weights. Actually we have tried MAX mode: the loss is really small, but prediction still gives an unacceptable result. So I suspect the main problem is single-landmark tracking.

> As I explained in my e-mail, we have never tried training DANNCE using just a single landmark, and we believe using multiple landmarks is an important source of information during training. If you don't want to label the full pose right now, why don't you try combining your single head labels with our labeled marmoset data that I sent you?

A2: Thanks for the advice; we'll test multiple labels soon, and I'll tell you whether it works.

Thank you for helping us these days!

spoonsso commented 3 years ago

No problem, please keep me posted. Happy to help.