jlshin closed this issue 2 years ago.
Thanks for responding to my email. I understand it may take a few days to get to this, and I appreciate the help. It may also be worth mentioning that I cannot reproduce the results without fine-tuning from the paper (Table 3) either: I am getting much worse results (near 0 AP) on every episode.
For the 5-shot, 5-way task I have been comparing my results against the file posted in this comment: https://github.com/jshtok/RepMet/issues/22#issuecomment-579112508. The number of GT examples per episode matches, so I appear to be using the correct data, but my Recall and AP are very far off.
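The reasoning above (same GT counts, very different AP, so the data is probably not the problem) can be made mechanical with a small comparison script. This is a minimal sketch assuming each run has already been parsed into a hypothetical `{episode_id: (num_gt, ap)}` dict; the actual layout of the log file linked in #22 is not reproduced here, and `compare_episodes` is an illustrative helper, not part of RepMet.

```python
from typing import Dict, List, Tuple

# Hypothetical per-episode summary: {episode_id: (num_gt, ap)}.
Summary = Dict[int, Tuple[int, float]]


def compare_episodes(mine: Summary, reference: Summary, ap_tol: float = 0.05) -> List[int]:
    """Return episode ids whose GT counts match but whose AP differs by more than ap_tol.

    Raises ValueError if a shared episode has a different GT count, because then the
    two runs are not evaluating the same data and their AP values are not comparable.
    """
    flagged = []
    for ep in sorted(mine.keys() & reference.keys()):
        my_gt, my_ap = mine[ep]
        ref_gt, ref_ap = reference[ep]
        if my_gt != ref_gt:
            raise ValueError(f"episode {ep}: GT count {my_gt} != reference {ref_gt}")
        if abs(my_ap - ref_ap) > ap_tol:
            flagged.append(ep)
    return flagged


if __name__ == "__main__":
    # Toy numbers, only to illustrate the call; not real RepMet results.
    reference = {0: (12, 0.71), 1: (9, 0.65)}
    mine = {0: (12, 0.01), 1: (9, 0.64)}
    print(compare_episodes(mine, reference))  # prints [0]
```

If every episode is flagged while no GT count mismatch is raised, the discrepancy lies in the model or weights rather than in the episode data.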
Is there any chance the `fpn_pascal_imagenet-0015.params` file was accidentally changed on the Google Drive?
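One way to check this without relying on Drive metadata is to hash the downloaded weights and compare the digest against a copy known to reproduce the paper (for example, an older download or a collaborator's file). A minimal sketch using only the standard library; the path is an assumption, and no official checksum for `fpn_pascal_imagenet-0015.params` is given in this thread.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Stream a (possibly large) file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    params = Path("model/fpn_pascal_imagenet-0015.params")  # hypothetical location
    if params.exists():
        print(params.name, file_sha256(params))
    else:
        print(f"{params} not found; adjust the path")
```

Two copies with different digests are different files; identical digests rule out a changed upload as the cause.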
I resolved the issues I was observing. I am going to leave this issue open because I still believe that in order to fine-tune you need to change the `balance_classes` parameter in the config (YAML) file to `false`; otherwise the index error above is raised.
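If the flag has to be flipped in several config files, a small script can do it. The sketch below assumes the YAML has already been parsed into nested dicts and lists (for instance with PyYAML's `yaml.safe_load`). It does not assume where `balance_classes` sits in the RepMet config, so it sets every occurrence; `set_key_everywhere` is a hypothetical helper.

```python
from typing import Any


def set_key_everywhere(cfg: Any, key: str, value: Any) -> int:
    """Recursively set every occurrence of `key` in nested dicts/lists.

    Returns how many occurrences were set, so a count of 0 reveals a typo
    or a config that does not contain the key at all.
    """
    count = 0
    if isinstance(cfg, dict):
        for k, v in cfg.items():
            if k == key:
                cfg[k] = value  # replacing a value does not change the dict's size
                count += 1
            else:
                count += set_key_everywhere(v, key, value)
    elif isinstance(cfg, list):
        for item in cfg:
            count += set_key_everywhere(item, key, value)
    return count


if __name__ == "__main__":
    cfg = {"TRAIN": {"balance_classes": True}}  # hypothetical layout, for illustration only
    n = set_key_everywhere(cfg, "balance_classes", False)
    print(n, cfg)
```

Writing the result back with `yaml.safe_dump` drops any comments in the original file, so for a one-off change editing the single line by hand may be preferable.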
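I have found other users' comments helpful, so I will return the favor:
- The main issue was an incompatibility between my GPU and CUDA 8. I upgraded MXNet 1.0.0 to use CUDA 9 and am now able to fine-tune and train.
- Additionally, this involves changing the `lib/nms/gpu_nms.so` file to `lib/nms/gpu_nms_9.so`.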
Hi Joanne,
Thank you for letting me know. I will check how balance_classes() causes the error; maybe it needs fixing, although it is seldom relevant. I am glad you solved the issue.
Best Regards, Joseph
On Sat, Dec 5, 2020 at 2:33 AM Joanne Shin <notifications@github.com> wrote:
I resolved the issues I was observing. I am going to leave this issue open because I still believe that in order to fine-tune you need to change the balance_classes parameter in the config (yaml) file to false, otherwise the issue with the index error above is raised.
I have found other users' comments helpful so I will return the favor:
- The main issue was an incompatibility between my GPU and cuda 8. I upgraded mxnet 1.0.0 to use cuda 9 and am able to fine-tune and train
- Additionally, this involves changing the lib/nms/gpu_nms.so file to lib/nms/gpu_nms_9.so
I read through a few of the closed and open issues and I am observing an issue similar to #9.
Setup
I am trying to work through the examples listed in the README with the ImageNet data (I followed the link to download it), set up the paths accordingly, and have not changed anything aside from renaming the file paths in the pickle files downloaded from the Google Drive (`voc_inloc_roidb.pkl` and `voc_inloc_gt_roidb_.pkl`). Aside from renaming paths, I am using the version of `few_shot_benchmark.py` and the config file that are currently on the master branch.

Questions
- `/experiments/cfgs/resnet_v1_101_voc0712_trainval_fpn_dcn_oneshot_end2end_ohem_8.yaml`?
- Is `balance_classes` supposed to be set to `false` when fine-tuning with episodic data (it is set to `true` in the config)?

Issue
I am encountering the out-of-index error that was observed in #9 and am confused by the discussion on that thread. Here is what I have tried to run:
Namely, I am not sure I understand this comment that @jshtok made (or if it is relevant to the solution):
`NUM_CLASSES` is changed from 122 (from the YAML file) to 127 when `add_reps_to_model` is called, and `new_cats_to_beginning` is hard-coded to `False`. Unless these parameters are intended to be set to different values, it does not surprise me that `NUM_CLASSES = 127 (122 + Nway)` here: https://github.com/jshtok/RepMet/blob/9bdc3f20ff08a8b3ce005af327aba6bf0bb71213/fpn/few_shot_benchmark.py#L661-L663
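The arithmetic above amounts to appending the episode's categories after the base ones. A minimal sketch, assuming (as `new_cats_to_beginning = False` suggests) that the novel categories are placed at the end of the class list; `append_novel_categories` is hypothetical and is not `add_reps_to_model` itself.

```python
from typing import List, Tuple


def append_novel_categories(base_categories: List[str],
                            novel: List[str]) -> Tuple[List[str], List[int]]:
    """Append episode categories after the base ones (assumed new_cats_to_beginning=False case).

    Returns the extended category list and the indices the novel categories occupy.
    """
    extended = list(base_categories) + list(novel)
    novel_indices = list(range(len(base_categories), len(extended)))
    return extended, novel_indices


if __name__ == "__main__":
    base = [f"base_{i}" for i in range(122)]    # stand-ins for the 122 classes in the YAML file
    episode = [f"novel_{j}" for j in range(5)]  # a 5-way episode
    cats, idx = append_novel_categories(base, episode)
    print(len(cats), idx)  # prints 127 [122, 123, 124, 125, 126]
```

Under this assumption, a 127-class head is expected rather than a bug; the novel classes simply occupy the last `Nway` slots.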
Here is what I have noticed when trying to debug this issue:
- `balance_classes` is set to `True` in the configuration YAML file, so we enter the `balance_classes` method in the `PyramidAnchorIterator` class. This ends up excluding all the examples in my first batch, which leaves `self.size` at 0: https://github.com/jshtok/RepMet/blob/9bdc3f20ff08a8b3ce005af327aba6bf0bb71213/fpn/core/loader.py#L268-L309
- Since `self.size` is now 0, `self.cur_to` is also 0, resulting in a length-0 slice of the `roidb`: https://github.com/jshtok/RepMet/blob/9bdc3f20ff08a8b3ce005af327aba6bf0bb71213/fpn/core/loader.py#L424-L426
- Because the sliced `roidb` is empty, index 0 is invalid, which raises the error reported in #9: https://github.com/jshtok/RepMet/blob/9bdc3f20ff08a8b3ce005af327aba6bf0bb71213/fpn/core/loader.py#L443-L444
- The `balance_classes` parameter seems redundant, so I have also tried setting it to `False`, which avoids the index issue... but other issues arise. (I emailed @jshtok briefly about this a few weeks ago, as I was seeing similar issues when trying to fine-tune on my own data. I did not resolve those and figured I'd try to get this running on ImageNet first, but I am running into similar issues.)
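The failure chain in the list above can be reproduced in isolation. The sketch below uses a hypothetical `ToyIterator` with a made-up balancing rule (keep only images containing a class from `keep_classes`); it is not RepMet's `PyramidAnchorIterator`, whose actual balancing logic is in the `loader.py` lines linked above. With `strict=True` it fails early with a descriptive message instead of the bare `IndexError`.

```python
from typing import Dict, FrozenSet, List


class ToyIterator:
    """Hypothetical stand-in for a detection data iterator (NOT RepMet's PyramidAnchorIterator)."""

    def __init__(self, roidb: List[Dict], batch_size: int = 1, balance: bool = True,
                 keep_classes: FrozenSet[int] = frozenset(), strict: bool = False):
        self.roidb = list(roidb)
        self.batch_size = batch_size
        if balance:
            # Made-up balancing rule: keep only images with at least one class from
            # keep_classes. If none qualify, every image is dropped.
            self.roidb = [r for r in self.roidb if set(r["gt_classes"]) & keep_classes]
        self.size = len(self.roidb)
        if strict and self.size == 0:
            # Fail early with a message that points at the real cause.
            raise ValueError("class balancing removed every image; "
                             "check balance_classes and the episode's class list")
        self.cur = 0

    def first_record_of_batch(self) -> Dict:
        cur_to = min(self.cur + self.batch_size, self.size)  # 0 when size == 0
        batch = self.roidb[self.cur:cur_to]                   # length-0 slice in that case
        return batch[0]                                       # IndexError on an empty batch


if __name__ == "__main__":
    roidb = [{"gt_classes": [1]}, {"gt_classes": [3]}]
    it = ToyIterator(roidb, keep_classes=frozenset({7}))
    print("size after balancing:", it.size)
    try:
        it.first_record_of_batch()
    except IndexError as err:
        print("IndexError:", err)
```

The same three steps appear here: the filter empties the list, the slice bounds collapse to `[0:0]`, and indexing the empty batch raises. A size check right after balancing turns the confusing `IndexError` into an error that names the cause.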
Any ideas on how to fix this?