the loss is so big - Githubissues

ZuoTisheng201 commented 4 months ago

For training on the RADDet dataset, I downloaded the dataset provided by the authors. The ADC archive unzips into two folders: one named ADC (20.1 GB), and one named __MACOSX, which contains a 1 KB _ADC file and a 2 MB file named ADC. I chose to use only the 20.1 GB ADC folder as the ADC data requested in RADDet Usage. For gt_box and gt_box_test in RADDet Usage, I used the gt folder from the train directory and the gt folder from the test directory of the RADDet dataset, respectively. Following this approach I ran your code successfully, but the loss was very high: more than 10,000 in the first epoch. I think the dataset is causing the problem. Is my placement of the dataset wrong?

jgiroux8 commented 4 months ago

The placement of the dataset should not be an issue, however I am wondering if there is an issue with the data itself. As you have mentioned, you are using the raw ADC data which should require no normalization as the network will use ActNorm layers. One thing to check is that the correct data mode (network) is being used. It could also be true that there is misalignment with the labels. I know the authors of the RADDet dataset have moved the location of their data. Perhaps something was lost in translation there. I would recommend doing some sanity checks between the labels for the RAD cube (or their RD) data, and compare to your own RAD cube or RD transformation to make sure the spectra look the same. If everything looks ok there, let me know and we can dig into this further.
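For anyone following along, the sanity check suggested above could be sketched roughly as follows. This is only a minimal illustration and not the repository's or the dataset authors' processing chain: it assumes one ADC frame is a complex NumPy array shaped (samples, chirps, receive antennas), and the function name `range_doppler_map` is made up for this example. It builds a log-magnitude Range-Doppler map that can be compared by eye against the RD data shipped with the dataset.

```python
import numpy as np


def range_doppler_map(adc):
    """Return a log-magnitude Range-Doppler map for one ADC frame.

    Assumption: `adc` is complex with shape (n_samples, n_chirps, n_rx).
    """
    # Range FFT along fast time (samples within a chirp).
    range_fft = np.fft.fft(adc, axis=0)
    # Doppler FFT along slow time (chirps), with zero Doppler shifted to the centre.
    rd = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)
    # Non-coherent sum of power over the receive antennas.
    power = np.sum(np.abs(rd) ** 2, axis=-1)
    return 10.0 * np.log10(power + 1e-12)


# Synthetic single-target frame: range bin 10, Doppler bin 5 before the shift.
n_samples, n_chirps, n_rx = 64, 32, 4
s = np.arange(n_samples)[:, None, None]
c = np.arange(n_chirps)[None, :, None]
adc = np.exp(2j * np.pi * (10 * s / n_samples + 5 * c / n_chirps)) * np.ones((1, 1, n_rx))

rd = range_doppler_map(adc)
peak = np.unravel_index(np.argmax(rd), rd.shape)
```

On a real frame, the bright cells of this map should fall inside the boxes stored in the matching gt file; if they are consistently offset, the labels and the ADC data are probably misaligned.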

ZuoTisheng201 commented 4 months ago

Thanks for the reply; I've been a bit busy lately and missed it. Yes, the authors of the dataset had a problem with the data hosted on Google, and some of the labels in the gt folder are missing from the re-uploaded link. I'm modifying the code to use only the ADC data whose labels are still present in the gt folder. Of course, if you still have the gt folder from the full RADDet dataset, could you post it? Thank you very much!

jgiroux8 commented 4 months ago

I will not be able to check if I still have that data until August unfortunately. In the case that I do still have it, I will contact the creators of that dataset and figure out the best way to share this data.

ZuoTisheng201 commented 4 months ago

OK, thanks a lot. So far I've found the frame numbers of the labels missing from gt_box and deleted the corresponding ADC files. Only a hundred or so were deleted, so the impact is small and the result is OK.
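A minimal sketch of how such a check might be done without deleting anything, in case it helps others. The function name `split_by_label` and the `.npy` extension for ADC frames are assumptions for illustration; the `.pickle` extension for labels matches the file names used later in this thread. It reports which ADC frames have a label file and which do not.

```python
import os


def split_by_label(adc_dir, gt_dir, adc_ext=".npy", gt_ext=".pickle"):
    """Return (labelled, unlabelled) lists of frame ids found in adc_dir.

    Assumption: an ADC frame and its label share the same file stem,
    e.g. 000123.npy and 000123.pickle.
    """
    adc_ids = {os.path.splitext(f)[0] for f in os.listdir(adc_dir) if f.endswith(adc_ext)}
    gt_ids = {os.path.splitext(f)[0] for f in os.listdir(gt_dir) if f.endswith(gt_ext)}
    # Frames with a label are usable; frames without one should be skipped.
    return sorted(adc_ids & gt_ids), sorted(adc_ids - gt_ids)
```

Calling `split_by_label("ADC", "gt_box")` on the two folders returns the usable frame ids first and the frames to skip second, so the ADC files can be left in place and filtered at load time instead.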

jgiroux8 commented 3 months ago

Unfortunately, I do not have the original data, so will not be able to provide this. Were you able to get things working with the data you had available after dropping some timestamps? If so, let me know and I will mark this issue as closed.

IqbalBan commented 1 month ago

# Fragment from inside the dataset class; requires `import os` at module level.
if self.mode == 'Train':
    # Drop the held-out test indices from the training set.
    self.FFT_label_idx = list(set(self.FFT_label_idx) - set(self.test_indices))
    print(len(self.FFT_label_idx))
    # Keep only the frames whose ground-truth pickle exists in gt_box.
    self.FFT_label_idx = [
        filenum for filenum in self.FFT_label_idx
        if os.path.isfile(os.path.join(self.root_dir, 'gt_box', '{:06d}.pickle'.format(filenum)))
    ]
    print(len(self.FFT_label_idx))

This is the code I used to deal with the missing data, in case anybody needs it. The loss was also very large for me (16k) in the first epoch, but it kept going down and was normal (between 10 and 20) by the time training finished.

jgiroux8 / T_FFTRadNet

the loss is so big #2