Learning from Data (Fall 2022)

kundtx commented 1 year ago

http://8.129.175.102/lfd2022fall-poster-session/29.html

Prof-Greatfellow commented 1 year ago

G1 Haizhou Liu: Good job developing an intrusion detection system! One minor suggestion: enlarge the font size in Figs. 4 and 5. Also, have you considered using upsampling techniques in the data preprocessing step to address the data imbalance issue? This might help improve recall and F1.

ccp123456 commented 1 year ago

G40 Chupeng Cui: Excellent project! I have noticed that it is difficult to identify the categories with a small number of samples, such as PortScan. Is there any further treatment for such samples?

Suikakon commented 1 year ago

G23 Zhang Boyang: Wonderful work! I would like to know why you used a DBN instead of a CNN, which is more popular. Thanks in advance!

lylechan42 commented 1 year ago

@ccp123456 G40 Chupeng Cui: Excellent project! I have noticed that it is difficult to identify the categories with a small number of samples, such as PortScan. Is there any further treatment for such samples?

G29 Weizhi Chen: Thanks for the compliment. We've noticed this issue and applied SMOTE (Synthetic Minority Over-sampling Technique). The F1 score for the brute force class improved a bit, but there is clearly much room for improvement. One possible way is to continue upsampling the minority classes while downsampling the benign and DoS classes. A more straightforward approach would be to merge our dataset with others and even synthesize minority samples (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8272217/).
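
The resampling idea in this reply can be sketched as follows. This is only an illustration under stated assumptions: the thread does not say which SMOTE implementation the project used (in practice a library such as imbalanced-learn would normally be used), and the function names `smote` and `random_undersample`, the toy 2-D data, and the class sizes are all hypothetical. The sketch interpolates between a minority sample and one of its k nearest minority-class neighbours, and randomly drops majority samples.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Toy data (hypothetical): 200 "benign" flows vs. 10 "brute force" flows.
data_rng = random.Random(42)
benign = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
brute_force = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(10)]

new_attacks = smote(brute_force, n_new=40, k=3)
kept_benign = random_undersample(benign, n_keep=50)
print(len(brute_force) + len(new_attacks), len(kept_benign))
```

Resampling like this should be applied to the training split only, so that synthetic points never leak into the evaluation data.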

ccp123456 commented 1 year ago

@lylechan42

G40 Chupeng Cui: I see. Thank you very much.

lylechan42 commented 1 year ago

@Prof-Greatfellow G1 Haizhou Liu: Good job developing an intrusion detection system! One minor suggestion: enlarge the font size in Figs. 4 and 5. Also, have you considered using upsampling techniques in the data preprocessing step to address the data imbalance issue? This might help improve recall and F1.

G29 Weizhi Chen: We sincerely appreciate your advice. We have employed the SMOTE technique to synthesize minority-class samples, but apparently we haven't synthesized enough. We can also try downsampling the majority classes appropriately. The main issue, however, is the dataset itself: I believe merging it with other datasets and synthesizing new samples would help a lot.

lylechan42 commented 1 year ago

@Suikakon G23 Zhang Boyang: Wonderful work! I would like to know why you used a DBN instead of a CNN, which is more popular. Thanks in advance!

G29 Weizhi Chen: Thanks! A DBN is a generative, unsupervised method, while a CNN is discriminative and supervised. CNNs are popular for image classification tasks, while a DBN works better when labeled examples are scarce, since it is unsupervised. A DBN can also learn more robust and generalizable features of the data, as they are not biased by the labels. One more thing worth mentioning is that DBNs are more efficient to train, since they are trained layer by layer and don't need to learn the weights of all layers at once, as a CNN does. Hope this answers your question :-)
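
The layer-by-layer training mentioned in this reply can be sketched as greedy pretraining of stacked restricted Boltzmann machines (RBMs). This is a minimal pure-Python illustration, not the project's model: the class `RBM`, the helpers `pretrain_dbn` and `encode`, the layer sizes, the learning rate, and the toy binary data are all assumptions made for the example. Each RBM is trained without labels using one step of contrastive divergence (CD-1), then frozen, and its hidden activations become the input to the next layer. A supervised classifier or fine-tuning stage, which a DBN-based detector would typically add on top, is omitted.

```python
import math
import random


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_step(self, v0, lr):
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # one-step reconstruction
        h1 = self.hidden_probs(v1)
        # Update: data-driven statistics minus reconstruction statistics.
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(data, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: train one RBM at a time, no labels."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_step(v, lr)
        layers.append(rbm)
        # Freeze this layer; its hidden activations feed the next RBM.
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers


def encode(layers, v):
    """Pass one input vector up through all pretrained layers."""
    for rbm in layers:
        v = rbm.hidden_probs(v)
    return v


# Toy binary feature vectors (hypothetical), 4 inputs -> 3 -> 2 hidden units.
toy = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]] * 5
dbn = pretrain_dbn(toy, layer_sizes=[3, 2])
features = encode(dbn, toy[0])
print(len(dbn), len(features))
```

The output of `encode` is the kind of label-free feature vector that a small supervised classifier could then be trained on when labeled attack samples are scarce.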

min108 commented 1 year ago

G25 Citong Que: Excellent project! Have you considered deploying the model in a real environment? Have you monitored the model's performance online?

lylechan42 commented 1 year ago

@min108 G25 Citong Que: Excellent project! Have you considered deploying the model in a real environment? Have you monitored the model's performance online?

G29 Weizhi Chen: Thanks! As you can see in the comments above, the current model still needs some improvement on minority-class attacks. Once that's done, the model can be packaged and deployed with Docker.

kundtx / lfd2022-comments

Learning from Data (Fall 2022) #38