tfvieira opened this issue 4 months ago
- Manuscript submission: Jul 05, 2024
- Notification of acceptance: Sep 02, 2024
- Camera-ready: Oct 04, 2024
- Symposium: Nov 26-29, 2024
We have two possible papers for the conference:
- [x] Investigate the possibility of deploying the models on an ESPCAM-S3 board that I own.
- [x] Use Eloquent TinyML for this, together with the ESP camera drivers.
If this is not feasible in time, use the Arduino Nano instead.
- [x] Investigate the steps needed to deploy on the Arduino.
- [x] Use Edge Impulse for this?
Once we have the results, address the points for writing the paper listed when this issue was created.
Regarding TWM-V2:
As for your work, it would basically consist of the reports:
And it focuses on:
The TWM-V2 paper was submitted to SBESC on Jul 11, 2024. The TinyML paper is still awaiting one last revision before being submitted today. TWM-V1 was also submitted.
The papers were submitted and accepted. The TWM-V2 paper was revised and incorporated some of the authors' observations. The TinyPPE paper needs revision; I will detail the steps and the reviewers' comments below.
Reviewer 1:
Reviewer 2:
Reviewer 3:
[x] At the beginning of the introduction, important data related to workplace safety in the United States and Russia are presented, but these numbers may differ in other countries, such as Brazil. It is important to contextualize this work, as using such numbers as motivation needs to be adapted to the location of application. At the beginning of the second paragraph, an important point is highlighted, namely which area will be considered. In addition to the country, it is important to know which profession is within the scope of this work, as the risks vary for each occupation.
[x] In the third paragraph of Section I, the statement regarding the computational requirements to run machine learning algorithms is incorrect, even being contradicted by the text itself when it defines TinyML.
[x] Another inconsistent statement is that cameras could not process images (likely related to ML algorithms) and therefore need to send data to a server. However, the very proposal of the work addresses this issue, making the arguments contradictory. Reference 5 (on TinyML) seems inappropriate for the fundamental challenges of embedded systems, regardless of the presence of embedded ML.
[x] In the fifth paragraph of Section I, the objective of the work is presented as detecting the improper use of safety helmets. However, in the following sentence, it is stated that the dataset must contain images of people with and without helmets, which makes the objective inconsistent, since the model apparently needs to detect whether people are wearing helmets or not, which is different from detecting improper use.
[x] In the second-to-last paragraph of Section I, the Edge Impulse platform is mentioned for the first time, and its reference should have been cited.
[x] In the last paragraph of Section II, after listing other works that proposed solutions for detecting helmet use and TinyML implementations, the motivation for this work is presented. These motivations include high detection accuracy with low latency and low energy consumption, using on-device processing without a network connection. Why are these requirements important for the problem of detecting safety helmets? Why does this solution NEED to be implemented on an embedded system? Given that an embedded system has various memory and processing-power constraints, the problem might benefit more from robust hardware for more accurate detection, and from network connectivity, as this could bring several advantages, such as notifying the safety department, sending alerts, or communicating with other devices, like doors granting access to dangerous environments that require safety equipment. Therefore, the motivations do not seem to make sense for this problem.
[x] In Section III-A, there is again an inconsistency in the objectives. It seems to me that the model will detect whether the person is wearing a helmet or not, rather than detecting if it is being worn correctly, as stated in the text.
[x] Table I could present the information on the number of training and validation images separately.
[x] In Section IV-C, the terms related to the design decisions of the neural network model are presented; however, the precise values chosen are not provided, for example, the number of epochs, learning rate, etc. This paragraph also confirms that the model only detects whether the person is wearing the helmet (on their head) or not. Still in this section, it is mentioned that the early stopping technique was used; however, the value of the 'patience' parameter for this technique was not provided.
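For reference on the reviewer's point: the 'patience' parameter determines how many epochs without improvement in validation loss are tolerated before training stops. A minimal plain-Python sketch of the mechanism (illustrative values only; the paper's actual epoch count and patience are not stated):

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which early stopping halts training,
    i.e. after `patience` consecutive epochs with no improvement."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)  # never triggered: train to the end

# Hypothetical loss curve: improvement stops after epoch 3,
# so with patience=2 training halts at epoch 5.
losses = [0.90, 0.75, 0.60, 0.62, 0.61, 0.63]
print(early_stop_epoch(losses, patience=2))  # → 5
```

Reporting the patience value matters precisely because, as the sketch shows, it directly determines how many epochs the model actually trains for.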
[x] In Section IV-D, the characteristics of the hardware used are very well presented, including both the microcontroller and the camera. Both have very restricted characteristics, common in embedded systems environments. This reinforces my argument that for this specific problem (safety), there does not seem to be a fundamental requirement for using such restricted hardware; on the contrary. For example, the camera has very low resolution and a low frame rate, so it is possible that in a real-world environment, the camera would need to be very close to the evaluated person, with a latency time that could become prohibitive if there are many people to assess.
[x] The first part of Section V-A, which details how the model was trained, should not be in the results section but rather in the methodology section.
[x] The results obtained from the metrics (accuracy and F1-score) and the confusion matrix demonstrate that the model classifies very well, including with the quantized version, for the dataset it was trained/tested on. Similarly, the inference time is low; however, the inference time will need to be added to the image acquisition and preprocessing times to determine the actual latency.
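The reviewer's latency point can be made concrete: end-to-end latency is the sum of every pipeline stage, not inference alone. A small sketch with hypothetical timings (none of these numbers come from the paper):

```python
def total_latency_ms(acquisition_ms, preprocessing_ms, inference_ms):
    """End-to-end latency per frame: camera capture + preprocessing
    (resize, normalize) + model inference."""
    return acquisition_ms + preprocessing_ms + inference_ms

# Hypothetical example: on a low-frame-rate camera, acquisition can
# dominate the budget even when inference itself is fast.
print(total_latency_ms(acquisition_ms=120.0,
                       preprocessing_ms=15.0,
                       inference_ms=8.0))  # → 143.0
```

This is why reporting only the inference time understates the real latency a deployed system would exhibit.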
[x] I missed a more in-depth analysis of accuracy and latency in a real embedded environment. For example, what are the limitations of the deployed model? At what distance should the camera be from the user? How long does it take from capture to notification about whether or not the person is wearing a helmet? How does the model perform in environments completely different from the dataset? Additionally, a discussion on the advantages of integrating such a system (even if embedded) with the network would be beneficial.
Overview
- [Overleaf link](https://www.overleaf.com/3988131517tsprshctwxwx#7e96f2).
Venue
- [Brazilian Symposium on Computing Systems Engineering (SBESC)](https://sbesc.lisha.ufsc.br/sbesc2024/Call+for+Papers).
TODO
- [x] Resumo.
- [x] Abstract.
- [x] Introduction.
- [x] Motivation Example.
- [x] Methodology (Mat. & Methods).
- [x] Results.
- [x] Discussion.
- [x] Related Works.
- [x] Risks.
- [x] Conclusion.
- [x] Future Works.