vital-ultrasound / preprint2023


Official reviews from ML4H2022 #20

Closed — mxochicale closed this 1 year ago

mxochicale commented 1 year ago

Official Review of Paper136 by Reviewer y4bt

ML4H 2022 Symposium, Paper136, Reviewer y4bt, 23 Sept 2022. Official Review. Readers: Program Chairs, Paper136 Area Chairs, Paper136 Reviewers Submitted, Paper136 Authors.

Summary: This paper describes the use of CNNs to classify a specific echocardiographic view (the apical four-chamber view) from videos and trains small CNNs to perform this task.

Relevance To Healthcare: 4: High
Presentation: 1: Very low
Significance: 2: Negative

Technical Sophistication: The use of small CNNs seems appropriate, especially given the framing of the task as relevant to low-/middle-income countries, where hardware limitations may be more stringent.

Strengths: The paper addresses a potentially important task with a potentially suitable solution. Classifying echo views is an important clinical task, especially considering that these echos may be performed by practitioners with widely varying skills, as the authors discuss. The dataset collected is of potentially high value to the community.

Weaknesses:

Major weaknesses:

- The authors describe a task that is well studied in the literature (as described in Appendix A). It is unclear what the novelty of this study is, and a comparison to other methods (e.g. on the original dataset) would have helped establish that thin networks are more suitable for a resource-constrained environment.
- The description of the dataset is lacking: is the task only binary classification (A4C vs. not)? If so, are these only apical views?
- It is unclear whether the dataset contains patients with visible echo anomalies; while sepsis and dengue do have cardiac manifestations, it is unclear what was observed in the echo.
- The authors frame their study as specifically focused on low-/middle-income countries, but other than the dataset, which was supposedly acquired in such a location, I am not sure what else in this study is specific to that scenario.
- The latency shown is quite high (as the authors themselves mention), but more importantly it is not tested against non-thin networks (e.g. a ResNet or a transformer variant) and is not tested on constrained hardware (e.g. CPU).
- To clarify, the collection of the dataset itself from such a source is, in my opinion, beneficial to the community by itself.
- The scoping review section's relevance to the study is mostly quite weak. It would have been beneficial to thin it out and move Appendix A, which discusses similar works, into this section.
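The CPU-latency comparison the reviewer asks for could be run with a simple timing harness like the sketch below (illustrative only, using the Python standard library; the model call passed in would be a placeholder for whichever thin or non-thin network is being benchmarked, not the paper's actual code):

```python
import time
import statistics

def latency_ms(fn, warmup=3, runs=20):
    """Median wall-clock latency of fn() in milliseconds.

    Warmup iterations are discarded so one-time costs (allocation,
    JIT, cache population) do not skew the measurement; the median
    is reported because latency samples are typically right-skewed.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)
```

Usage would be, for example, `latency_ms(lambda: model(dummy_clip))` for each candidate network (MobileNet, SqueezeNet, ResNet, a transformer variant) on a CPU-only build, so that thin and non-thin backbones are compared under the same constrained-hardware conditions.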

Minor:

- The figures are very small and hard to read. Consider separating the final model results from the loss curves, and consider aggregating final model results across runs, e.g. as swarm, box, or bar-and-error plots.
- The text is often hard to read due to grammatical issues; some textual editing is encouraged.
- The experiments shown in Figure 2 are unclear; I guess these are different ways to train the network? It seems that Figure 4 is more representative of the main results.

Recommendation: 1: Reject. Work suffers from one or more of the following: off-topic content (no ML or no healthcare application), severe technical flaws, no novel contribution (results are known or trivial), extreme formatting violations.
Confidence: 3: Fairly Confident. You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Track Switching: N/A. Already an extended abstract.

mxochicale commented 1 year ago

Official Review of Paper136 by Reviewer 1TAW

ML4H 2022 Symposium, Paper136, Reviewer 1TAW, 20 Sept 2022 (modified: 20 Sept 2022). Official Review.

Summary: This work focuses on the application of machine learning to 2D echocardiography videos. The paper provides a scoping review of ML applications in echocardiography, mainly focusing on ICUs in low- and middle-income countries. The authors experiment with thinner neural network architectures on a dataset of echocardiography videos from 31 ICU patients.

Relevance To Healthcare: 4: High
Presentation: 3: Medium
Significance: 3: Neutral
Technical Sophistication: N/A

Strengths: The scoping review can be a helpful introduction to ML for echocardiography for readers entering the field. The problems of model deployment on low-cost hardware and data scarcity are highly pertinent to biomedical ML. This paper raises these issues by considering thin architectures, such as MobileNet and SqueezeNet, on a small dataset of echocardiography videos.

Weaknesses: It is not necessarily clear what the central insight or takeaway of this paper is for the community. Somehow, the scoping review and the experiments are not structurally or conceptually well connected. The experiment with thinner neural networks and US image classification lacks a comprehensive description and interpretation. What is the target variable being predicted? What conclusions can be drawn from the results in Figure 2? Are thinner architectures better than standard backbones, such as VGG or ResNet? The figures are too small and dense, especially Figure 2. They should be simplified and described in more detail within the main text.

Miscellaneous: It is unclear how the case study relates to the LMICs mentioned in the title. Was the sample acquired in an LMIC? The writing could be improved, and the manuscript should be proofread. Some examples of errors are "We then performed heuristics for each model", "Nhat et al. (2021) presented a deep-learning pipelines", and "Ghorbani et al. (2020) reported how deep learning models predicts systematic phenotypes". Abbreviations should be avoided in the abstract for better readability, especially considering a broader audience. The motivation could be improved by explaining why thinner DNNs are better suited for low-cost hardware.

Recommendation: 2: Marginal Reject. I tend to vote for rejecting this submission, but accepting it would not be that bad.
Confidence: 3: Fairly Confident. You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Track Switching: N/A. Already an extended abstract.
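The motivation the reviewer asks for — why thinner DNNs suit low-cost hardware — can be illustrated with a back-of-the-envelope parameter count for MobileNet-style depthwise-separable convolutions (a generic sketch of the technique, not the paper's actual architecture; bias terms are ignored):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise-separable replacement: one k x k filter per
    input channel (depthwise), followed by a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution with 256 input and 256 output channels.
std = standard_conv_params(3, 256, 256)          # -> 589824 weights
thin = depthwise_separable_params(3, 256, 256)   # -> 67840 weights, ~8.7x fewer
```

The roughly 8-9x reduction in weights (and a similar reduction in multiply-accumulate operations) is what makes MobileNet-style backbones attractive when memory and compute are constrained.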

mxochicale commented 1 year ago

Official Review of Paper136 by Reviewer 9KCF

ML4H 2022 Symposium, Paper136, Reviewer 9KCF, 20 Sept 2022. Official Review.

Summary: In this extended abstract, the authors describe and discuss the challenges of a clinical use case for an AI-empowered echocardiography system that would enable real-time classification of the apical four-chamber view (A4CV), to be used with intensive care unit (ICU) patients in low- and middle-income countries (LMICs). They introduce important challenges within the development and give a scoping review of related work. They use a dataset of 31 ICU patients' echocardiography videos and present important steps for the development of such a system, including data preparation, curation, labelling, and the selection, validation, and deployment of three thinner neural networks. Further, they analyze the influence of different training settings on the results for one thinner neural network. They present this use case to open the discussion on challenges in developing AI-empowered echocardiography systems.

Relevance To Healthcare: 5: Very High
Presentation: 2: Low
Significance: 4: Positive

Technical Sophistication: The authors use previously applied thinner neural network models (MobileNet V1 and V2, SqueezeNet) and analyze the effect of different general training settings (dataset size, batch size of clips, number of frames per segment) on model performance in classical machine learning experiments. They provide details on the procedure and present figures with results. As far as I can see, there is no special architecture development, training scheme, or model evaluation.

Strengths: This is an interesting use case and transferable to other important areas in the medical domain. The authors explain the relevance nicely and provide the most important background and domain information. The work is well structured, beginning with a description of the main challenges and referring to related work in this area. They underline their considerations with a small, structured case study and open the discussion on challenges for the development and clinical translation of a machine-learning-based echocardiography support system. They describe all steps, from dataset generation (incl. dataset properties) through data annotation to model evaluation, in a comprehensive way. Further, adding information to the Appendix and promising to provide the code, data, and resources to reproduce this work upon acceptance reflects best practices.

Weaknesses:

Introduction: The introduction seems to be targeted more at the general case of AI-empowered echocardiography support systems. The challenges that are specific to LMICs should be highlighted separately from the general challenges of such a system ("intra-view variability ..." is not specific to LMICs). In general, the special case of LMICs should be made clearer throughout the whole work; this could open more discussion points. Otherwise, clinical translation in general would be the focus. The authors might consider choosing between a general clinical-translation use case and targeting the work more specifically at LMICs. This choice should then also be reflected in the description and discussion of methodological issues (e.g., special challenges in LMICs in gathering and annotating high-quality, representative datasets, or the availability of computational resources). It is also not clear what the aim of the AI-empowered system the authors want to develop and apply is. As I understood it from Figure 1, the system is trained to differentiate A4C from background. Is this true? Or does it differentiate A4C from other views? Would the clinical application be that medical professionals are advised by the system once they have "found" the A4C, or do they get a segmented A4C view? This should be made clear from the beginning, as there are other systems that aim to classify abnormal versus normal images, or to segment tissue and background, throughout the whole echocardiography.

Scoping Review: The subsection on AI-empowered echocardiography for ICUs in LMICs starts quite broadly; the authors describe general machine-learning-based ICU support systems. This section could be more focused on the actual use case of this study (echocardiography for ICUs in LMICs). On the other hand, information on the classification of echocardiograms is moved to the appendix; I consider this part more important for the main work.

Model Selection and Heuristics: Which "heuristics" were used? Was it just trying out different settings, or are there specific reasons for comparing those settings (e.g., 1 vs. 30 frames per segment)? Where exactly are the comparisons of different dataset sizes? Is this represented in Figure 4 in the appendix? Important information on the training routine (split) and model evaluation can only be found in the Appendix, in the caption of Figure 4; this should be integrated into the paper.

Figure 2: It is not easily readable. Even when zooming in, the legends of Figure 2a overlap. The datatype in the legend referring to a black line is a bit confusing; it could be left out, as this is already covered by the general figure legend. Elapsed times are not very informative: what are they meant to reflect, and what does the elapsed time for "notebook" refer to? The titles of the subplots and the captions of the figures do not match: in Fig. 2a, right side, it says N_BatchClips = bc20 while the caption says 30 for BATCH_SIZE_OF_CLIPS; in Fig. 2b, right side, it says N_BatchClips = bc60 while the caption says BATCH_SIZE_OF_CLIPS = 30. Is this a mistake?

Results: The results are neither well presented nor discussed. What is the main outcome of the experiments? Is there an important influence of the settings evaluated? How well does the system perform in general? Is it comparable to other systems? This should be presented in a more comprehensive way and discussed in light of the research questions presented at the beginning.

Conclusion and Discussion: While the selection and introduction of this work's topic invite a discussion on the challenges of translating machine-learning-based support systems into clinical adoption (in LMICs), this part comes up rather short. There is a lot of room for discussing best practices, future work, and the implications of your results based on the case study.

Miscellaneous: Abbreviations should be introduced when first used (LMIC, 4CV, ICU in the abstract).

The figures are only readable when zooming in substantially; they should be made more legible. Consider shortening the text or removing (parts of) the figures instead.

When the challenges are described in the introduction, there is more than one challenge per numbered item ("...and ..."). Consider separating distinct challenges into separate numbered items.

Recommendation:

In general, opening the discussion on the clinical translation of an AI-empowered echocardiography system for ICU patients (in LMICs) is an interesting use case. Building this discussion on a small case study is valuable, and applying first baseline methods seems reasonable for the scope of an extended abstract. However, the work could do much more to lead that discussion, and since experiments were performed, their results should definitely be integrated; otherwise, it does not contribute much to the topic. In its current form, I would have to reject this work. However, if substantial revisions were made to the presentation and discussion of the results, especially with respect to the challenges and future implications of this interesting use case, I could consider accepting this work as a potential discussion-opener during the conference.

Recommendation: 3: Marginal Accept. I tend to vote for accepting this submission, but rejecting it would not be that bad.
Confidence: 3: Fairly Confident. You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Track Switching: N/A. Already an extended abstract.

mxochicale commented 1 year ago

Official Review of Paper136 by Reviewer MD1W

ML4H 2022 Symposium, Paper136, Reviewer MD1W, 08 Sept 2022 (modified: 11 Sept 2022). Official Review.

Summary: This work focuses on a lightweight echocardiography system that classifies the views of 2D US videos collected from ICU patients. The authors provide an overview of their pipeline with the low-cost clinical system and classification with shallow neural nets.

Relevance To Healthcare: 3: Medium
Presentation: 2: Low
Significance: 1: Very negative

Technical Sophistication: The motivation is well presented. However, the method itself is not sophisticated.

Strengths: The motivation for having a lightweight classifier for LMICs is a valid point.

Weaknesses: My biggest concern is novelty: the proposed work is not really novel. There are quite a few works on echocardiography view classification with lightweight models; see below for some examples (the list is not exhaustive):

- Zian, L. Q., Kumar, Y. J., Sing, G. O., & Ye, Z. (2020). Classification of echocardiogram views using deep learning models. Journal of Advanced Computing Technology and Application (JACTA), 2(2), 33-40.
- Vaseli, H., Liao, Z., Abdi, A. H., Girgis, H., Behnami, D., Luong, C., Taheri Dezaki, F., Dhungel, N., Rohling, R., Gin, K., Abolmaesumi, P., & Tsang, T. (2019). Designing lightweight deep learning models for echocardiography view classification. Proc. SPIE 10951, Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling, 109510F. https://doi.org/10.1117/12.2512913
- Madani, A., Arnaout, R., Mofrad, M., et al. (2018). Fast and accurate view classification of echocardiograms using deep learning. npj Digital Medicine, 1, 6. https://doi.org/10.1038/s41746-017-0013-1

Almost no details are given about model training; for example, the learning rate, loss term, number of epochs, and batch size for training are missing. There are many more views used in echocardiography; having a classifier for only the four-chamber view is not a realistic scenario.

Miscellaneous: The presentation of the paper is below average; the figures are too small to see and understand anything, especially Fig. 2.

Recommendation: 1: Reject. Work suffers from one or more of the following: off-topic content (no ML or no healthcare application), severe technical flaws, no novel contribution (results are known or trivial), extreme formatting violations.
Confidence: 5: Absolutely Certain. You are absolutely certain about your assessment. You are very familiar with the related work.
Track Switching: N/A. Already an extended abstract.
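The training details the reviewer lists as missing could be reported compactly as a single configuration block. The sketch below is purely illustrative of the kind of reporting reviewers expect; every value is a placeholder and none is taken from the paper:

```python
# Hypothetical training configuration: all values below are placeholders
# chosen for illustration, not the paper's actual hyperparameters.
train_config = {
    "architecture": "MobileNetV2",            # one of the thin backbones evaluated
    "loss": "binary_cross_entropy",           # A4C vs. background classification
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 16,                         # clips per gradient step
    "epochs": 50,
    "frames_per_segment": 30,                 # one of the settings compared
    "train_val_test_split": (0.7, 0.15, 0.15),
    "random_seed": 42,                        # for reproducibility across runs
}
```

Stating each of these explicitly in the main text (or an appendix table) would address the reproducibility concern directly.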

mxochicale commented 1 year ago

Draft document to put together official comments (only available to the co-authors of the paper) https://emckclac-my.sharepoint.com/:w:/r/personal/k1812667_kcl_ac_uk/_layouts/15/Doc.aspx?sourcedoc=%7B30869B9E-93CD-411D-8352-DD3E48054E29%7D&file=Author_Response_Paper136.docx&action=default&mobileredirect=true

mxochicale commented 1 year ago

Closing this issue as there is no bandwidth to address the points above. However, there are interesting discussions here that might lead to future work! Don't hesitate to re-open this issue, or open a new one and tag me (@mxochicale), with further questions.