@npucino Thank you very much for your very detailed review of the paper. I have implemented your suggestions, along with those requested by the other reviewer, Dr. Wu. Together, these changes better organize the description of the software and address the comments about the missing "state of the field" and the concern about a lack of substantial scholarly effort. Specifically, I moved the introduction of existing software earlier in the text and expanded it to give a more detailed picture of the state of the field. I also reorganized the text to emphasize the functions this library introduces for calculating forest structural metrics in Python, which did not exist prior to this library, and to reduce the emphasis on the library's contribution to solving "big data" and efficiency problems, since you are correct to point out that I am leveraging pdal to do this. Many of the required changes warranted more explanation, so the revised manuscript is considerably longer. Finally, I think it is important to note that this is the first stable release of the library. I hope and plan to continue adding functionality, such as calculating canopy cover, incorporating tree segmentation, and working with waveform lidar, for example modeling point cloud distributions from waveforms. Thank you, again!
Here are my responses to each of your suggestions. We have tried to incorporate all of these into the manuscript and repository.
Summary Section
Clarify Acronym: Before mentioning LiDAR as an acronym, spell it out in full as LIght Detection And Ranging, similar to how you've done for FHD, PAD, and PAI.
Digital Terrain Models: Add the term "models" after "digital terrain."
Clarifying the acronyms: "lidar" is now spelled out in full. We also made the spelling and format of "lidar" consistent throughout the text ("lidar" as opposed to "LiDAR"). Digital Terrain Models: the typo here should now be fixed.
State of the field
Missing: This section is completely missing from the paper. Here, a list of previous work or alternative software should be presented to reinforce the need for PyForestScan.
We did not add a separate heading for the paragraph that details the current state of the field, as this is not required by JOSS and we felt it might detract a little from the organization of the paper. However, we did reorganize the text so that the discussion of existing packages (and the lack thereof) appears much earlier and is now contained within a single paragraph. This includes a list of previous work and alternative software, and notes the lack of published Python libraries for calculating forest structural metrics.
Statement of Need
Data Complexity and Size:
Complexity: Please specify the complexities that current methods struggle to address and how PyForestScan aims to solve these issues.
Size: Could you provide more specific details when discussing size? Typically, LiDAR data is processed in tiles. For example, 1 km x 1 km LiDAR tiles with low average point densities (2-3 pts/m²) are usually around 20-30 MB, which is manageable. However, with higher densities (e.g., UAV LiDAR data at 2000-3000 pts/m²), file sizes can become a limiting factor. Please clarify the specific size challenges.
We made major revisions to the manuscript to clarify this point. The major complexities relate to data size, and we added text showing how our library makes use of the powerful IO capabilities of PDAL to read formats like EPT and COPC, which are designed for handling very large datasets efficiently. We also clarify the types of operations needed and how our library builds on existing tools to calculate forest metrics. Our major contribution is code to calculate forest structural metrics, along with a way to do this efficiently with large datasets.
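To give a concrete sense of the PDAL-backed IO described above, here is a minimal sketch that reads a COPC file with PDAL's Python bindings. This is an illustration only, not PyForestScan's internal code; the file name and bounds are hypothetical.

```python
import json
import pdal

# Minimal PDAL pipeline reading a (hypothetical) COPC file. readers.copc can
# fetch only the points inside the requested bounds, which is part of what
# makes very large datasets manageable.
pipeline_spec = {
    "pipeline": [
        {
            "type": "readers.copc",
            "filename": "example_forest.copc.laz",
            "bounds": "([200000, 201000], [2100000, 2101000])",
        },
        {
            # Drop points classified as noise (class 7).
            "type": "filters.range",
            "limits": "Classification![7:7]",
        },
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_spec))
pipeline.execute()
points = pipeline.arrays[0]  # structured NumPy array with X, Y, Z, Classification, ...
print(len(points), points.dtype.names[:4])
```

The same pattern applies to readers.ept for Entwine Point Tile data.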
Last Sentence of First Section - LiDAR-based Analysis:
While “LiDAR-based analysis” is generally understood, it's more precise here to refer to "point-cloud-based analysis" rather than LiDAR-based analysis. LiDAR is the sensor technique that produces point clouds, often discretized into rasters. Since this section discusses LiDAR-derived point-cloud analysis (as mentioned earlier), consider using "point-cloud-based analysis."
Thank you for this suggestion. We have made the change to use "point-cloud-based analysis" throughout the text.
Second Section - LiDAR Data:
See the previous point. It seems you are specifically referring to point-cloud data here.
As with the previous point, we have made this change throughout the text.
Points Generated from SfM:
How are PAD, PAI, and FHD metrics derived from airborne SfM data, given that aerial SfM typically captures only the canopy top without penetrating vegetation layers? Without integrating terrestrial images or using off-nadir, low-altitude images, SfM can only capture the top layer, making voxel creation impossible. Of course, SfM CAN generate 3D point clouds of trees, but large-scale forest point clouds (with tie points derived from several heights along the vegetation profile) from aerial SfM seem infeasible, or feasible only with expensive drone swarms flying through the canopy. Please clarify how SfM data can be used in your package to derive these metrics.
This is an important point, and we have tried to make clear in the text that very dense SfM data over open-canopy forest can produce a point cloud detailed enough for PyForestScan to calculate these metrics. There are no specific functions (as of yet!) developed for SfM data. That said, we felt it was important to ensure readers are aware that this is a package primarily for point clouds from airborne sensors, so however those point clouds are generated, as long as the data captures enough of the forest, the library can calculate the structural metrics.
Acknowledgment
Test dataset to validate and benchmark PyForestScan: It is mentioned that a single noise-free classified point cloud has been used for testing and validating the package. It would be valuable to know more about the data used for testing in this paper, without having to find the information in the papers cited in the test_data folder. Please add some information in this paper, such as (1) location, (2) dominant ecosystem/vegetation type/vegetation structure, (3) point cloud density, and (4) preprocessing steps, if any.
We included a small usage section in the text that (1) points readers to the Jupyter example notebooks that help facilitate usage of the library, (2) links to the GitHub repository, where all of the source code and the example data can be found, and (3) provides basic information on the point cloud, including point cloud density (via nominal pulse spacing) and preprocessing steps, as well as information on the forest cover type.
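For readers skimming this thread, below is a minimal, self-contained sketch of one of the metrics in question, foliage height diversity computed as the Shannon entropy of return proportions per height bin. This is an illustration written directly in NumPy, not PyForestScan's implementation; the binning choices and synthetic data are arbitrary, and the library's actual API is documented in the repository notebooks.

```python
import numpy as np

def foliage_height_diversity(heights, bin_size=1.0):
    """Shannon entropy (FHD) of the vertical distribution of returns.

    heights: 1-D array of height-above-ground values (m) for one plot or column.
    bin_size: vertical bin width in metres.
    """
    heights = np.asarray(heights, dtype=float)
    edges = np.arange(0.0, heights.max() + bin_size, bin_size)
    counts, _ = np.histogram(heights, bins=edges)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * np.log(p)).sum())

# Toy example with synthetic heights (not the test dataset described in the paper).
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 25.0, size=10_000)
print(round(foliage_height_diversity(sample, bin_size=5.0), 3))
```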
Thank you again for your detailed review. I hope these changes address your concerns.