petebankhead opened this issue 2 years ago
Using OS-2.ndpi with ~150k cells, the following script requires 12-15 seconds on a Mac Studio:
```groovy
import qupath.lib.gui.tools.MeasurementExporter
import qupath.lib.objects.PathCellObject

def project = getProject()
def imagesToExport = [getProjectEntry()]

def separator = "\t"
def columnsToInclude = ["Name", "Class", "Nucleus: Area"] as String[]
def exportType = PathCellObject.class
def outputPath = buildFilePath(PROJECT_BASE_DIR, getProjectEntry().getImageName() + ".tsv")
def outputFile = new File(outputPath)

def exporter = new MeasurementExporter()
        .imageList(imagesToExport)            // Images from which measurements will be exported
        .separator(separator)                 // Character that separates values
        .includeOnlyColumns(columnsToInclude) // Columns are case-sensitive
        .exportType(exportType)               // Type of objects to export
        .exportMeasurements(outputFile)       // Start the export process

print "Done!"
```
By contrast, the following exports something similar but takes 0.6-0.7 seconds:
```groovy
import qupath.lib.common.GeneralTools

// Some kind of file path for the current image
def name = getProjectEntry().getImageName()
name = GeneralTools.getNameWithoutExtension(name)
def path = buildFilePath(PROJECT_BASE_DIR, name + '.tsv')

def cells = getCellObjects()
def measurements = ['Nucleus: Area']

try (def writer = new PrintWriter(path)) {
    // Write header
    def sb = new StringBuilder()
    sb.append('Class')
    for (def measurementName in measurements) {
        sb.append('\t')
        sb.append(measurementName)
    }
    writer.println(sb.toString())

    // Write measurements, reusing the same StringBuilder for every row
    for (def cell in cells) {
        sb.setLength(0)
        sb.append(cell.getPathClass())
        for (def measurementName in measurements) {
            sb.append('\t')
            sb.append(cell.getMeasurementList().getMeasurementValue(measurementName))
        }
        writer.println(sb.toString())
    }
}
println "Written to $path"
```
Some overhead is expected when using `MeasurementExporter`, but it should be reduced.
So the lack of a buffered stream is probably unimportant: digging deeper, I see that a `PrintWriter` is used, which involves some buffering (as far as I can tell). That may explain why I didn't spot any clear improvement when using a `BufferedOutputStream`.
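One way to check the buffering claim is to count how often a `PrintWriter` actually writes to the stream underneath it. The sketch below is plain Java (callable from Groovy too) and assumes a standard JDK, where `PrintWriter(OutputStream)` places a `BufferedWriter` and an `OutputStreamWriter` in front of the stream. `CountingOutputStream` and `writeRows` are hypothetical helpers for illustration, not QuPath code.

```java
import java.io.OutputStream;
import java.io.PrintWriter;

public class PrintWriterBufferingDemo {

    /** Hypothetical helper: discards all bytes, but records how often it is written to. */
    static class CountingOutputStream extends OutputStream {
        long writeCalls = 0;
        long bytesWritten = 0;

        @Override
        public void write(int b) {
            writeCalls++;
            bytesWritten++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            bytesWritten += len;
        }
    }

    /** Writes a header plus nRows tab-separated rows through a PrintWriter. */
    static CountingOutputStream writeRows(int nRows) {
        CountingOutputStream counter = new CountingOutputStream();
        // PrintWriter(OutputStream) buffers internally, so most println calls
        // only fill an in-memory buffer instead of reaching the stream directly
        try (PrintWriter writer = new PrintWriter(counter)) {
            writer.println("Class\tNucleus: Area");
            for (int i = 0; i < nRows; i++) {
                writer.println("Tumor\t" + (20.0 + i % 100));
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        int nRows = 150_000;
        CountingOutputStream counter = writeRows(nRows);
        System.out.println("println calls: " + (nRows + 1));
        System.out.println("writes reaching the stream: " + counter.writeCalls);
        System.out.println("bytes written: " + counter.bytesWritten);
    }
}
```

If the number of writes reaching the stream is already far below the number of rows, an extra `BufferedOutputStream` underneath has little left to batch.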
Upon further investigation, it's probably worth revising this command. The following methods do much the same thing:
For maintainability, we should try to figure out a way to reuse the same code.
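As a minimal sketch of what shared code could look like, assuming the exporters differ mainly in which objects and columns they use: one routine writes a header and one row per object, with the per-column lookup passed in as a function. `SharedTableWriter`, `writeTable` and `writeToString` are hypothetical names, and maps stand in for real QuPath objects; none of this is existing QuPath API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class SharedTableWriter {

    /**
     * Hypothetical shared routine: writes a header row, then one row per item,
     * asking valueFor for each (item, column) pair. Null values become empty cells.
     */
    static <T> void writeTable(List<T> items, List<String> columns,
                               BiFunction<T, String, Object> valueFor,
                               String separator, Writer out) throws IOException {
        out.write(String.join(separator, columns));
        out.write('\n');
        StringBuilder sb = new StringBuilder(); // reused for every row
        for (T item : items) {
            sb.setLength(0);
            for (int c = 0; c < columns.size(); c++) {
                if (c > 0)
                    sb.append(separator);
                Object value = valueFor.apply(item, columns.get(c));
                sb.append(value == null ? "" : value);
            }
            sb.append('\n');
            out.write(sb.toString());
        }
    }

    /** Convenience wrapper that collects the table into a String. */
    static <T> String writeToString(List<T> items, List<String> columns,
                                    BiFunction<T, String, Object> valueFor, String separator) {
        StringWriter out = new StringWriter();
        try {
            writeTable(items, columns, valueFor, separator, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for cells: each map holds a classification and one measurement
        List<Map<String, Object>> cells = List.of(
                Map.of("Class", "Tumor", "Nucleus: Area", 25.5),
                Map.of("Class", "Stroma", "Nucleus: Area", 18.0));
        System.out.print(writeToString(cells, List.of("Class", "Nucleus: Area"),
                (cell, column) -> cell.get(column), "\t"));
    }
}
```

Each caller would only need to supply its own object list, column names and lookup function, so the writing loop lives in one place.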
Bug report
Describe the bug
The performance of `MeasurementExporter` is unacceptably slow when exporting large numbers of objects and measurements (although, as we shall see, it's not entirely its fault...).
To Reproduce
Run the `MeasurementExporter` script above on an image with ~150k cells (e.g. OS-2.ndpi).
Expected behavior
Exporting hundreds of thousands of measurements should take a matter of seconds.
Additional context
The discussion behind this is at https://forum.image.sc/t/qupath-extremely-slow-exporting-detection-measurements/71154
Investigating revealed a few issues:
1. `MeasurementExporter` might not be using a buffered output stream (although how much this matters is unclear).
2. …
3. … `GeneralTools.formatNumber` show up on VisualVM as a bottleneck.

The first is easy to address, although it may not help much. The second can also be addressed by excluding columns earlier. The third may be trickier, but fixing it is needed in cases where a full table should be exported.
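To illustrate why excluding columns earlier can help, here is a minimal Java sketch with made-up data: 50 hypothetical columns, of which only two are requested. Filtering after computing evaluates every column for every row, whereas resolving the requested columns first evaluates only what is exported. The column model (one `ToDoubleFunction` per named column) is an assumption for illustration, not how `MeasurementExporter` is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class EarlyColumnFilter {

    static final int N_COLUMNS = 50;
    /** Number of individual values computed so far (used to compare the two strategies). */
    static long computations = 0;

    /** Hypothetical column model: each named column computes its value from a row. */
    static Map<String, ToDoubleFunction<double[]>> allColumns() {
        Map<String, ToDoubleFunction<double[]>> columns = new LinkedHashMap<>();
        for (int i = 0; i < N_COLUMNS; i++) {
            final int idx = i;
            columns.put("Measurement " + i, row -> {
                computations++;
                return row[idx];
            });
        }
        return columns;
    }

    /** Late filtering: every column is computed, then only the requested ones are kept. */
    static double[][] exportLate(List<double[]> rows, List<String> include) {
        Map<String, ToDoubleFunction<double[]>> columns = allColumns();
        double[][] table = new double[rows.size()][include.size()];
        for (int r = 0; r < rows.size(); r++) {
            Map<String, Double> values = new LinkedHashMap<>();
            for (Map.Entry<String, ToDoubleFunction<double[]>> e : columns.entrySet())
                values.put(e.getKey(), e.getValue().applyAsDouble(rows.get(r)));
            for (int c = 0; c < include.size(); c++)
                table[r][c] = values.get(include.get(c));
        }
        return table;
    }

    /** Early filtering: requested columns are resolved once, so excluded ones are never computed. */
    static double[][] exportEarly(List<double[]> rows, List<String> include) {
        Map<String, ToDoubleFunction<double[]>> columns = allColumns();
        List<ToDoubleFunction<double[]>> selected = new ArrayList<>();
        for (String name : include)
            selected.add(columns.get(name));
        double[][] table = new double[rows.size()][selected.size()];
        for (int r = 0; r < rows.size(); r++)
            for (int c = 0; c < selected.size(); c++)
                table[r][c] = selected.get(c).applyAsDouble(rows.get(r));
        return table;
    }

    /** Made-up data: nRows rows with N_COLUMNS values each. */
    static List<double[]> makeRows(int nRows) {
        List<double[]> rows = new ArrayList<>();
        for (int r = 0; r < nRows; r++) {
            double[] row = new double[N_COLUMNS];
            for (int c = 0; c < N_COLUMNS; c++)
                row[c] = r * 100.0 + c;
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<double[]> rows = makeRows(1000);
        List<String> include = List.of("Measurement 3", "Measurement 7");

        computations = 0;
        double[][] late = exportLate(rows, include);
        System.out.println("Late filtering computations:  " + computations);

        computations = 0;
        double[][] early = exportEarly(rows, include);
        System.out.println("Early filtering computations: " + computations);

        System.out.println("Same table: " + Arrays.deepEquals(late, early));
    }
}
```

With 1,000 rows, the late strategy performs 50,000 computations and the early one 2,000, and both produce the same table. The saving grows with the number of excluded columns.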