Open dominikabasaj opened 6 years ago
Thanks for bringing this up, @dominikabasaj. This is definitely on our radar, and we will be adding support for streaming. I'd encourage you to put on a PM hat and help us define the requirements and use cases for this feature. That will help us validate what we have in mind and make sure you get what you are looking for. CC: @itsvikramagr
@dominikabasaj
Here is one way to get it working with a streaming job. I haven't tried it with streaming yet, so let me know if this serves your purpose.
1. Start your application with --packages qubole:sparklens:0.1.2-s_2.11, but don't specify the extraListener config.
import com.qubole.sparklens.QuboleNotebookListener
val QNL = new QuboleNotebookListener(sc.getConf)
sc.addSparkListener(QNL)
Basically, create a listener (note that this is the notebook listener, not the job listener) and register it. Then wrap the code you want to profile:
QNL.profileIt {
//Your code here
}
Alternatively, if you need more control:
if (QNL.estimateSize() > QNL.getMaxDataSize()) {
  QNL.purgeJobsAndStages()
}
val startTime = System.currentTimeMillis
// Your Scala code here
val endTime = System.currentTimeMillis
// Wait for some time so that all events can accumulate
Thread.sleep(QNL.getWaiTimeInSeconds())
println(QNL.getStats(startTime, endTime))
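If you end up repeating the manual sequence above for several pieces of code, it may help to wrap it in a small helper. The following is only a sketch of that pattern in plain Scala, not Sparklens code: `WindowReporter`, `ProfileWindow`, and `settleMillis` are hypothetical names, and the trait is a stand-in for the listener so the example runs without Spark.

```scala
// Hypothetical stand-in for the listener, so the sketch runs without Spark.
// Only the getStats(startTime, endTime) shape is taken from the snippet above.
trait WindowReporter {
  def getStats(startTime: Long, endTime: Long): String
}

object ProfileWindow {
  // Runs `block`, waits `settleMillis` so late events can arrive, prints the
  // report for the [startTime, endTime] window, and returns the block's result.
  def profile[R](reporter: WindowReporter, settleMillis: Long)(block: => R): R = {
    val startTime = System.currentTimeMillis()
    val result = block // the by-name block is evaluated here, inside the window
    val endTime = System.currentTimeMillis()
    Thread.sleep(settleMillis)
    println(reporter.getStats(startTime, endTime))
    result
  }
}

object ProfileWindowDemo extends App {
  // A fake reporter that only reports how long the window was.
  val fake = new WindowReporter {
    def getStats(startTime: Long, endTime: Long): String =
      s"window covered ${endTime - startTime} ms"
  }

  val sum = ProfileWindow.profile(fake, settleMillis = 10L) {
    (1 to 1000).sum
  }
  println(s"result = $sum")
}
```

With a real listener you would pass an adapter that forwards to QNL.getStats, and keep the size check and purge step before each window.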
thanks!
Sorry for duplicating, but this issue is also related to streaming, so I thought I would post an update here.
We have tried using QuboleJobListener for Structured Streaming, but it only provides reports after the streaming query terminates, and it reports on all jobs together rather than per batch.
In general, since Structured Streaming applications run continuously, users and developers will want to see stats every few batches.
A detailed proposal is attached below. Please review it and share your input.
@dominikabasaj @akumarb2010 You can check out our new project Streaminglens if you plan to use Sparklens for Streaming applications.
Hi,
Are there any plans to adjust Sparklens for streaming processing? I assume that right now it is suitable only for batch processes?
Best, Dominika