huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0
492 stars 147 forks

ONCEStreaming #90

Closed: pengsz1993 closed this 3 years ago

pengsz1993 commented 6 years ago

ONCE based on Spark Streaming

Summary of the changes

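We would like to add the ONCE algorithm to streamDM. Given a dynamic stream of signals, the algorithm counts the occurrences of time-constrained serial episodes, with an accuracy of 100%. For a detailed description of the algorithm, see the papers and patent descriptions on ONCE from our joint project with Huawei.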
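To make the counting task concrete, here is a minimal sketch of a naive in-memory counter. It is not the ONCE algorithm (which is described in the paper and patents, not in this PR text). The names `NaiveEpisodeCounter`, `Event`, and `count` are hypothetical. The definition of an "occurrence" is also an assumption: non-overlapping matches whose first and last events are at most `maxSpanSec` seconds apart.

```scala
// Naive baseline for illustration only; NOT the ONCE algorithm.
object NaiveEpisodeCounter {

  /** One stream element: a signal value and the time (in seconds) it was observed. */
  final case class Event(signal: Int, timeSec: Long)

  /**
   * Counts non-overlapping occurrences of `episode` (signals in order, gaps allowed)
   * whose first and last matched events are at most `maxSpanSec` apart.
   * Assumes `events` is sorted by time.
   */
  def count(events: Seq[Event], episode: Seq[Int], maxSpanSec: Long): Int = {
    require(episode.nonEmpty, "episode must contain at least one signal")
    val ep = episode.toVector
    val n = ep.length
    // start(k) = latest start time of a partial match of the first k signals, if any.
    val start = Array.fill[Option[Long]](n + 1)(None)
    var found = 0

    for (e <- events) {
      // Walk prefixes from longest to shortest so one event is never used twice
      // within the same partial match.
      for (k <- n to 1 by -1 if ep(k - 1) == e.signal) {
        if (k == 1) start(1) = Some(e.timeSec)
        else if (start(k - 1).isDefined) start(k) = start(k - 1)
      }
      start(n) match {
        case Some(t0) if e.timeSec - t0 <= maxSpanSec =>
          found += 1
          // Non-overlapping: discard all partial matches after a hit.
          for (i <- start.indices) start(i) = None
        case Some(_) =>
          start(n) = None // complete match, but it violates the time constraint
        case None =>
      }
    }
    found
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(8, 0), Event(3, 1), Event(8, 3), Event(8, 5), Event(8, 20), Event(8, 25))
    println(count(events, Seq(8, 8), maxSpanSec = 10))
  }
}
```

Other occurrence definitions (for example minimal or overlapping occurrences) would give different counts, so the definition used by ONCE should be taken from the paper rather than from this sketch.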

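We added an ONCEStreaming folder under streamDM-master/src/main/scala/org/apache/spark/streamdm. It contains two Scala files that implement the algorithm and one file that holds the test data.

Sender.scala generates the test data. The data takes the form of pairs: we use random numbers from 0 to 9 to simulate signals, and each signal together with the time at which it occurred forms one pair. Every 50 pairs are grouped into a list, wrapped in an RDD, and sent to the receiver over a socket at an interval of 1 second. (We currently use digits as signals; this can be changed later.)

ONCEStreaming.scala, the other file we uploaded, receives and processes the data. After running Sender.scala, start the Spark cluster and then run ONCEStreaming.scala. The program currently mines the sequence (8,8) with a time constraint of ten seconds, and Spark processes the incoming data every 5 seconds; these values are hard-coded. Once the signal format is settled, we will turn them into input parameters. The program's output is the frequency of the episode being counted, updated as the signal stream is produced.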
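The test-data format described for Sender.scala can be sketched in plain Scala as follows. This is an illustration under assumptions, not the code from the PR: the object name `SenderSketch`, the injectable `clock` parameter, and the `signal,timestamp` text encoding are all hypothetical, and the socket and RDD wrapping are left out so the sketch runs without Spark.

```scala
import scala.util.Random

// Hypothetical sketch of the test-data generator; not the Sender.scala from this PR.
object SenderSketch {
  // The PR description groups every 50 (signal, time) pairs into one list.
  val BatchSize = 50

  /** Builds one batch of (signal, timestamp) pairs with signals drawn from 0 to 9. */
  def nextBatch(rng: Random, clock: () => Long): List[(Int, Long)] =
    List.fill(BatchSize)((rng.nextInt(10), clock()))

  /** Encodes a batch as one line of text. The wire format is an assumption. */
  def encode(batch: List[(Int, Long)]): String =
    batch.map { case (signal, time) => s"$signal,$time" }.mkString(" ")

  def main(args: Array[String]): Unit = {
    val rng = new Random()
    // Emit three batches, one per second, to mimic the 1-second sending interval.
    for (_ <- 1 to 3) {
      println(encode(nextBatch(rng, () => System.currentTimeMillis())))
      Thread.sleep(1000)
    }
  }
}
```

In a real sender the encoded line would be written to a socket instead of standard output; passing the clock as a function keeps the generator easy to test with fixed timestamps.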

Tests

The test results are in test_result.txt.

zhangjiajin commented 6 years ago

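Hello, we are quite interested in the ONCE algorithm. Could you provide a download link for the paper "ONCE: Counting the Frequency of Time-constrained Serial Episodes in a Streaming Sequence", or attach it directly to this issue? What scenario did you design this algorithm for, and what practical problems does it solve?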

CLAassistant commented 3 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

:x: 彭思哲
:x: pengsz1993


彭思哲 does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.