webmachinelearning / proposals

🚀 Proposals for future work

Data processing proposal #1

Open WenheLI opened 3 years ago

WenheLI commented 3 years ago

Proposal name

Data Processing API for Web

Short description

The need arises because deep learning models cannot work in isolation: data processing is required for both the inputs and the outputs of a model.

Since we are drafting the web-dl spec, we should also pay attention to a standard data processing spec. Furthermore, the data processing API should be compatible with JS syntax.

Example use cases

const [trainData, testData] = rawImgDatas.map(it => it.resize([224, 224]).blur()).shuffle().splitTrainTest();

const tabularData = rawTabularData.head(10).shuffle();
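
One way to picture the chained calls above is a small wrapper whose methods each return a new dataset. This is a minimal sketch only: the `Dataset` class, its method signatures, the injectable `random` parameter, and the 80/20 default split are all assumptions made for illustration, not the datacook API or any drafted spec. Image resizing is replaced by a placeholder transform on plain objects.

```javascript
// Hypothetical sketch: `Dataset` and its methods are illustrative names,
// not the datacook API or a proposed standard.
class Dataset {
  constructor(items) {
    this.items = Array.from(items);
  }

  // Apply a per-item transform and return a new Dataset so calls can be chained.
  map(fn) {
    return new Dataset(this.items.map(fn));
  }

  // Keep only the first n items.
  head(n) {
    return new Dataset(this.items.slice(0, n));
  }

  // Fisher-Yates shuffle on a copy; `random` is injectable for reproducibility.
  shuffle(random = Math.random) {
    const out = this.items.slice();
    for (let i = out.length - 1; i > 0; i--) {
      const j = Math.floor(random() * (i + 1));
      [out[i], out[j]] = [out[j], out[i]];
    }
    return new Dataset(out);
  }

  // Split into [train, test]; the default 80/20 ratio is an assumption.
  splitTrainTest(trainRatio = 0.8) {
    const cut = Math.round(this.items.length * trainRatio);
    return [
      new Dataset(this.items.slice(0, cut)),
      new Dataset(this.items.slice(cut)),
    ];
  }
}

// Usage with placeholder records standing in for images.
const rawRecords = new Dataset(
  Array.from({ length: 10 }, (_, i) => ({ id: i, width: 640, height: 480 }))
);
const [trainData, testData] = rawRecords
  .map(it => ({ ...it, width: 224, height: 224 })) // stands in for it.resize([224, 224])
  .shuffle()
  .splitTrainTest();
```

Returning a new object from every method keeps the source data untouched, which is what makes the fluent style in the use cases safe to chain.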

A rough idea or two about implementation

We are currently working on datacook, which implements data-processing methods on top of tfjs and danfo. The API-level design is finished here, and some methods are being re-implemented natively for browsers.

anssiko commented 3 years ago

Thanks @WenheLI for this proposal for possible future work.

Let me loop in the Web Neural Network API editors @huningxin and @wchao1115 for their initial feedback.

I'm happy to bring this proposal for discussion at one of our bi-weekly teleconferences. Let me know.

WenheLI commented 3 years ago

@anssiko - Thanks! Maybe I could join the next bi-weekly teleconference for discussion and meet the community!

anssiko commented 3 years ago

@WenheLI, your proposal has been added to the 2020-12-10 agenda:

https://github.com/webmachinelearning/meetings/blob/master/telcons/2020-12-10-agenda.md#data-processing-proposal

Given that the agenda is already quite packed, we will discuss this topic if time allows. If we run out of time, we will move it to the following call, which will take place on 7 Jan 2021 because of the holiday period. Looking forward to the discussion.

anssiko commented 3 years ago

Given we ran out of time on our 2020-12-10 call, this proposal is now on the 2021-01-07 call agenda.

Edit: And that's me fixing the first typo of the year, as usual, the wrong year :-)

anssiko commented 3 years ago

@WenheLI we'd like to have you on the call to introduce the proposal to the group. Since we did not see you on today's call, I'd like to check whether the next call opportunity 21 January 2021 - 15:00-16:00 UTC+0 would work for you?

WenheLI commented 3 years ago

> @WenheLI we'd like to have you on the call to introduce the proposal to the group. Since we did not see you on today's call, I'd like to check whether the next call opportunity 21 January 2021 - 15:00-16:00 UTC+0 would work for you?

@anssiko - Hi, that date definitely works for me! I am sorry I missed the last meeting; I was occupied with other work at the time.

anssiko commented 3 years ago

@WenheLI thanks for confirming, and no problem! I'll put discussion on this proposal to the work-in-progress 21 January 2021 call agenda.

gramalingam commented 3 years ago

@WenheLI Question: it looks like this is targeting training rather than inference, is that correct?

WenheLI commented 3 years ago

> @WenheLI Question: it looks like this is targeting training rather than inference, is that correct?

@gramalingam IMO, this should be a generic data processing solution for the whole web ecosystem, not one limited to training or inference specifically. In my experience, a data processing spec is equally important for both training and inference.
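
As a rough illustration of that point, the same preprocessing step usually has to run on training batches and on inference inputs alike. The sketch below assumes a hypothetical `normalizePixels` helper and made-up sample values; it is not part of any existing library or spec.

```javascript
// Hypothetical helper: scales 8-bit pixel values to floats in [0, 1].
function normalizePixels(pixels) {
  return Float32Array.from(pixels, v => v / 255);
}

// Training path: every sample in a batch goes through the helper.
const trainingBatch = [[0, 128, 255], [64, 64, 64]].map(normalizePixels);

// Inference path: a single input goes through the very same helper,
// so the model sees data in the range it was trained on.
const inferenceInput = normalizePixels([255, 0, 51]);
```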

WenheLI commented 3 years ago

> @WenheLI thanks for confirming, and no problem! I'll put discussion on this proposal to the work-in-progress 21 January 2021 call agenda.

@anssiko - I am so sorry. I was occupied and could not make that meeting. Shall I present this proposal at the next meeting?

anssiko commented 3 years ago

@WenheLI that’d be great! The idea is to allow for some meeting time for all submitted proposals to brainstorm together. We’re on a bi-weekly meeting cadence.