sign / translate

Effortless Real-Time Sign Language Translation
https://sign.mt

Korean to KSL #129

Closed AmitMY closed 1 month ago

AmitMY commented 8 months ago

Hello,

I have a similar question.

I want to translate from Korean natural language to Korean Sign Language (KSL), or vice versa.

But I found there is no translation to KSL. Is that because the available data for it is limited?

Also, could you tell me how to train on a Korean and KSL dataset, or how you trained the other available languages?

Thanks!

Originally posted by @tmdtmdqorekf in https://github.com/sign/translate/issues/128#issuecomment-1895754659

AmitMY commented 8 months ago

Hi @tmdtmdqorekf, if you know of a Korean Sign Language dictionary whose owners give us permission to use their data, I can include it.

In general, this is not a trained model. At the moment, it uses dictionary entries and stitches them together: https://github.com/ZurichNLP/spoken-to-signed-translation
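To illustrate the dictionary-and-stitch idea, here is a minimal sketch. It does not use the actual API of the linked project: `PoseClip`, `stitch`, and the toy lexicon are hypothetical names made up for this example, and each "pose frame" is simplified to a flat list of numbers. The linked project may do more than this (for example, handling transitions between signs); the sketch only shows lookup plus concatenation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical simplified types: a frame is a flat list of keypoint
# coordinates, and a dictionary entry is a list of frames.
Frame = List[float]
PoseClip = List[Frame]


def stitch(glosses: Sequence[str],
           lexicon: Dict[str, PoseClip]) -> Tuple[PoseClip, List[str]]:
    """Look up each gloss in the lexicon and concatenate the clips in order.

    Returns the stitched clip and the glosses that had no dictionary entry.
    """
    stitched: PoseClip = []
    missing: List[str] = []
    for gloss in glosses:
        clip = lexicon.get(gloss)
        if clip is None:
            missing.append(gloss)  # no entry: skip it, but report it
            continue
        stitched.extend(clip)
    return stitched, missing


# Toy lexicon (assumption): two signs with one-number "frames".
toy_lexicon: Dict[str, PoseClip] = {
    "HELLO": [[0.0], [1.0]],
    "WORLD": [[2.0]],
}

clip, missing = stitch(["HELLO", "UNKNOWN", "WORLD"], toy_lexicon)
print(len(clip), missing)
```

This also shows why a per-word dictionary matters: any word without an entry simply cannot be rendered, so coverage of the dictionary bounds what can be translated.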

tmdtmdqorekf commented 8 months ago

Thanks for your reply!

Below is a KSL dictionary, though the site and all its entries are written in Korean.

URL: https://sldict.korean.go.kr/front/main/main.do

The site policy says we can use the data as long as we credit the author (source).

(In case you need it, the source is the 'National Institute of the Korean Language (NIKL)'.)

AmitMY commented 8 months ago

Materials on this website subject to the 'Creative Commons Attribution-NonCommercial-NoDerivatives 2.0 Korea License' can be freely used for non-profit purposes. However, in order to use the copyrighted work, the following conditions must be observed.

This means we are not allowed to run pose estimation on the videos and show the results.

tmdtmdqorekf commented 8 months ago

Oh, is that because of the second condition?

AmitMY commented 8 months ago

Yes. If they relax that condition, it would be possible. Ideally CC-BY, but CC-BY-NC would also be ok.

tmdtmdqorekf commented 8 months ago

I'll ask about it.

In the meantime, I found another dataset that looks usable.

URL: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=103

Could you take a look at it, please?

tmdtmdqorekf commented 8 months ago

But in this case, only Koreans can request API access.

Would it work if I request access myself and then pass it on to you?

Or should I modify your code and implement it on my own?

AmitMY commented 8 months ago

This is a very large dataset - 2.63 TB - so I am not sure it contains videos of single words. Does it?

Also, can you direct me to where I can see the license of the data?

tmdtmdqorekf commented 8 months ago

Below is more information about the data. So, yes, it contains videos of individual words.

Deployment content and amount of data delivered

Total 536,000 sign language video clips (.mp4 files)

tmdtmdqorekf commented 8 months ago

Also, for the license, here's the link. You can check it in the first section.

https://www.aihub.or.kr/intrcn/guid/usagepolicy.do?currMenu=151&topMenu=105

I've included the translated content below.

Data Introduction

AI learning data provided by the AI hub (hereinafter referred to as "AI Data") was established as part of the "Building Infrastructure for the Intelligent Information Industry" project by the Ministry of Science and ICT and the Korea Intelligent Information Society Agency.

All rights to data, AI application models and data authoring tools, various manuals, etc. (hereinafter referred to as "AI Data, etc."), which are tangible and intangible results of this project, are held by AI data and participating organizations (hereinafter referred to as "executing organizations, etc.") and the Korea Intelligent Information Society Agency.

This AI data has been established for the development of artificial intelligence technology, products, and services, and can be used for commercial and non-profit research and development purposes in various fields such as intelligent products, services, and chatbots.

Data Utilization Policy

To use this AI data, etc., you must agree to and comply with the following.

1. When using this AI data, etc., you must state that it is a result of a project of the Korea Intelligent Information Society Agency, and the same must be stated in any secondary work that uses this AI data, etc.

2. For a corporation, organization, or individual located abroad to use AI data, a separate agreement with the executing organizations, etc. and the Korea Intelligent Information Society Agency is required.

3. To take this AI data out of the country, a separate agreement with the executing organizations, etc. and the Korea Intelligent Information Society Agency is required.

4. This AI data may be used only for training artificial intelligence models.

5. If the purpose, method, or content of the use of AI data is deemed illegal or inappropriate, the Korea Intelligent Information Society Agency may refuse to provide it and, if it has already been provided, may request that its use be suspended and that the AI data be returned or disposed of.

6. The AI data, etc. provided shall not be provided, transferred, rented, or sold to any other corporation, organization, or individual that has not been approved by the Korea Intelligent Information Society Agency.

7. All civil and criminal liability for AI data, etc. arising from unauthorized access, provision, transfer, rental, sale, etc. other than for the purpose under paragraph (4) shall lie with the corporation, organization, or individual using the AI data, etc.

8. If personal information, etc. is found in a dataset provided by the AI hub, the user shall immediately report this to the AI hub and delete the downloaded dataset.

9. The de-identified information (including reproduction information) provided by the AI hub shall be used safely for purposes such as developing artificial intelligence services, and no act shall be performed to re-identify an individual using it.

10. If the Korea Intelligent Information Society Agency later conducts a fact-finding survey on use cases and achievements, the user shall cooperate with it in good faith.

Thanks.

tmdtmdqorekf commented 8 months ago

Whoever downloads the data can't pass the dataset on to a third party, so you would need to request it directly from the agency if you want to use it outside Korea.

So... I think it will be really difficult for you to get access to a KSL dataset.

I hope this process can be eased in the future.

BTW thanks for your quick feedback!

AmitMY commented 8 months ago

Well, if you have access to their data, you can use it in https://github.com/ZurichNLP/spoken-to-signed-translation

I will try to request access at some point, but it is not my main priority right now.