signdict / website

A sign language dictionary
https://signdict.org
Mozilla Public License 2.0

Reverse word search #844

Status: open. timonegk opened this issue 4 years ago.

timonegk commented 4 years ago

Hey there! I recently started to learn GSL and I love this project! However, my classmates and I noticed that a reverse word search, i.e. translating a sign into German, would often come in handy.

My current idea for implementing this would be similar to the ASL to English dictionary: all signs would be tagged with their characteristics in a few given categories. The categories used by the ASL dictionary (handshape, movement, location, hands) probably make sense for GSL too. The values in each category would of course have to be chosen according to what actually occurs in the language.

As an example, the sign for stehen would be tagged with handshape: ["V", "B"], movement: "unidirectional", location: ["palm"], hands: "two-handed, alternatively". Handshape and location could contain multiple values.
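A minimal sketch of how such a tagged entry could be stored and matched, using the four categories and the stehen example from above. The class `SignTags` and the helper `matches` are hypothetical names, and Python is used only for illustration. Multi-valued categories (handshape, location) are sets; single-valued ones (movement, hands) are plain strings, which also keeps "one hand" and "both hands" mutually exclusive.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SignTags:
    """Hypothetical tag record for one dictionary entry."""
    handshape: frozenset = field(default_factory=frozenset)  # may hold several values
    movement: str = ""
    location: frozenset = field(default_factory=frozenset)   # may hold several values
    hands: str = ""


# The example for "stehen" from the comment above.
stehen = SignTags(
    handshape=frozenset({"V", "B"}),
    movement="unidirectional",
    location=frozenset({"palm"}),
    hands="two-handed, alternatively",
)


def matches(tags: SignTags, query: SignTags) -> bool:
    """True if every category filled in the query is satisfied by the tags.

    Empty query fields are ignored, so a user can search with only the
    parameters they remember.
    """
    if query.handshape and not query.handshape <= tags.handshape:
        return False
    if query.location and not query.location <= tags.location:
        return False
    if query.movement and query.movement != tags.movement:
        return False
    if query.hands and query.hands != tags.hands:
        return False
    return True


print(matches(stehen, SignTags(handshape=frozenset({"V"}))))  # True
print(matches(stehen, SignTags(movement="circular")))         # False
```

The subset check means a queried handshape only has to be one of the handshapes the sign uses, which fits signs like stehen where the two hands differ.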

What is your opinion on adding this feature? Do you think it would be useful or is it out of scope for the project? Do you think my thoughts make sense?

I could probably try to implement this, but since I am new to the project it would take some time and some help on your part.

bitboxer commented 4 years ago

Yes, that would be one way to do this. But sadly I don't have the resources to tag all videos with this kind of metadata. And this is not easily automated.

There is also another possible approach: the sign writing below the video.

[Screenshot, 2020-06-03: sign writing displayed below the video]

Currently they do not always match the video, because I simply ask the sign writing database for sign writings matching this text. The team behind that service currently has someone mapping the videos to the correct sign writing image stored in their database. Since this is not only an image but can also be extracted as metadata, it might be possible to use that data. But the process of assigning videos is in an early exploration phase, and I am not sure they will ever manage to tag all 5,500 videos in the system.

bitboxer commented 4 years ago

Also yes, this would be really useful to have.

timonegk commented 4 years ago

I don't know if sign writing is suitable for searching, since it is rather complicated (and maybe multiple sign writings sometimes apply to the same sign? I am not very familiar with sign writing). Tagging the videos is of course a problem. Maybe that could be outsourced to the users? It might also be possible to extract some features from the sign writing and reduce the manual labor to verifying the results. Of course no one expects you to tag thousands of videos; it could be a database that grows over time.

bitboxer commented 4 years ago

Sign writing should be a 1:1 mapping, but the automated process I have in place right now cannot find the correct one, so it just shows everything it finds 😥.

Ideally, the team that matches sign writing to the videos will start soon and I will then have a couple of hundred items. Then we can extract exactly the data you mention. But I am not sure this will ever really take off; they are just trying things out at the moment. If they really do it, we can extract everything automatically, and in more detail than you mentioned.

On the other hand, I have sadly seen almost no user interaction. Most users are learning the language and are too hesitant to do anything on the site besides watching videos, which I totally understand. So outsourcing classification to users might not be worth the effort. Maybe if we create an MVP with a Google Form or something really simple to test the participation rate, we will have real data before someone invests a huge amount of time in building an editor for that information? What do you think?

timonegk commented 4 years ago

To test the user interaction, an external MVP as you suggested would probably make sense instead of integrating the editor, even though I would not expect too much – I recently learned about the 1% rule which I think applies here. Integration in the website might increase participation, but I don't know.

Or, instead of tagging videos directly or waiting for the sign writing to be matched to the videos, we could rely entirely on extracting the current sign writing data for the search. The advantage would be low implementation effort, but some search results would show videos that do not match the query.

If the sign writing is eventually matched exactly to the videos, the search would benefit from that too, and manually tagging videos would probably help the sign writing matching.
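A rough sketch of what extracting features from the current sign writing data could look like. This assumes the data is available as Formal SignWriting (FSW) strings, which is not confirmed: the delegs API is private and its format is unknown here. In FSW each symbol is written as `S`, a three-digit hexadecimal base, a fill digit and a rotation digit, followed by its `x` position and `y` position. The hand (S100 to S204) and movement (S205 to S2f6) ranges follow the ISWA 2010 grouping as we understand it and should be checked against the specification. The function `extract_features` and the sample string are hypothetical.

```python
import re

# One FSW symbol with its position: base (3 hex digits), fill, rotation, x, y.
SYMBOL = re.compile(r"S([123][0-9a-f]{2})([0-5])([0-9a-f])(\d{3})x(\d{3})")

# Base-symbol ranges (assumed from the ISWA 2010 grouping; verify against the spec).
HAND = range(0x100, 0x205)      # S100 to S204
MOVEMENT = range(0x205, 0x2F7)  # S205 to S2f6


def extract_features(fsw: str) -> dict:
    """Sort the symbols of one FSW sign into hand, movement and other symbols.

    Full symbol keys (base + fill + rotation) are kept so that a later step
    could map them onto the coarse search categories.
    """
    features = {"hands": [], "movements": [], "other": []}
    for base, fill, rotation, _x, _y in SYMBOL.findall(fsw):
        key = f"S{base}{fill}{rotation}"
        value = int(base, 16)
        if value in HAND:
            features["hands"].append(key)
        elif value in MOVEMENT:
            features["movements"].append(key)
        else:
            features["other"].append(key)
    return features


# A made-up sign with two hand symbols and one movement symbol.
sign = "M525x535S10011501x466S10019476x475S2e704510x500"
print(extract_features(sign))
# {'hands': ['S10011', 'S10019'], 'movements': ['S2e704'], 'other': []}
```

Mapping hand symbol keys to handshape names such as "V" or "B" would still require a lookup table, which is not sketched here; that step is where manual verification of the extracted results would come in.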

I don't really know which solution is better. Since I am currently excited about Phoenix/Elixir, I would not mind the implementation effort, but I might underestimate how much work is actually necessary.

Also, do you have documentation for the delegs API? Unfortunately I was not able to find any, and I would like to try it.

bitboxer commented 4 years ago

If you would love to play around with an input system to classify videos, go ahead. I would also love to see how people react to this and use it. Maybe starting with 4-5 parameters that users could classify? And then we go ahead and modify it if we see people using it.

Sadly the delegs API is private and not documented. I am in direct contact with the developers there and they give me what I need to implement the sign writing display stuff :) . I know they have more structured data about everything, but with their current work load I am not sure if and how we could extract it from them without too much trouble on their side.

timonegk commented 4 years ago

I just sketched some views of the reverse search and the parameter entry in GIMP, as I currently imagine them. Please take a look and tell me what you like or don't like.

The search would look like this: [mock-up: search]
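Behind such a search form, results could be ranked by how many of the selected parameter values an entry shares, instead of requiring an exact match on every field. That would be more forgiving when a learner misremembers one parameter. A minimal sketch with hypothetical `score` and `rank` functions; apart from the stehen tags taken from the first comment, the entries are placeholders, not real dictionary data.

```python
def score(entry: dict, query: dict) -> int:
    """Count how many queried values appear in the entry's tags.

    Every category is stored as a set, so multi-valued categories such as
    handshape and location are handled the same way as single-valued ones.
    """
    return sum(
        len(values & entry.get(category, set()))
        for category, values in query.items()
    )


def rank(entries: dict, query: dict) -> list:
    """Return the names of entries with at least one matching value, best first."""
    scored = [(score(tags, query), name) for name, tags in entries.items()]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for points, name in scored if points > 0]


entries = {
    "stehen": {
        "handshape": {"V", "B"},
        "movement": {"unidirectional"},
        "location": {"palm"},
        "hands": {"two-handed, alternatively"},
    },
    # Placeholder entries, not real dictionary data.
    "example-1": {"handshape": {"B"}, "movement": {"circular"}},
    "example-2": {"handshape": {"A"}, "location": {"chin"}},
}

query = {"handshape": {"V", "B"}, "movement": {"circular"}}
print(rank(entries, query))  # ['example-1', 'stehen']
```

Entries without any tags simply score zero, so a partially tagged database that grows over time still works with this kind of ranking.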

The dictionary entries would contain basically the same fields below the video, where users could enter the sign data. Of course this form is only shown when the data does not exist yet. Once it exists, the form is omitted, but it might be nice to show the information somewhere. [mock-up: entry]

An information badge tells the user why their help is needed: [mock-up: entry-help]

Hovering over the hand form descriptions shows images of the possible handshapes: [mock-up: entry-hover]

The entered data could either be verified by an administrator or – since it is very easy to verify – by other users. Also we have to decide if only logged in users can add the parameters. I would assume that most users of the website do not sign up since there are not many advantages. Therefore participation would presumably be much larger when no login is required, but it might result in problems with spam/vandalism.
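One way to let other users verify the entered data is to accept a submission only after a few independent confirmations. The sketch below assumes three confirmations from distinct reviewers (and rejection after three disputes); the thresholds, the `Submission` class and the reviewer identifiers are all hypothetical and would need tuning against real participation. Without logins, "distinct reviewer" could only be approximated, e.g. per session, which is where the spam/vandalism concern applies.

```python
from dataclasses import dataclass, field

CONFIRMATIONS_NEEDED = 3  # hypothetical threshold
DISPUTES_TO_REJECT = 3    # hypothetical threshold


@dataclass
class Submission:
    """Parameters entered by one contributor for one video, awaiting review."""
    author: str
    tags: dict
    confirmed_by: set = field(default_factory=set)
    disputed_by: set = field(default_factory=set)

    def review(self, reviewer: str, agrees: bool) -> None:
        # Authors cannot review their own submission, and each reviewer
        # counts once: a later review replaces an earlier one.
        if reviewer == self.author:
            return
        self.confirmed_by.discard(reviewer)
        self.disputed_by.discard(reviewer)
        (self.confirmed_by if agrees else self.disputed_by).add(reviewer)

    @property
    def status(self) -> str:
        if len(self.disputed_by) >= DISPUTES_TO_REJECT:
            return "rejected"
        if len(self.confirmed_by) >= CONFIRMATIONS_NEEDED:
            return "verified"
        return "pending"


s = Submission(author="alice", tags={"hands": "two-handed, alternatively"})
s.review("alice", True)  # ignored: own submission
s.review("bob", True)
s.review("carol", True)
print(s.status)  # pending
s.review("dave", True)
print(s.status)  # verified
```

Because verifying a few parameters against the video is quick, a small threshold like this might be enough; an administrator could still override the status by hand.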

timonegk commented 4 years ago

Oh, both hands/one hand should probably be a mutually exclusive choice ^^

timonegk commented 4 years ago

And the hand forms in the images are of course only examples, there are still many forms missing and I don't really know which forms can be grouped together well. But this is not really important at the moment.

bitboxer commented 4 years ago

Sounds like a nice starting point for this. Would love to see how people react to this.