scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Remove articles from the machine translation process with the help of SPARQL #175

Closed: axif0 closed this pull request 2 months ago

axif0 commented 2 months ago

Contributor checklist


Description

When I run python3.9 translate_words.py from Scribe-Data/src/scribe_data/language_data_extraction/English/translations, I get this error message:

Traceback (most recent call last):
  File "/media/asif/Scribe-Data/src/scribe_data/language_data_extraction/English/translations/translate_words.py", line 54, in <module>
    translate_to_other_languages(
  File "/media/asif/Scribe-Data/src/scribe_data/translation/translation_utils.py", line 105, in translate_to_other_languages
    translations[word] = {}
TypeError: list indices must be integers or slices, not str
(venv) asif@asif-X450LA:/media/asif/Mahbub11/Scribe-Data/src/scribe_data/language_data_extraction/English/translations$ 

After realizing this, I modified the code to match scribe_data_json_export.
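The traceback comes from indexing a list with a string key. Below is a minimal sketch of the failure and of one consistent fix. It assumes, hypothetically, that `translations` was created as a list while the code at line 105 of translation_utils.py still treated it as a dict keyed by word. The function names are illustrative and are not the project's actual implementation.

```python
def reproduce_type_error() -> str:
    """Index a list with a string key, as the statement in the traceback does."""
    translations = []  # hypothetical: container created as a list
    word = "book"
    try:
        translations[word] = {}  # same statement as line 105 in the traceback
    except TypeError as err:
        return str(err)
    return "no error"


def build_translations(words: list[str], target_languages: list[str]) -> dict[str, dict[str, str]]:
    """Keep the container a dict so that string word keys are valid."""
    translations: dict[str, dict[str, str]] = {}
    for word in words:
        translations[word] = {}
        for lang in target_languages:
            # Placeholder value: a real implementation would call the translation model here.
            translations[word][lang] = ""
    return translations


print(reproduce_type_error())
# list indices must be integers or slices, not str
print(build_translations(["book"], ["fr", "de"]))
# {'book': {'fr': '', 'de': ''}}
```

Whichever container is chosen, the code that creates it and the code that fills it must agree on its type.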

The QID could be taken from Cli_utils, but unfortunately I couldn't find a function there that returns it, so I retrieved it again.
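A small shared helper is one way to avoid looking up the QID a second time. This is a sketch under assumptions. `get_language_qid` is a hypothetical name, and the metadata shape (a list of entries with `language` and `qid` keys) is assumed for illustration rather than taken from Scribe-Data's actual metadata. The QIDs shown are the Wikidata items for English (Q1860) and French (Q150).

```python
from typing import Optional

# Hypothetical metadata shape, for illustration only; the project's real
# language metadata may be organized differently.
LANGUAGE_METADATA = [
    {"language": "english", "qid": "Q1860"},
    {"language": "french", "qid": "Q150"},
]


def get_language_qid(language: str, metadata: Optional[list[dict]] = None) -> Optional[str]:
    """Return the Wikidata QID for a language name (case-insensitive), or None if unknown."""
    entries = LANGUAGE_METADATA if metadata is None else metadata
    wanted = language.strip().lower()
    for entry in entries:
        if entry["language"] == wanted:
            return entry["qid"]
    return None


print(get_language_qid("French"))  # Q150
```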


I'm extremely sorry for not testing all the languages, or even fully testing a single one; my old potato PC couldn't handle it. :(

I'm not sure about the partitive articles. For French, I see that the SPARQL query for articles returns: tout, le, un, l', toute, toutes, tous, la, les, une

It's missing:

  1. Indefinite plural "des"
  2. Partitive articles "du," "de la," "de l'"

How can I modify the query to include these compound articles? Are they represented differently in Wikidata? Or am I on the right path?
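You can check mechanically which articles are missing by comparing the query output against an expected list. This is a minimal sketch that assumes the results have already been reduced to a list of lemma strings. The `expected` set contains only the articles named above and is not an authoritative inventory of French articles. `missing_articles` is a hypothetical helper name.

```python
def missing_articles(returned: list[str], expected: set[str]) -> list[str]:
    """Return the expected articles absent from the query results, sorted for stable output."""
    return sorted(expected - set(returned))


# Lemmas reported above as returned by the French articles query.
returned = ["tout", "le", "un", "l'", "toute", "toutes", "tous", "la", "les", "une"]
# Articles the query is expected to cover (illustrative, taken from this discussion).
expected = {"le", "la", "l'", "les", "un", "une", "des", "du", "de la", "de l'"}

print(missing_articles(returned, expected))
# ["de l'", 'de la', 'des', 'du']
```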

Related issue

github-actions[bot] commented 2 months ago

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

axif0 commented 2 months ago

Hey @andrewtavis, hope you are well. Could you please check the PR if you have time?

andrewtavis commented 2 months ago

Hey @axif0 👋 Sorry for the wait here. I've been on and am on an extended vacation, but I'm now in the stage where I can work on some things :) Will check this out!

andrewtavis commented 2 months ago

How can I modify the query to include these compound articles? Are they represented differently in Wikidata? Or am I in the right path?

I think that all's fine, @axif0! Wikidata might just not have the appropriate relationships set yet, but it'll get there 😊 You can go in and edit them yourself, if you'd like to :) Here's the LID for des: https://www.wikidata.org/wiki/Lexeme:L9111.

If you want to find lexemes to edit on Wikidata, you can start your search with L:, like in this search here.
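Both lookups mentioned here can also be built as URLs, for example when scripting. This is a sketch with hypothetical helper names. The page URL pattern follows the Lexeme:L9111 link above, and the `L:` prefix is passed to Wikidata's search as described. The code only builds URLs and does not query Wikidata.

```python
from urllib.parse import urlencode

WIKIDATA = "https://www.wikidata.org"


def lexeme_page_url(lid: str) -> str:
    """Build the wiki page URL for a lexeme ID such as 'L9111'."""
    return f"{WIKIDATA}/wiki/Lexeme:{lid}"


def lexeme_search_url(term: str) -> str:
    """Build a search URL restricted to lexemes via the 'L:' prefix."""
    query = urlencode({"search": "L:" + term})  # ':' is percent-encoded, spaces become '+'
    return f"{WIKIDATA}/w/index.php?{query}"


print(lexeme_page_url("L9111"))  # https://www.wikidata.org/wiki/Lexeme:L9111
print(lexeme_search_url("des"))  # https://www.wikidata.org/w/index.php?search=L%3Ades
```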

andrewtavis commented 2 months ago

The one thing that I'm seeing here that's jumping out to me is that translations has been switched over to a list. I'll check it out a bit more, but this should be ok as I think that a list of dictionaries for each of the language translations also makes sense 😊
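This sketch illustrates the list-of-dictionaries shape being discussed. It assumes each list element holds one word and a mapping from target language to translation. The `word` and `translations` keys, and the helper names, are hypothetical; the PR and scribe_data_json_export may use different keys. With a list, entries are appended or updated rather than assigned by word key, which avoids the TypeError above. A word-keyed view can be rebuilt when fast lookups are needed.

```python
def add_translation(entries: list[dict], word: str, lang: str, text: str) -> None:
    """Update the entry for `word` if present, otherwise append a new one (never index by string)."""
    for entry in entries:
        if entry["word"] == word:
            entry["translations"][lang] = text
            return
    entries.append({"word": word, "translations": {lang: text}})


def to_word_index(entries: list[dict]) -> dict[str, dict[str, str]]:
    """Rebuild a word-keyed dict view of the list for constant-time lookups."""
    return {entry["word"]: dict(entry["translations"]) for entry in entries}


translations: list[dict] = []
add_translation(translations, "book", "fr", "livre")
add_translation(translations, "book", "de", "Buch")
print(translations)
# [{'word': 'book', 'translations': {'fr': 'livre', 'de': 'Buch'}}]
print(to_word_index(translations)["book"]["de"])  # Buch
```

The list keeps insertion order and matches a JSON array directly. The dict view trades that for direct lookup by word.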