skoro / stardict

PHP interface to StarDict dictionaries

Uncaught RuntimeException: Unknown type "h", maybe it should be registered ? #11

Closed: jzohrab closed this issue 1 year ago

jzohrab commented 1 year ago

Hello, thank you for creating this project. If I can get it to work, I'd love to use it.

I downloaded a Spanish StarDict dictionary from https://github.com/BoboTiG/ebook-reader-dict/releases/tag/es

I don't know what version it is; I don't know StarDict at all and only started looking at it today.

The download contains the following files:

dict-data.dict.dz
dict-data.idx
dict-data.ifo
dict-data.syn
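
The version question can be answered from the .ifo file, which is plain text: a fixed first line ("StarDict's dict ifo file") followed by key=value lines such as version= and sametypesequence=. A minimal sketch that reads it into an array could look like this; readIfo is a hypothetical helper, not part of skoro/stardict.

```php
<?php
// Hypothetical helper (not part of skoro/stardict): read a StarDict .ifo file
// into a key => value array. Assumes the usual layout: a magic first line
// ("StarDict's dict ifo file") followed by one key=value pair per line.
function readIfo(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $info = [];
    // Skip the magic first line, then split each remaining line at the first "=".
    foreach (array_slice($lines, 1) as $line) {
        $line = rtrim($line, "\r"); // tolerate Windows line endings
        $pos = strpos($line, '=');
        if ($pos === false) {
            continue; // ignore blank or malformed lines
        }
        $info[substr($line, 0, $pos)] = substr($line, $pos + 1);
    }
    return $info;
}
```

For example, readIfo('dict-data.ifo') should expose the declared version and the sametypesequence value; for this dictionary the latter is the "h" type that the error below complains about.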

I used the example file from the README, modified slightly:

<?php
use StarDict\StarDict;
require dirname(__FILE__) . '/vendor/autoload.php';

$dict = StarDict::createFromFiles('dict-data.ifo', 'dict-data.idx', 'dict-data.dict.dz');

foreach ($dict->get('gato') as $result) {
    echo $result->getValue();
}

and when I run it, I get the following error:

MacBook-Pro:__stardict_test jeff$ pwd
/Users/jeff/Downloads/__stardict_test
MacBook-Pro:__stardict_test jeff$ php main.php 
PHP Fatal error:  Uncaught RuntimeException: Unknown type "h", maybe it should be registered ? in /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/DictData/TypeSequenceManager.php:42
Stack trace:
#0 /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/StarDict.php(51): StarDict\DictData\TypeSequenceManager->getSequences('h')
#1 /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/StarDict.php(115): StarDict\StarDict->__construct(Object(StarDict\Dict), Object(StarDict\Index\BinaryIndexHandler), Object(StarDict\DictData\FileDZDataReader), Object(StarDict\DictData\TypeSequenceManager))
#2 /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/StarDict.php(98): StarDict\StarDict::create(Object(StarDict\DictFiles))
#3 /Users/jeff/Downloads/__stardict_test/main.php(7): StarDict\StarDict::createFromFiles('dict-data.ifo', 'dict-data.idx', 'dict-data.dict....')
#4 {main}
  thrown in /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/DictData/TypeSequenceManager.php on line 42

Any guidance would be appreciated. Thank you!

skoro commented 1 year ago

Hello, jzohrab! Dictionaries with sametypesequence=h are not yet supported. To force the dict data to be read as is, try setting sametypesequence to m in the .ifo file (sametypesequence=m), which means pure text; in that case, you have to convert the output yourself.
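
The workaround comes down to changing one line in the .ifo file. A hedged sketch that writes a patched copy and leaves the original untouched; forceTypeSequence is a hypothetical helper, not part of the package.

```php
<?php
// Hypothetical helper: copy a StarDict .ifo file, forcing sametypesequence
// to $type ("m" = plain text by default) so the data is read as is.
function forceTypeSequence(string $src, string $dst, string $type = 'm'): void
{
    $content = file_get_contents($src);
    if ($content === false) {
        throw new RuntimeException("Cannot read $src");
    }
    // Replace the value on the sametypesequence line, whatever it was,
    // without touching the line ending.
    $patched = preg_replace('/^sametypesequence=[^\r\n]*/m', 'sametypesequence=' . $type, $content, -1, $count);
    if ($count === 0) {
        throw new RuntimeException("No sametypesequence line in $src");
    }
    if (file_put_contents($dst, $patched) === false) {
        throw new RuntimeException("Cannot write $dst");
    }
}
```

The patched .ifo path would then be passed to StarDict::createFromFiles() instead of the original; passing the same path as both $src and $dst patches the file in place. The entries then come back as raw text, HTML markup included.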

skoro commented 1 year ago

Support was added in #12. Update to version 0.2 and try this demo:

<?php
require_once __DIR__ . '/vendor/autoload.php';

$files = \StarDict\DictFiles::create('dict-data.ifo', 'dict-data.idx', 'dict-data.dict.dz', new \StarDict\Files\Factory);
$dict = \StarDict\StarDict::create($files, false);

$result = $dict->get('gato');
var_dump($result[0]->asText());

jzohrab commented 1 year ago

Super, thank you! I'll give it a shot soon.

jzohrab commented 1 year ago

AWESOME! Thank you very much!

composer require skoro/stardict:0.2.0

Then, using your demo above and dumping the whole variable:

$ php main.php 
array(1) {
  [0]=>
  object(StarDict\DictData\Sequences\HtmlCodes)#17 (1) {
    ["value":"StarDict\DictData\Sequences\TypeSequence":private]=>
    string(2233) "[ˈɡa.to] 
<p>Del latín vulgar <i>cattus</i>, del imperial <i>catta</i>, de origen incierto. La hipótesis más probable lo deriva de alguna lengua afroasiática. Compárese el catalán <i>gat</i>, ...</ol></html>"
  }
}
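
The stored value is HTML markup (the "h" type), so showing it as plain text needs a conversion step. A rough sketch using only PHP built-ins; htmlDefinitionToText is a hypothetical helper, not a method of the package.

```php
<?php
// Hypothetical helper: turn an HTML definition into rough plain text,
// one line per paragraph or list item.
function htmlDefinitionToText(string $html): string
{
    // End paragraphs, list items and <br> with a newline so they stay separate.
    $html = preg_replace('#</p>|</li>|<br\s*/?>#i', "\n", $html);
    // Drop all remaining tags (<i>, <b>, <ol>, </html>, ...) and decode entities.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Trim each line and drop the empty ones.
    $lines = array_map('trim', explode("\n", $text));
    return implode("\n", array_filter($lines, fn (string $l): bool => $l !== ''));
}
```

This keeps one line per paragraph or list item but drops list numbering and emphasis; a real HTML parser such as DOMDocument would be the more robust choice.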

FYI, I'm considering using this as an offline dictionary for my project Lute, which is for reading and learning foreign languages. May I use this project in my own project? Lute is free and uses the Unlicense (ref https://github.com/jzohrab/lute/blob/master/UNLICENSE.md). Cheers! jz

Note: before I use this in Lute, I'll need to figure out how to handle inflected forms such as verb conjugations. For example, with the example code, the word "tener" returns data, but "tengo" (the "I" form) throws:

$ php main.php tener
array(1) {
  [0]=>
  object(StarDict\DictData\Sequences\HtmlCodes)#17 (1) {
    ["value":"StarDict\DictData\Sequences\TypeSequence":private]=>
    string(420) "<p>Del castellano antiguo <i>tener</i> ("tener"), y este del latín <i>tenēre</i> ("sujetar").</p><ol><li>Poseer, ser dueño de algo.</li><li>Sostener.</li><li><i>Úsase para expresar una sensación</i>.</li><li><i>Úsase para medir la cantidad de tiempo de existencia de algo o alguien</i>.</li><li>(tener que) <i>Junto con la conjunción <b>que</b> indica la necesidad u obligación de hacer algo</i></li></ol></html>"
  }
}

MacBook-Pro:__stardict_test jeff$ php main.php tengo
PHP Warning:  Undefined array key "tengo" in /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/StarDict.php on line 85
PHP Fatal error:  Uncaught TypeError: StarDict\DictData\DataReader::fillSequences(): Argument #1 ($offset) must be of type StarDict\Index\DataOffsetItem, null given, called in /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/StarDict.php on line 86 and defined in /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/DictData/DataReader.php:17
Stack trace:
#0 /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/StarDict.php(86): StarDict\DictData\DataReader->fillSequences(NULL, Array)
#1 /Users/jeff/Downloads/__stardict_test/main.php(10): StarDict\StarDict->get('tengo')
#2 {main}
  thrown in /Users/jeff/Downloads/__stardict_test/vendor/skoro/stardict/src/DictData/DataReader.php on line 17

skoro commented 1 year ago

Thanks for reporting the issue! Fixed in 0.2.1.

jzohrab commented 1 year ago

Cheers and thank you very much. I think I'll need to look further into StarDict dictionaries that include inflected forms (e.g., "tener" => "tengo" in Spanish, "avoir" => "ai" in French). In the meantime, with v0.2.1, my test program just returns an empty array:

MacBook-Pro:__stardict_test jeff$ php main.php tengo
array(0) {
}

but that's ok. Do you feel that this issue is closed, or is there something else I should check?
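
Since 0.2.1 returns an empty array for a missing word, code like the demo's $result[0]->asText() still needs a guard before indexing. A minimal sketch, assuming get() returns an array (as the var_dump output shows) and asText() returns something string-like; firstDefinition is a hypothetical helper.

```php
<?php
// Hypothetical helper: first definition for $word, or null if the word is
// not in the index. $dict is meant to be a \StarDict\StarDict instance; it is
// left loosely typed so the helper can also be exercised with a stub.
function firstDefinition(object $dict, string $word): ?string
{
    $results = $dict->get($word);
    if (count($results) === 0) {
        return null; // word not found: nothing to index into
    }
    return (string) $results[0]->asText();
}
```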

skoro commented 1 year ago

I'm closing the issue; if you find something wrong with the package, please open another issue and I will take a look. Regarding inflected forms: StarDict provides only an index of the words as they are, and this package doesn't transform the input in any way; it searches for exactly what you provide. So I think a separate algorithm is needed that first transforms the input into its possible forms and then passes them to StarDict for searching.
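
The "transform first, then search" idea can be sketched as below. This is only an illustration: lookupWithCandidates is hypothetical, and the $lemmas table mapping inflected forms to base forms is an assumption that would have to come from some other source, since the package searches for exactly the input it is given.

```php
<?php
// Hypothetical sketch: try the word as typed, then each candidate base form.
// $lookup wraps the real dictionary and must return an array of entries
// (empty if nothing matches). $lemmas is a HYPOTHETICAL table such as
// ['tengo' => ['tener']], supplied from outside the dictionary.
function lookupWithCandidates(callable $lookup, array $lemmas, string $word): array
{
    $candidates = array_unique(array_merge([$word], $lemmas[$word] ?? []));
    foreach ($candidates as $candidate) {
        $results = $lookup($candidate);
        if (count($results) > 0) {
            // Report which headword matched along with its entries.
            return ['word' => $candidate, 'results' => $results];
        }
    }
    // Nothing matched: no headword and no entries.
    return ['word' => null, 'results' => []];
}
```

With the real package, $lookup could be fn ($w) => $dict->get($w), using the get() method from the demo above.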

jzohrab commented 1 year ago

A belated thank you, @skoro. Cheers and best wishes! jz