xiaoyifang / goldendict-ng

The Next Generation GoldenDict
https://xiaoyifang.github.io/goldendict-ng/

The non long headword size increase #1129

Closed GD-fix closed 11 months ago

GD-fix commented 11 months ago

Is your feature request related to a problem? Please describe. Is it possible to increase the maximum headword size accepted for indexing? A huge number of headwords with sizes in the 1xx range (at least in ZIM dictionaries) are skipped during indexing (launch GD-ng in a console and see the output).

Describe the solution you'd like Increase the maximum headword size accepted for indexing to 200.

Thanks in advance.

xiaoyifang commented 11 months ago

Can you give some examples of the long headwords? If you compile from source, I think you can change the size limit manually: https://github.com/xiaoyifang/goldendict-ng/blob/06c0135875a998bd33ac34bd13ec1564435ff40e/src/btreeidx.hh#L269

Note: the longer the headwords, the slower the indexing.

GD-fix commented 11 months ago

Of course: the more headwords a dictionary has, the longer it takes to index. But that is no reason to cut dictionaries down (after all, the program cannot look up articles whose headwords were ignored), or am I wrong?

I don't compile from source, I'm using AppImage releases.

I think there is a large number of such dictionaries here. For example, while indexing this small one, the following lines are displayed in the output:

Skipped too long headword: "3D Printing - Cylinder printin" size: 101
Skipped too long headword: "Any defacto standards for cont" size: 114
Skipped too long headword: "Anycubic Kossel Plus raises th" size: 121
Skipped too long headword: "Apply / find / create a stainl" size: 107
Skipped too long headword: "Are there any common household" size: 130
Skipped too long headword: "Are there any methods of limit" size: 104
Skipped too long headword: "Are these the right types of e" size: 109
Skipped too long headword: "At which point does a delta 3d" size: 102
Skipped too long headword: "BLTouch not probing all points" size: 134
Skipped too long headword: "Calibration of Y-axis produces" size: 116
Skipped too long headword: "Can I build a 3D image of a wa" size: 115
Skipped too long headword: "Can I design or remix a model " size: 110
Skipped too long headword: "Can a sustainable material lik" size: 106
Skipped too long headword: "Can anyone suggest what techno" size: 107
Skipped too long headword: "Can higher quality prints be a" size: 137
Skipped too long headword: "Can leaving the nozzle at 160 " size: 102
Skipped too long headword: "Cura Filament Change: Extruder" size: 102
Skipped too long headword: "Curious case of Ender 3 Pro ex" size: 122
Skipped too long headword: "Custom Marlin update for Ender" size: 111
Skipped too long headword: "Delta printer nozzle not movin" size: 105
Skipped too long headword: "Do I need linear rails on a co" size: 101
Skipped too long headword: "Do printer controllers take in" size: 101
Skipped too long headword: "Do the TW, THW and THHN or THW" size: 113
Skipped too long headword: "Does CuraEngine get some advan" size: 109
Skipped too long headword: "Does anyone know how Slic3r de" size: 105
Skipped too long headword: "Does shimming my build surface" size: 108
Skipped too long headword: "Dual-colour prints on Creator " size: 110
Skipped too long headword: "Extruder skips steps, when fil" size: 106
Skipped too long headword: "Flow Settings in Cura 2.4 for " size: 102
Skipped too long headword: "Hotend functioning inconsisten" size: 109
Skipped too long headword: "How can a speaker or active bu" size: 145
Skipped too long headword: "How can you calibrate extrusio" size: 103
Skipped too long headword: "How do I repeat the layers of " size: 125
Skipped too long headword: "How do people load filament, p" size: 133
Skipped too long headword: "How do the Mechanical Properti" size: 103
Skipped too long headword: "How to calculate the Vref and " size: 117
Skipped too long headword: "How to calculate the extruder " size: 114
Skipped too long headword: "How to quantify the dividing l" size: 105
Skipped too long headword: "How to send G-Code directly fr" size: 102
Skipped too long headword: "How to wire for AC mains volta" size: 103
Skipped too long headword: "I can set the voltage and curr" size: 123
Skipped too long headword: "I found a filament that can be" size: 118
Skipped too long headword: "I have blind clips that are no" size: 132
Skipped too long headword: "I was updating my firmware for" size: 143
Skipped too long headword: "I'm new to 3D Printing and I'v" size: 131
Skipped too long headword: "In Creality slicer, which sett" size: 115
Skipped too long headword: "In the standard PC Cable Wire " size: 147
Skipped too long headword: "Is a heated bed an essential c" size: 112
Skipped too long headword: "Is is true that it is quicker " size: 117
Skipped too long headword: "Is it possible to change the o" size: 101
Skipped too long headword: "Is it possible to export a mod" size: 128
Skipped too long headword: "Is it practical to build a sep" size: 117
Skipped too long headword: "Is there a lubricant that can " size: 106
Skipped too long headword: "Is there a method or S/W avail" size: 149
Skipped too long headword: "Is there a possibility to add " size: 107
Skipped too long headword: "Is there any (relatively simpl" size: 135
Skipped too long headword: "Linking an Arduino Mega with R" size: 122
Skipped too long headword: "M503 command is reporting \"Max" size: 133
Skipped too long headword: "Marlin, fans weird behaviours:" size: 135
Skipped too long headword: "Monoprice Select Mini V2 extru" size: 117
Skipped too long headword: "My FlashForge Creator Pro has " size: 130
Skipped too long headword: "My endstops have 4 female plug" size: 112
Skipped too long headword: "My first attempt at pausing a " size: 116
Skipped too long headword: "PET-G under-extrusion after ch" size: 103
Skipped too long headword: "Perplexing Y-axis shifting pro" size: 110
Skipped too long headword: "Print is rotated perfectly on " size: 122
Skipped too long headword: "Printer froze mid print, stepp" size: 102
Skipped too long headword: "Prusa i3 MK3S keeps clogging d" size: 121
Skipped too long headword: "Resin LCD print not printing f" size: 104
Skipped too long headword: "Resin printers: What effect do" size: 135
Skipped too long headword: "Setting up UBL for the first t" size: 106
Skipped too long headword: "Should I Opt For Linear Rails " size: 104
Skipped too long headword: "Should I comment out the code " size: 101
Skipped too long headword: "Sporadic thermal runaway E1 er" size: 114
Skipped too long headword: "Suggestions for heat element a" size: 102
Skipped too long headword: "The Y-axis on my XVICO X3 seem" size: 109
Skipped too long headword: "Under what circumstances are 3" size: 127
Skipped too long headword: "Using both .gcode and .gbr fil" size: 116
Skipped too long headword: "Water does not flow through 4 " size: 141
Skipped too long headword: "What Setting In Cura Determine" size: 116
Skipped too long headword: "What are the advantages and di" size: 105
Skipped too long headword: "What are the maximum recommend" size: 107
Skipped too long headword: "What are the pros and cons of " size: 104
Skipped too long headword: "What are the variables for PID" size: 113
Skipped too long headword: "What are viable substitutes fo" size: 101
Skipped too long headword: "What clearance should I leave " size: 122
Skipped too long headword: "What does it mean when they sa" size: 102
Skipped too long headword: "What effects does the non-cart" size: 120
Skipped too long headword: "What foundation designs are ad" size: 103
Skipped too long headword: "What is the acceptable resista" size: 103
Skipped too long headword: "What is the difference between" size: 103
Skipped too long headword: "What is the distance between t" size: 101
Skipped too long headword: "What is the functional differe" size: 101
Skipped too long headword: "What is the optimum shape to e" size: 127
Skipped too long headword: "What is this called and how do" size: 104
Skipped too long headword: "When I attempt to calibrate ex" size: 102
Skipped too long headword: "When I print with a raft on my" size: 110
Skipped too long headword: "When building a RAMPS 1.4 base" size: 117
Skipped too long headword: "When building the ramps 1.4 is" size: 114
Skipped too long headword: "When printing multiple objects" size: 120
Skipped too long headword: "Where to find \"Heat deflection" size: 130
Skipped too long headword: "Where/how can I connect a phys" size: 130
Skipped too long headword: "Which common 3D printing mater" size: 128
Skipped too long headword: "Which material \"creeps\" (plast" size: 111
Skipped too long headword: "Why are Islands sometimes dete" size: 110
Skipped too long headword: "Why does the first layer only " size: 109
Skipped too long headword: "Why is it more common to move " size: 102
Skipped too long headword: "Why is my nozzle routinely clo" size: 146
Skipped too long headword: "Why on the Ender 3 does the ho" size: 107
Skipped too long headword: "Will I still be able to export" size: 131
Skipped too long headword: "Will my 3D printer significant" size: 107
Skipped too long headword: "Would it be possible to use an" size: 114
Skipped too long headword: "Y-axis limit switch adjustment" size: 106
Building a tree of 224 elements

xiaoyifang commented 11 months ago

The ZIM format is not made for dictionaries. Judging from the examples above, I think we should call them sentences. For the ZIM format, although these headwords are ignored, you can still find them through full-text search.

Maybe a workaround: if a sentence is too long to be a headword, take only its first 100 letters as the headword.
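
The difference between the current skipping behaviour and the truncation workaround can be illustrated roughly as follows. This is only a sketch, not goldendict-ng's actual code: `kMaxHeadwordSize`, `LongHeadwordPolicy` and `headwordForIndex` are hypothetical names, and it assumes the reported size is counted in UTF-32 code points.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical limit; the log above starts skipping at size 101.
constexpr std::size_t kMaxHeadwordSize = 100;

// Hypothetical switch between today's behaviour and the proposed workaround.
enum class LongHeadwordPolicy { Skip, Truncate };

// Returns the headword to put into the index, or std::nullopt when the
// headword is dropped. Size is measured in code points (an assumption).
std::optional<std::u32string> headwordForIndex(const std::u32string& word,
                                               LongHeadwordPolicy policy) {
  if (word.size() <= kMaxHeadwordSize) {
    return word;  // short enough, index unchanged
  }
  if (policy == LongHeadwordPolicy::Truncate) {
    // Workaround: keep only the first kMaxHeadwordSize code points.
    return word.substr(0, kMaxHeadwordSize);
  }
  return std::nullopt;  // current behaviour: the headword is skipped
}
```

With `Truncate`, a headword of size 130 would still be indexed, but only by its first 100 code points, so two long headwords sharing the same first 100 code points would end up under the same index key.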

GD-fix commented 11 months ago

If the regular (non-full-text) search finds articles whose headwords' first 100 letters match the search criteria, that will be more useful than finding them only through full-text search. But increasing the maximum headword size to 200 (I didn't notice any headword of size 2xx or bigger) solves both problems: the very large output during indexing and the possibility of not finding the desired article. Besides, headword indexing takes much less time than full-text search indexing...