Closed. unho closed this issue 5 years ago.
Is this going to be fixed one day, or has the project been left by the wayside? Cheers.
I've applied this diff to tmdb.py locally as a workaround:
@@ -240,6 +284,9 @@ CREATE INDEX targets_%(slang)s_sid_lang_idx ON targets_%(slang)s (sid, lang);
%%(sid)s, %%(target)s, %%(target_lang)s)""" % slang
cursor.execute(query, unit)
+ def usable_units(self, units):
+ return filter(lambda u: max(len(u['source']), len(u['target'])) < 2712, units)
+
def get_all_sids(self, units, source_lang, project_style):
"""Ensures that all source strings are in the database+cache."""
all_sources = set(u['source'] for u in units)
@@ -348,6 +395,7 @@ CREATE INDEX targets_%(slang)s_sid_lang_idx ON targets_%(slang)s (sid, lang);
# store them
return 0
+ units = self.usable_units(units)
self.get_all_sids(units, source_lang, project_style)
try:
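The filtering in the diff above can be sketched as a standalone helper. This is only an illustration of the workaround, not the actual tmdb.py code; the 2712-byte cutoff is taken from the diff, and the `units` structure (dicts with `source` and `target` keys) is assumed from context:

```python
def usable_units(units, limit=2712):
    """Drop units whose source or target text is too long for the
    unique index.  The 2712 cutoff matches the diff above; units are
    assumed to be dicts with 'source' and 'target' keys."""
    return [u for u in units
            if max(len(u['source']), len(u['target'])) < limit]

units = [
    {'source': 'hello', 'target': 'bonjour'},
    {'source': 'x' * 3000, 'target': 'y'},  # too long, filtered out
]
print(len(usable_units(units)))  # 1
```

A list comprehension is used instead of `filter()` so the result can be iterated more than once, which the original lazy `filter()` object would not allow under Python 3.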
Postgres 10 fixed hash indexes, so maybe that is a better solution, if it provides all the functionality required for this index. The documentation mentions that only B-tree indexes can be used for unique indexes, so I might be wrong.
Amagama is still in production, so this might still get fixed.
I've just looked into this some more. It seems to be a bit more subtle. Postgres compresses text values, so it can handle longer values in the unique index, as long as they compress down small enough (the index also contains other columns, so I don't know exactly how small the values need to be). So while my patch is correct and avoids the error, it will also filter out some values that could otherwise have been handled successfully.
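The compression behaviour described above can be approximated outside Postgres. This is only a rough sketch: Postgres uses its own pglz compression, so zlib here is merely a stand-in, and treating the 2712-byte figure from the diff as the B-tree entry ceiling is an assumption, not a value confirmed for this index (which also contains other columns):

```python
import zlib

# Hypothetical ceiling, taken from the diff's cutoff; the real limit
# depends on the other columns stored in the index entry.
BTREE_ENTRY_LIMIT = 2712

def probably_indexable(text, limit=BTREE_ENTRY_LIMIT):
    """Heuristic: a value fits the index if it is short outright, or
    if it compresses below the limit.  zlib only approximates pglz."""
    raw = text.encode('utf-8')
    if len(raw) < limit:
        return True
    return len(zlib.compress(raw, 1)) < limit

print(probably_indexable('short string'))  # True
print(probably_indexable('abc' * 5000))    # True: repetitive text compresses well
```

This illustrates why the original patch is conservative: a long but highly repetitive string would be rejected by the raw-length check even though Postgres could compress and index it.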
I believe I've fixed this reasonably well now. I tested the behaviour of the compression a bit, and I think the current code will attempt to import long strings up to the point where they should still work.
Note that I also updated the code to respect MAX_LENGTH during import. The old value of 1000 will often have a greater influence than the code fixing this problem. Anything beyond 1000 characters is really long and not quite in the domain of traditional translation memory. I increased the limit to 2000 in a follow-up commit anyway, just in case.
Got the following traceback when importing some translations: