keeleinstituut / tv-tolkevarav

Tõlkevärav (Translation Hub)

Can't open/unzip translated file #678

Open plakitkelly opened 6 months ago

plakitkelly commented 6 months ago

"Laadi alla valmis tõlge" ("Download finished translation") from the three-dot menu: the file is broken.

MariusJulius commented 6 months ago

@KaarelKa might this be the same as this issue: https://github.com/keeleinstituut/tv-tolkevarav/issues/634

KaarelKa commented 6 months ago

Probably same issue yeah, will check and fix

KaarelKa commented 6 months ago

@MariusJulius @thenouan It seems this doesn't work as well as I hoped. It does get some information, but we can't access the Content-Disposition header on FE, so we only get the file type, which is text/html in most cases. For a lot of these the file will also work with a .zip extension, in which case the zip shows all the included files; with .txt it shows just one of them.

Basically, if BE allowed us to access the Content-Disposition header, we could make sure we always use the correct extension:

Access-Control-Expose-Headers: Content-Disposition

The alternative is that FE assumes the extension based on the endpoint, but this can only be implemented for endpoints that always return the file in the same format, which doesn't seem to be the case for most of the file download endpoints. If we get a list of the ones that do, we can skip making Content-Disposition visible to the browser for those endpoints.
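
For illustration, a minimal TypeScript sketch of how the FE could read the file name once the header is exposed. This is not the project's actual code: `fileNameFromContentDisposition` is a hypothetical helper, and it assumes the BE sends either the plain `filename="..."` form or the `filename*=UTF-8''...` form.

```typescript
// Hypothetical helper: extracts a file name from a Content-Disposition value.
// Returns null when the header is missing, which is what a cross-origin fetch
// sees unless the server lists the header in Access-Control-Expose-Headers.
function fileNameFromContentDisposition(header: string | null): string | null {
  if (!header) return null;

  // Prefer the extended form, which may carry percent-encoded UTF-8.
  const extended = /filename\*\s*=\s*UTF-8''([^;]+)/i.exec(header);
  if (extended) {
    try {
      return decodeURIComponent(extended[1].trim());
    } catch {
      // Malformed percent-encoding: fall through to the plain form.
    }
  }

  // Plain form, quoted or unquoted.
  const plain = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(header);
  if (plain) return (plain[1] ?? plain[2] ?? "").trim();

  return null;
}
```

With `fetch`, the input would come from `response.headers.get("Content-Disposition")`; a `null` result means the FE still has to guess the extension.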

KaarelKa commented 6 months ago

Added a minor FE improvement to increase the likelihood of the user being able to open the file for now.

KaarelKa commented 6 months ago

FE side is currently blocked; we will need Access-Control-Expose-Headers: Content-Disposition on the response so we can determine the correct file extension on the FE client side.

MariusJulius commented 6 months ago

KaarelKa commented 6 months ago

Relevant downloads for this task: @MariusJulius @thenouan Maybe you can comment on these as well, if you know something

  1. Cat analysis download - (GET /cat-tool/download-volume-analysis/id ). Currently returns txt, but not sure if it can be something else as well. Have seen .zip some time in the past

  2. Download Xliff - (GET translation-order/api/cat-tool/download-xliff/id ). Currently we download as .txt on FE, but seems BE sent .xlf, not sure if this will always be .xlf though, I think this was one of the ones that could also be a .zip

  3. Download translated project - (GET /translation-order/api/cat-tool/download-translated/id). Currently FE downloads as .txt. If I remember correctly it can also be a .zip and I currently see that BE sends it as .bin at least for 1 project

  4. Audit logs (GET event-records/export) - exported as csv from FE, I don't think we can have any other file formats here

  5. Export project csv - (GET /projects/export-csv) - Will change this to .csv on FE, assuming it can't be anything else ?

  6. Export translation memory - (GET /tm/export/file/id) - exported as .tmx from FE, I don't think we can have any other file formats here

  7. Export users - (GET /institution-users/export-csv) - exported as csv from FE, I don't think we can have any other file formats here

"3 dot menu has options related to matecat. according to matecat documentation if there is more than 1 project file it gives zip" - Tried this with the xliff download and the translated project download; it didn't seem to work when I had 2 sub-project files + 2 files in "Lähtefailid tõlketööriistas" ("Source files in the translation tool"):

-- "Laadi alla xliff" ("Download xliff") (2.) - BE still sent the .xlf file extension, not zip
-- "Laadi alla valmis tõlge" ("Download finished translation") (3.) - BE still sent the .bin file extension, not zip

MariusJulius commented 6 months ago

Not 100% sure, but: 1) should be TXT (even if split info is shown in one file); 2) should be XLF (even if split info is shown in one file); 3) should be the same format that was added (e.g. docx in, docx out: https://guides.matecat.com/finalising-a-project). I downloaded a file from OIU-2023-11--102 that came as txt, changed the extension to docx, and it opened.

plakitkelly commented 6 months ago

I tried to reproduce downloading a zip file, but couldn't. BUT I created an order with multiple source files and downloaded the translated file. As MJ said, it should download a zip file when there are multiple files. Instead it downloaded a txt file with the content of a zip file. [image] On the right you see a zip file that I opened with Notepad. On the left you see the just-downloaded "tõlgitud fail" ("translated file") that came as txt. The contents are very similar, so I think the system produced zip content but forces the download to be a txt file.

plakitkelly commented 6 months ago

I also opened that broken zip file with Notepad: its content is the translated text. So on 22.12, when I downloaded the "broken" file, the system gave a zip file with txt content. That's why I couldn't open or unzip it.

KaarelKa commented 6 months ago

Yeah, you are correct, it arrives as .bin from BE, but client side app is not allowed to see the type of the file and has to make some decision what format to force it to. In this case the ".bin" file can work either as a zip or a txt file from what I've seen, so @thenouan even if you add the Access-Control-Expose-Headers: Content-Disposition header to the response, we still won't be able to tell in this specific case, whether we should force the .bin to zip or txt
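
One possible FE-side check for the ".bin" case, sketched in TypeScript under assumptions: instead of relying on headers, look at the first bytes of the downloaded data. This technique is not mentioned elsewhere in the thread, and `looksLikeZip` is a hypothetical helper. ZIP archives start with the bytes "PK" followed by 0x03 0x04, or 0x05 0x06 for an empty archive. Note that .docx files are themselves ZIP containers, so this check alone cannot tell a single .docx from a multi-file .zip.

```typescript
// Hypothetical helper: true when the data starts with a ZIP signature.
function looksLikeZip(bytes: Uint8Array): boolean {
  if (bytes.length < 4) return false;
  // Every ZIP record signature begins with "PK" (0x50 0x4B).
  const startsWithPk = bytes[0] === 0x50 && bytes[1] === 0x4b;
  // 0x03 0x04: local file header, the start of a non-empty archive.
  const localHeader = bytes[2] === 0x03 && bytes[3] === 0x04;
  // 0x05 0x06: end-of-central-directory record, the start of an empty archive.
  const emptyArchive = bytes[2] === 0x05 && bytes[3] === 0x06;
  return startsWithPk && (localHeader || emptyArchive);
}
```

On a `Blob` from a download, the input could be obtained with `new Uint8Array(await blob.slice(0, 4).arrayBuffer())`.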

KaarelKa commented 6 months ago

Actually I can try to add some optimistic workaround for these on FE.

  1. Will force to .txt
  2. Will force to .xlf (Though this one was .zip once upon a time)
  3. This one is harder to force, I'm not 100% sure what the logic is here. If I check the amount of source files, then the number might be higher than 1, but if only 1 of those was sent to the translation tool, I still get .txt from here, not zip. Basically from what I can see, I would need to know how many files were sent to the cat tool with POST translation-order/api/cat-tool/setup initially and I'm not sure if we can get this information from anywhere (+ not sure if this is good enough for deciding whether to export as .zip or .txt)
  4. Forced to .csv
  5. Forced to .csv
  6. Forced to .tmx
  7. Forced to .csv
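
The list above could be sketched as a lookup table in TypeScript. The key names and `withForcedExtension` are made up for illustration and are not the project's actual code; the extensions mirror the list as stated at this point in the thread.

```typescript
// Hypothetical keys naming the seven downloads from the list above.
type DownloadKind =
  | "volumeAnalysis"
  | "xliff"
  | "translatedProject"
  | "auditLogs"
  | "projectsCsv"
  | "translationMemory"
  | "usersCsv";

// null means the extension cannot be decided from the endpoint alone.
const forcedExtension: Record<DownloadKind, string | null> = {
  volumeAnalysis: ".txt", // 1.
  xliff: ".xlf", // 2.
  translatedProject: null, // 3. depends on how many files went to the cat tool
  auditLogs: ".csv", // 4.
  projectsCsv: ".csv", // 5.
  translationMemory: ".tmx", // 6. as listed above
  usersCsv: ".csv", // 7.
};

// Appends the forced extension, or leaves the name untouched when none is known.
function withForcedExtension(baseName: string, kind: DownloadKind): string {
  const ext = forcedExtension[kind];
  return ext === null ? baseName : `${baseName}${ext}`;
}
```
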

KaarelKa commented 6 months ago

  2. Update: This also seems to be zip when 3. is zip, and xlf when 3. is .txt

KaarelKa commented 6 months ago

Update

Okay it seems we get "cat_files" from BE for each subproject, which should be enough to do the check. Will do some testing, but hopefully it will work

KaarelKa commented 6 months ago

Okay it seemed to work, based on my testing at least

plakitkelly commented 6 months ago

@KaarelKa Did you solve the 3rd one? I translated a docx file, and its content looks like the screenshot. [image] It's a Word document opened with Notepad. If I try to save this as docx, it doesn't work. But if I open the txt file in Word, it works.

plakitkelly commented 6 months ago

So downloading as txt or zip is not a solution. It should download in the same format as it was uploaded.

MariusJulius commented 6 months ago

  3. 1) It's zip if more than one file was submitted to the project (e.g. one is txt, one is doc); after changing the extension to zip and unpacking, it opens properly. 2) It returns a single file in the format that was initially submitted, e.g. if the translation file was docx it returns docx (currently it gives txt, but after changing the extension and opening it with Word it works).

KaarelKa commented 6 months ago

Yeah, got a workaround for this as well. Basically

  1. If there were multiple files we download the zip (files inside zip seem to be in the correct format)
  2. If there is only 1 file, we take the file name from the original file (cat_files)
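
The two rules above could look roughly like this in TypeScript. This is a sketch only: the thread says just that "cat_files" is available per subproject, so the `name` field, the `fallbackBase` parameter, and the function name are assumptions.

```typescript
// Assumed shape of one "cat_files" entry; only the existence of "cat_files"
// is confirmed in the thread, the `name` field is a guess.
interface CatFile {
  name: string;
}

// Picks the download file name for the translated project.
function translatedDownloadName(catFiles: CatFile[], fallbackBase: string): string {
  // Multiple files were sent to the cat tool: the response is a zip.
  if (catFiles.length > 1) return `${fallbackBase}.zip`;
  // Exactly one file: reuse the original name so the extension matches.
  if (catFiles.length === 1) return catFiles[0].name;
  // No information: keep the base name without guessing an extension.
  return fallbackBase;
}
```
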

plakitkelly commented 6 months ago

  6. Export translation memory - (GET /tm/export/file/id) - exported as .tmx from FE, I don't think we can have any other file formats here

I exported a tmx file from here, but it seems it's actually a zip file. When I opened this "tmx" file with 7zip, I saw many tmx files in there. These tmx files are OK. [image] [image]

I tried to find a TM that downloads only one tmx file, but couldn't find one. With some TMs it exports an empty zip (actually tmx, but opened with 7zip). [image] [image]

I don't understand what it means that it downloads many tmx files. Maybe I imported many tmx files, but is it correct that one TM contains many tmx files?

plakitkelly commented 6 months ago

@plakitkelly Currently filtering the logs doesn't work; test nr 4 again. Other files are OK (except 6).

KaarelKa commented 6 months ago

  6. Turns out it will always be .zip, changed it
  4. Yeah, this seems to be a different issue, not related to exporting of logs - However just tested it and filtering seemed to work now

plakitkelly commented 6 months ago

Log - OK

  6. Is it deployed? It still downloads tmx

KaarelKa commented 6 months ago

Not deployed yet it seems

KaarelKa commented 6 months ago

@plakitkelly Deploy was just done now, can you test again ?

plakitkelly commented 6 months ago

Tested on 08.01 - 6 OK. All OK