sillsdev / serval

A REST API for natural language processing services
MIT License

Missing and reordered verses in Serval drafts #397

Closed: bhartmoore closed this issue 4 months ago

bhartmoore commented 5 months ago

We've had multiple reports of missing or reordered verses in Serval drafts over the past few days. It's possible this should be two issues, but I'm combining them for now because all the reports have come in since Friday.

These chapter-final verses are missing from two projects but were present in the source:
Hyolmo: Exo 40:38; Esther 10:3; Jonah 4:11
SCP_U: Esther 10:3; Jonah 4:11

Peter reports that he is investigating missing verses for a PNG project.

An image shared by James Cuénod shows wrongly ordered verses in the kyu project. (The same issue was reported in SILNLP drafts for the CTB project this week; it may be unrelated, since those drafts were not run through Serval.)

Enkidu93 commented 5 months ago

@bhartmoore @pmachapman For the drafts that were generated through SF/Serval, could you send me the engine and build IDs? I'd like a complete list of the affected builds so I can verify that whatever solution we come up with fixes all of them. Also, please send me any projects for which silnlp had the same issue (and the file paths to the experiments themselves, if possible). I may be wrong, but I believe silnlp uses alignment code from machine.py, which mirrors the code in Machine (C#) exactly, so this could very well be a single problem across both. Thank you!

bhartmoore commented 5 months ago

Thanks, @Enkidu93. I found two Hyolmo (scp) builds from the day Mike B reported the error. Both projects he reported were Hyolmo, but one is their "AI" project, so these are most likely the two problematic builds. I am not sure how we'd access the drafts to verify the missing verses.
'engine_id': '65a23e93ea0c57adad126c41', 'build_id': '664f08e1d04bc69c89cadcab'
'engine_id': '6608bfeba20eb39853a7b37a', 'build_id': '664f09b7d04bc69c89cae0a7'

bhartmoore commented 5 months ago

@Enkidu93 The silnlp draft in question is GEN chapter 49 in this folder: "S:\MT\experiments\FT-Bod\NLLB.1.3B.bod_NTB-bo_CTB\infer\8000\NTB_2024"

The experiment itself is in the parent folder on the same path: "S:\MT\experiments\FT-Bod\NLLB.1.3B.bod_NTB-bo_CTB"

A screenshot of the chapter in Notepad++ was attached.

Enkidu93 commented 5 months ago

Thank you! This is very, very helpful!

johnml1135 commented 5 months ago

Note that this issue should be resolved by https://github.com/sillsdev/machine/pull/204 in release 1.4.5. It is on QA now and will hopefully reach production by the end of the week.

ddaspit commented 5 months ago

Have we tested these projects to see if they are fixed?

Enkidu93 commented 5 months ago

I'm not sure this fixes all the issues we're seeing, particularly in silnlp. This didn't involve changes to machine.py (at least not yet), did it? Do those changes just need to be ported over?

Enkidu93 commented 5 months ago

Just to make sure this is recorded somewhere: There's also a bug where introductory material is not being translated. @bhartmoore, is there an example you could send my way?

bhartmoore commented 5 months ago

@Enkidu93 I believe this affects anything translated since the most recent Serval update went live in SF. One example is the Pentateuch translated from NIV11 to French in project SFDF. Below is the handy new "Admin" view of the most recent build for that project from Scripture Forge; let me know if you need more or other information.

Diagnostic Information
Build Id: 6657975aec58cd36956de963
Corpora Ids: 6657975aec58cd36956de962
Date Finished: 2024-05-30T03:35:17.134+00:00
Message: Completed
Percent Completed: 1
Revision: 2263
Queue Depth: 0
State: COMPLETED
Step: 20000
Translation Engine Id: 66579759ec58cd36956de95f

johnml1135 commented 5 months ago

@bhartmoore The most recent fixes should be in Serval Live as of Wednesday night. Can this project be re-run to see whether they still get the same results?

bhartmoore commented 5 months ago

@johnml1135 Great! I'm guessing you mean the SF builds and not the silnlp one, correct? I'll need to ask Mike Bacon whether he can re-request drafts for this team, since his drafts were missing chapter-final verses. We'd also want to check with Peter Chapman or James Cuénod to see whether they can re-run the SF projects that had wrongly ordered verses.

ddaspit commented 5 months ago

Yes, these changes only affect SF/Serval. We still need to replace the USFM parser in silnlp.

bhartmoore commented 5 months ago

User Mike Bacon reports that chapter-final verses are still missing from the scp_U drafts generated through SF/Serval on 6/16.

I will open a separate issue for including the introduction and first section heading in drafts.


bhartmoore commented 5 months ago

I just saw issue #408, opened by Pchapman, which sounds very much like what Mike Bacon is seeing.

Enkidu93 commented 4 months ago

@johnml1135 @ddaspit I'm back at work. Where does this stand? Have some of these issues been addressed elsewhere? @johnml1135, did you verify that any of them were fixed by the previous changes?

ddaspit commented 4 months ago

We have fixed the missing last verse issue in #408. Once we finish #405, SF should be able to fix the rest of this issue.

Enkidu93 commented 4 months ago

Excellent! Thank you.

Enkidu93 commented 4 months ago

@johnml1135 Should this be closed since the only remaining work is on the SF side?

johnml1135 commented 4 months ago

Yes, I believe this is fully resolved.