Closed funderburkjim closed 7 years ago
Jan 25, 2017. Extra AP90 corrections
L = 235, aGa -> aG type=p, missing virama
L = 539, atigraha -> atigrah, type=t
L = 613, ativrahmacaryaM -> atibrahmacaryaM
L = 655, ati -> ati-lomaSa or ati-romaSa
L = 749, atf -> attf, type=t
L = 1341, anApta -> anAptf, type=p,cf. ap90
L = 2472, aprarizwiH -> aparizwiH
L = 2629, apaTAsaH -> apahrAsaH
==================
Jan 25, 2017. Extra AP corrections
L=280, aNka -> aNk, type=p, missing virama
L=428, ajahalliNgama -> ajahalliNgam , type=p, missing virama
L=584, atikAnta -> atikrAnta, type=t (also others)
L = 587, atikamaRam -> atikramaRam, type=t (also others)
L = 588, atikamaRIya -> atikramaRIya
L = 589, atikudDa -> atikrudDa
L = 590, atikUra -> atikrUra
L = 816, atyahita -> atyayita, type=p
L = 822, atala -> atula, type=p, missing vowel diacritic
L = 1048, aDikam -> aDikram
L = 1436, anAkanda -> anAkranda
L = 1437, anAkAnta -> anAkrAnta
L = 1681, anukIH -> anukrIH
L = 1693, anukakaca -> anukrakaca
L = 1696, anukam -> anukram
L = 2416, anyAddakz -> anyAdfkz
L = 2474, anvArohaRama -> anvArohaRam, type=p, missing virama
L = 2502, apakalaNkakaH -> apakalaNkaH , type=p? cf mw, ap90
L = 2530, apakoSaH -> apakroSaH
L = 2715, aparikama -> aparikrama
L = 2906, apahnAsaH -> apahrAsaH
?2?hotvan NO-AP90 hotvan
means there is no hotvan in AP90, right?
In a browser, open two copies of the list display, one for AP and one for AP90
We can't have the yellow box here in page scans to ease word search, right?
?1?hotrIya,hotriya hotrIya hotriya
hotriya:AP,CAE,CCS,GRA,MD,MW,PUI,PW,PWG hotrIya:AP90,BEN,CAE,MW,MW72,PW,PWG,SHS,VCP,WIL,YAT hotrIyaM:SKD
Belonging to an oblation
Same meaning. GRA has it from the Rigveda. Both are valid as per PW and PWG, but for Apte I would go with only one, because the meanings are identical.
no hotvan in AP90, right?
Yes.
We can't have the yellow box here in page ...
I'm not sure what 'yellow box' means. But, it has something to do with developing a UI.
I'm ambivalent about this UI development. If it takes me 8-10 hours to develop a helpful UI, is it worth doing? Will there be enough other participation to make the development time and effort cost-effective? Another point of view might be that whatever the immediate benefit of UI development in a particular case, I should do it, because otherwise there will definitely be almost no participation by others.
This is a quandary.
UI development in a particular case, I should do it because otherwise there will definitely be almost no participation by others.
Yes, that's obvious. And with @SergeA for sure. But in this case I hardly understand why the old code, the UI already developed, can't be modified. Is it too different?
that's obvious
I guess this is not yet obvious to me. I'm so used to using ad-hoc methods. Perhaps I need to change my mindset to one of almost always thinking of UI as an essential component of problem solution. The benefit is that UI enables contribution and engagement by others, and this engagement has numerous unexpected benefits.
In terms of an appropriate UI for this case...
The difference here, it seems to me, is that there are several entries that need to be examined together to understand the situation. Take the first '?' example.
?1?aGa,aGana aGa aGana
The Python comparison process generated this example, but you can't understand what is going on by looking just at this line. You have to see, in this case, the two prior lines:
aG NO-AP90 aG
aGa aGa aGa
?1?aGa,aGana aGa aGana
Now, even before looking at dictionaries, it seems clear that 'aG' is a verb in AP. Why is it not in AP90? Then, we see that 'aGa' (presumably some adjective) occurs in both AP90 and AP. Then, we see, on the third line, that there is a 2nd 'aGa' in AP90, which is paired with word 'aGana' in AP.
So now we can speculate that the first aGa in AP90 (the one on the 2nd line) really should perhaps be 'aG' -- we know that sometimes virAmas are missed, either in digitization or in print.
So now we are ready to look at dictionary entries. We need to look at 'aGa' in AP90 and see if the first one really should be spelled 'aG' - and we find this to be so (the print is missing a virAma). Then, we can double check that this corrected 'aG' in ap90 corresponds in sense to the already present 'aG' in AP. It does.
So we generate the correction
L = 235, aGa -> aG type=p, missing virama
Anyway, that's the process that seems to be relevant. And the other cases so far examined by me are somewhat similar.
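A minimal Python sketch of the scanning part of this process (the three-column line format and '?' marker are taken from the examples above; the amount of context is a guess):

```python
def cases_with_context(lines, context=2):
    """Yield each '?' case together with its preceding context lines,
    so the case can be judged as described above (e.g. the aG example)."""
    for i, line in enumerate(lines):
        if line.startswith('?'):
            yield lines[max(0, i - context):i + 1]

# Demo on the aG example discussed above:
sample = ['aG NO-AP90 aG', 'aGa aGa aGa', '?1?aGa,aGana aGa aGana']
for chunk in cases_with_context(sample):
    print('\n'.join(chunk))
```

In practice `lines` would be read from ap90_ap_hw2_short.txt instead of the inline sample.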
But I'm still not sure what the UI should look like for a 'case'.
I guess this is not yet obvious to me. I'm so used to using ad-hoc methods. Perhaps I need to change my mindset to one of almost always thinking of UI as an essential component of problem solution. The benefit is that UI enables contribution and engagement by others, and this engagement has numerous unexpected benefits.
I mean, in most cases where I can live without a UI, @SergeA can't. He was waiting a few years since I convinced you to try the first UI, and now you see, after @Shalu411 is gone and @drdhaval2785 frozen, that a UI makes a difference. And here we have AP vs. AP90, PWG vs. PW, MW vs. MW72: a UI made for one pair will work for at least two other pairs, covering the most important dictionaries and hundreds of misspelled headwords.
You have to see, in this case, the two prior lines:
Sure, but an HTML page with clickable links would make more sense than copy-pasting hundreds of times.
we know that sometimes virAmas are missed, either in digitization or print.
So we can generate a sublist of possible cases where the only difference might be a dropped virama?
But I'm still not sure what the UI should look like for a 'case'.
Even if it would be just a list of relevant entries from sanhw1, like
hotriya:AP,CAE,CCS,GRA,MD,MW,PUI,PW,PWG hotrI:AP,AP90,MW,MW72,SKD hotrIya:AP90,BEN,CAE,MW,MW72,PW,PWG,SHS,VCP,WIL,YAT hotrIyaM:SKD
and words with links (an HTML page is a primitive UI as well, a GUI) instead of pure txt. That would speed things up, and there would be no need to have two windows open initially; at least, I would not use them.
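A sketch of turning a sanhw1-style line into such an HTML page with clickable links. The URL pattern below is hypothetical; the real Cologne display URLs would be substituted:

```python
# Turn sanhw1-style entries ("hotriya:AP,CAE,MW ...") into HTML links.
# URL is a placeholder pattern, not the actual Cologne link format.
URL = 'https://example.org/display?dict={d}&key={hw}'

def entry_to_html(entry):
    hw, dicts = entry.split(':')
    links = ', '.join(
        '<a href="{}">{}</a>'.format(URL.format(d=d, hw=hw), d)
        for d in dicts.split(','))
    return '<b>{}</b>: {}'.format(hw, links)

def line_to_html(line):
    return '<br>\n'.join(entry_to_html(e) for e in line.split())
```

Each headword/dictionary pair becomes one link, so the two list displays would no longer need to be open in separate windows.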
hEmavatI hEmavatI hEmavatI
?1?hEyaNgavInam,hEyaNgavam hEyaMgavInaM hEyaMgavam
?2?hEraRyavAsas hEraRyavAsas NO-AP
?2a?hEraRya NO-AP90 hEraRya
?3?hEraRyaka NO-AP90 hEraRyakaH
hErika hErikaH hErikaH
Hi @funderburkjim, I am back from my slumber. Will be willing to work on some pending issue. Can you give me some coding work of useful nature, so that I can contribute. I would not get too much of time to jump into correction submission as of now, but tools I can create.
Based on my experience with my paper in Normalizing headwords, there is one tip specifically for AP90.
AP90 has a tendency to use M instead of m at the end. If M is converted to m before comparison, some false positives may be weeded out.
I didn't go through the file, but Marcis' comment has one example. There may be other similar cases. I guess in the present case we accounted for the N-M comparison, but not for the terminal M-m comparison.
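The tip could be sketched as a small normalization step applied to both spellings before comparison (function names are mine, not from the project code):

```python
# AP90 tends to print final anusvara (M) where AP prints m, so treat a
# terminal M as m before comparing headwords.

def normalize_final_m(hw):
    return hw[:-1] + 'm' if hw.endswith('M') else hw

def same_after_m_normalization(hw_ap90, hw_ap):
    return normalize_final_m(hw_ap90) == normalize_final_m(hw_ap)
```

Note this only handles the terminal M-m case; internal anusvara/nasal variation (the N-M comparison already accounted for) would need a separate rule.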
?1?hEyaNgavInam,hEyaNgavam hEyaMgavInaM hEyaMgavam
hEyaNgavInam
Good idea. I'll look into that.
Regarding 'tools' - I'm not exactly sure what this covers. I consider the displays, esp. the apidev displays to be tools. It might be useful to have a display based on the hwnorm1c data. This would require (a) building a database (sqlite file) and (b) a search suggestion function (php) for this database that would take into account spelling normalization. This much would probably be a fairly self-contained task, which could then be the front end of a Cologne display for multidictionary lookup.
If this sounds interesting to you, I'll think in more detail what ingredients (and prototypes) you might need to construct this.
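A Python sketch of the two pieces just described: (a) an sqlite database over the headword data and (b) a suggestion query. The (norm, headword, dict) row layout is my assumption about hwnorm1c, not its actual format, and the suggestion function here is plain prefix matching without spelling normalization:

```python
import sqlite3

def build_db(rows, path=':memory:'):
    """Build an sqlite index over (normalized key, headword, dictionary) rows."""
    con = sqlite3.connect(path)
    con.execute('CREATE TABLE hw (norm TEXT, headword TEXT, dict TEXT)')
    con.execute('CREATE INDEX hw_norm ON hw (norm)')
    con.executemany('INSERT INTO hw VALUES (?,?,?)', rows)
    con.commit()
    return con

def suggest(con, prefix, limit=10):
    """Return distinct normalized keys starting with prefix."""
    cur = con.execute(
        'SELECT DISTINCT norm FROM hw WHERE norm LIKE ? ORDER BY norm LIMIT ?',
        (prefix + '%', limit))
    return [r[0] for r in cur]
```

A PHP front end could then call the same query against the sqlite file to drive a multidictionary lookup display.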
16 cases. Status: DONE @SergeA deva-UI
@gasyoun 's comments got me to thinking more about how to leverage the existing UI (that used in #332, #334) in the remaining AP/AP90 cases.
There are some subsets (filters) of the remaining cases that can be examined in the previous way.
As a start, this batch deals with the cases where the AP spelling is the same as the AP90 spelling, but with an extra 'a' at the end. There are 16 cases, and I suspect that most of them are cases where there is a missing virAma in the AP spelling.
There are also a few other filters that might be analyzable similarly.
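The extra-'a' filter described above (AP spelling = AP90 spelling + 'a') is simple to express in code; a sketch, assuming the merged headword pairs are available as (AP90, AP) tuples:

```python
# Filter for suspect pairs where the AP spelling equals the AP90
# spelling plus a trailing 'a' (often a virama missed in AP).

def extra_a_cases(pairs):
    return [(ap90, ap) for ap90, ap in pairs if ap == ap90 + 'a']
```

Other filters (extra terminal M, k vs. kr, etc.) would follow the same pattern with a different predicate.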
There are 20 of these. Status: done by @SergeA (29 Jan 2017)
31 cases. done by @SergeA (29 Jan 2017)
In #334, we noticed many digitization spelling errors where one of the ligatures for 'kr' in the AP dictionary had been misinterpreted as 'k'.
There are still some of these remaining that were not caught in #334. Perhaps the present list of 31 will get all (or most) such cases.
23 cases. Status DONE @funderburkjim 01/30/2017.
NOTE: many of these are hard to decide.
In these cases, the merging of the headwords of AP90 and AP
Thus, there is probable cause to think that the AP spelling might be wrong.
36 cases. Status DONE @funderburkjim 01/30/2017
NOTE: Almost all of these were actually AP90 errors.
In these cases, the merging of the headwords of AP90 and AP
Thus, there is some possibility that one or the other dictionaries has a spelling error - no a-priori evidence to favor either one. But the pairing suggests that we should look at these cases.
5 cases. Status DONE @funderburkjim 01/30/2017.
In these cases, the merging of the headwords of AP90 and AP
Thus, there is probable cause to think that the AP90 spelling might be wrong.
The 6 batches of above are ready.
If you tackle any of them, just make a note in the comments (maybe change the status from TODO to DONE and add your user name).
If any remain to be done on Monday, I'll do them then.
Also, if anyone has ideas of other specific filters that might be programmed, do mention.
@funderburkjim very interesting. Experimenting with non-sandhi headwords
Rd
zawKaRda:SCH KaRda:IEG caRdeSvarapperuvilE:IEG amAvAsyaSARdilyAyana:VEI aBizekamaRdapa:IEG aRdika:IEG
nq
kUpadanqa:MW OCR error danqa:PE OCR error
St ST, sw sW: 0 results
Rn nR
aRnimittaka:PD aRnirUpita:PD zaRnavatiSrAdDanirRaya:ACC zaRnavatiSrAdDaprayoga:ACC sanRI:STC suvaRnakadalI:SKD
RY YR
aYRit:PD is a false positive, because it is a grammar term (a-ñṇit) (Gr.): a (pratyaya) other than those marked by the indicatory letters ñ and ṇ.
nj - 14 results, but all seem unchangeable
ganjwar:IEG
Ys
anaYsa:PD anaYsamAsa:PD anaYsamAsagrahaRa:PD anaYsamAsatva:PD naYsamAsa:ACC,MW naYsUtrArTavAda:ACC,MW
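The experiment above (scanning headword lists for clusters that are suspicious in SLP1) could be sketched as follows; the cluster list is just the set tried above:

```python
# Scan sanhw1-style entries ("headword:DICT1,DICT2") for SLP1 consonant
# clusters that usually indicate an OCR or digitization error.

SUSPECT = ['Rd', 'nq', 'Rn', 'nR', 'RY', 'YR', 'Ys']

def suspicious(entries, clusters=SUSPECT):
    hits = {c: [] for c in clusters}
    for entry in entries:
        hw = entry.split(':')[0]
        for c in clusters:
            if c in hw:
                hits[c].append(entry)
    return hits
```

Results would still need manual review, since some hits (grammar terms, non-Sanskrit names) are legitimate, as the aYRit and ganjwar examples show.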
The situation with k/kr in AP1957 is very sad. Very many words in the examples are misspelled. Please count the number of 'kr' occurrences in AP90 and AP1957; the difference will give the approximate number of erroneous cases. I think that on the basis of AP90 it is possible to make a 'kr'-word list and then search for those words with misspelled 'k' in AP1957. But if there are too many such cases, it makes sense to build a special UI with the supposed 'kr'-correction and two buttons: accept or reject.
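SergeA's idea could be sketched like this, assuming plain headword lists for each dictionary are available (function and variable names are mine):

```python
# From the AP90 headword list, collect every headword containing 'kr';
# if AP has the same word with plain 'k' but lacks the 'kr' form, flag
# it as a likely k/kr misprint in AP.

def kr_candidates(ap90_hws, ap_hws):
    ap = set(ap_hws)
    out = []
    for hw in ap90_hws:
        if 'kr' in hw:
            degraded = hw.replace('kr', 'k')
            if degraded in ap and hw not in ap:
                out.append((degraded, hw))  # (AP spelling, suggested fix)
    return out
```

This is only a first pass: a headword with several 'kr' clusters, or a genuine k/kr minimal pair, would need a human decision, which is where the accept/reject UI comes in.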
Under Batch 5, #6: udGfzwam (AP) vs. udDfzwam (AP90).
AP is right (it is G not D).
But both dictionaries have numerous headwords near this where 'udG' is misspelled as 'udD'. [The misspelling can in part be confirmed by alphabetical ordering.]
For AP:
udgrIva ok
udDaH -> udGaH
udDanaH -> udGanaH
udGAtin ok
udGaw ok
udDawitam -> udGawitam
udDAwaH -> udGAwaH
udDAwakaH wrong
udDAwana wrong
udDAwita wrong
udDawwakaH wrong
udDawwanam wrong
udDawwita wrong
udDasam wrong
udDAtaH wrong
udGAtin wrong
udDuz wrong
udGuzwa ok
udDozaH wrong
udGfz ok
udDarzaRam wrong
udGfzwam ok
udDoRa wrong
uddaMSaH ok --- now we're into udd...
For AP90:
udgrIva
udDaH wrong
udDanaH wrong
udGAtin ok
udGaw ok
udDawitaM wrong
udDAwaH wrong
udDAwakaH wrong
udDAwana wrong
udDAwita wrong
udDawwakaH wrong
udDawwanaM wrong
udDawwita wrong
udDasaM wrong
udDAtaH wrong
udDuz wrong
udGuzwa ok
udDozaH wrong
udDfz wrong
udDarzaRaM wrong
udDfzwaM wrong
uddaMSaH ok now we're into udd words.
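The alphabetical-ordering check mentioned above can be automated: under SLP1 collation, G sorts before D, so a misprinted udD headword sitting among udG words shows up as an out-of-order neighbor. A sketch (the SLP1 collation string is the conventional ordering; characters outside it are simply skipped):

```python
# Flag adjacent headwords that break SLP1 alphabetical order, which
# often points at a misprint such as udD for udG.

SLP1_ORDER = 'aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh'
RANK = {c: i for i, c in enumerate(SLP1_ORDER)}

def slp1_key(word):
    return [RANK[c] for c in word if c in RANK]

def out_of_order(headwords):
    bad = []
    for prev, cur in zip(headwords, headwords[1:]):
        if slp1_key(cur) < slp1_key(prev):
            bad.append((prev, cur))
    return bad
```

Applied to the AP90 sequence above, pairs like (udDaH, udGAtin) are flagged, since udG should precede udD.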
Batches 4-6 finished.
Ready to begin install process.
@SergeA noticed, in regard to the spelling of arTApaya in AP90 and arTApay in AP:
there is no unique standard for the nominal verb bases :( AP90 gives them like MW; AP1957 gives them like PWG
This kind of correspondence can help refine the correspondences of hwnorm1.
The hard part may be to find filters 'like' these. We have an existing filter for the MW cases
in this MWvlex file.
Namely, search for <vlex>Nom.</vlex>.
This should provide a good starting point for finding most nominal verbs in other dictionaries.
One reason for apparently duplicate cases.
For instance SuBa -> SuB. The spelling 'SuBa' is seen on more than one line of the entry.
In most versions of the program that generates cases, this situation causes a different case for each line containing the string of letters ('SuBa'). In a few versions of the program, where we're focused on headwords, only the first line (the one with the headword) generates a case.
Corrections re batches 1-8 now installed.
status: DONE 02/02/2017 @funderburkjim
26 cases
~~In these cases, the AP90 spelling ends in 'a', and the AP spelling is the same, but with an ending anusvara. Examination of a few cases leads to the suspicion that the AP anusvara is in error, as the entry is an adjective.~~
As Dhaval points out below, the description of this batch is wrong.
It is the AP90 spelling which has the ending 'M': AP90 = AP+'M'
I still suspect that mostly AP is wrong, and that the reason will typically be that the AP entry is NOT an adjective (the text is not marked as a.).
status: Batch 10 done, 02/02/2017 @SergeA and @funderburkjim Batch 11 done, 02/02/2017 @funderburkjim 26 cases
batch 10: 31 cases: slp1 UI and Deva UI
batch 11: 23 cases: slp1 UI and Deva UI
These are some randomly chosen cases that I suspect are AP spelling errors. They include the [thankfully small] number of additional 'k/kr' errors in AP headwords, a few 'J/jY' errors, and various others that caught my eye.
There are several 'duplicate' cases -- I thought these were removed, but apparently not :( . Luckily, it's easy in the UI to mark the duplicates 'no change' and move to the next case.
However, a few apparent 'duplicate' cases occur because there are two entries in AP with the same headword spelling in our digitization. It's possible that one of these spellings is right and the other one wrong.
Batch 9. likely AP errors
In these cases, the AP90 spelling ends in 'a', and the AP spelling is the same, but with an ending anusvara. Examination of a few cases leads to the suspicion that the AP anusvara is in error, as the entry is an adjective.
Examination reveals that the logic was not properly translated into code. The output has AP90 having M at end and AP not having M at end. This seems to be default behaviour.
Batch 9. likely AP errors
status: TODO
26 cases
slp1 UI http://www.sanskrit-lexicon.uni-koeln.de/scans/APScan/2014/pywork/correctionwork/issue-335b/205/update.php and Deva UI http://www.sanskrit-lexicon.uni-koeln.de/scans/APScan/2014/pywork/correctionwork/issue-335b/205/update.php?input=deva
In these cases, the AP90 spelling ends in 'a', and the AP spelling is the same, but with an ending anusvara. Examination of a few cases leads to the suspicion that the AP anusvara is in error, as the entry is an adjective.
logic was not properly translated into code.
Right, there is a discrepancy between the code and the description.
Note revision to description in comment above.
@drdhaval2785 Did you notice my reply above to your 'work on tools' request?
Thought you might not have seen it since no reply from you. There are other possibilities besides the hwnorm1 idea.
Yes, I did. I didn't know how much I would be able to contribute, so I was thinking about how to draft an answer. For me, tools mean non-HTML stuff. I would not be able to contribute towards a UI of any sort.
@drdhaval2785
There are some aspects of this which definitely would involve UI, but some parts would not.
The parts that would not involve UI might include:
Review the logic of the current hwnorm1c construction (i.e., the rules for normalizing spelling). I made these up some time ago, but it would be good to have someone else review the ideas.
Improve the normalization by taking into account some dictionary-specific headword spelling conventions. For instance, SKD almost always uses nom. sg. form for nouns, so 'pitA' for 'pitf', for instance. Clearly SKD's pitA should be considered the same as pitf in other dictionaries. But how to do this to avoid false positives? I'm not sure.
There's also the question of how to properly associate roots in the different dictionaries. For instance, our digitization of WIL has 'gama' for the root, but there is also a m. noun 'gama' in WIL. We should associate the WIL verb entry 'gama' with the usual 'gam' of other dictionaries, but associate the m. noun 'gama' of WIL with the usual 'gama' of other dictionaries. How to do this?
There are many other such relations among headword spellings of specific dictionaries. Not all the relations have to be resolved at once.
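One way these dictionary-specific relations could be layered over a generic normalizer is an override table keyed by dictionary (and, where needed to separate a root from a noun, by part of speech). The table entries below are illustrative only; the real rules would come from a review like the one in normalization.pdf:

```python
# Dictionary-specific spelling equivalences, layered over a generic
# normalizer. Entries here are examples from the discussion above,
# not an actual rule set.
OVERRIDES = {
    'SKD': {'pitA': 'pitf'},           # SKD lemmatizes nouns as nom. sg.
    'WIL': {('gama', 'verb'): 'gam'},  # WIL cites this root with final 'a'
}

def normalize(hw, dictcode, pos=None):
    table = OVERRIDES.get(dictcode, {})
    key = (hw, pos) if (hw, pos) in table else hw
    return table.get(key, hw)
```

The (headword, pos) key lets WIL's verb entry 'gama' map to 'gam' while its m. noun 'gama' stays associated with 'gama' in other dictionaries.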
The UI part has interesting aspects, such as how to present a multi-dictionary display. Or how to have a suite of multi-dictionary displays? But all these questions will only be as good as the underlying headword correspondences present in hwnorm1c.txt (or some later enhanced version)
We have made the requisite data files for supporting links to the names of literary sources for PW.
I think we have the requisite information needed to do the similar for PWG. Once this infrastructure is available, then I can do the relatively easy part of enhancing the displays of PWG to add links.
the hwnorm1 idea
There are some aspects of this which definitely would involve UI, but some parts would not.
Will do so.
https://github.com/sanskrit-lexicon/hwnorm1/blob/master/normalization.pdf may help in this regard. It covers all 33 dictionaries on the points mentioned. Jim, what I would like to hear from your side is: at what other places do dictionaries differ in conventions? You note the places, and I will do comprehensive research. I intend to make version 2 of this paper comprehensive. This will take care of dictionary-specific tweaks.
The only way I see to do this would be to separate the entries in the different dictionaries on the basis of meaning and not of headwords. Only then may lexical and semantic similarities be tagged properly.
pwg literary sources
We have made the requisite data files for supporting links to the names of literary sources for PW.
I think we have the requisite information needed to do the similar for PWG. Once this infrastructure is available, then I can do the relatively easy part of enhancing the displays of PWG to add links.
This seems interesting. So we go through the whole process which we did for PW or do some smart work? I guess many would be common in PW and PWG. Let us make final PW literary source list as our starting list of literary sources of PWG. Whatever is missing or new can be altered accordingly.
This is another idea prompted primarily by @drdhaval2785 's inquiry.
@fxru did the first step in gathering current information about the DTDs of the various dictionaries. But I haven't had time to build on this work. However, I think an important deficiency of the current state of the Cologne digitizations is lack of uniformity in certain details:
For example, some dictionaries use {|...|} to indicate 'wide spacing'. This occurs in several dictionaries and seems to serve no useful purpose. Obsolete markup needs to be identified and removed. (There is also obsolete markup in MW, e.g. the <c> element, and perhaps some others.) And there should be one uniform markup for non-Sanskrit text (such as <lang n="Greek">), which should be used to replace all the various ad-hoc markup. This also applies to MW.
This work of providing uniformity among the dictionaries will be of immense value to further work on the dictionaries, both by us and by others in the future. It will allow, even more than now, the development of tools to parse the dictionaries for various purposes of analysis, and will provide a foundation for the enhancement of the dictionaries by adding markup.
@drdhaval2785 I'll start by reviewing the status of the PWG literary source data. Will post in a separate issue under PWG.
suspicion that the AP anusvara is in error, as the entry is an adjective.
Well done.
SKD's pitA should be considered the same as pitf in other dictionaries.
Yeah, and I guess it has been left undocumented by Dhaval, or am I wrong?
associate roots in the different dictionaries
I would go list-wise. First we extract all the known lists of roots from each dictionary. I have some concordances; let's sit down together and see how to automate this. Too many manual operations are involved with dhatus.
I think we have the requisite information needed to do the similar for PWG. Once this infrastructure is available, then I can do the relatively easy part of enhancing the displays of PWG to add links.
Yeah, the research is almost over (for now) and could be implemented as it is.
I intend to make version 2 of this paper comprehensive. This will take care of dictionary specific tweaks.
Now that I call a good morning to start my day with.
But I haven't had time to build on this work.
Did he abandon it, or finish?
I think we need to do the same thing at an earlier stage of the dictionary process, so that the digitizations themselves (not just the xml derivate of the digitization), has a uniform structure.
Should we really care much about it? Sure, it would be good, but is it a priority, and of practical value?
Batches 9-11 are ready for installation.
@SergeA Thanks!
The corrections of batches 9-11 have now been installed.
After rerunning the program that merges AP and AP90 headwords, there are now about 700 that are marked with question marks. A quick examination suggests that there are probably still quite a few misspellings. Since this issue is getting rather extended, I'm closing it and opening another to handle further cases.
The ap90_ap_hw2_short.txt file has been regenerated.
I'll develop UIs for more corrections next week.
@SergeA There are about 230 of the [?] cases that match 2 words, one from AP and one from AP90. These cases seem like the most fertile ground for examination.
I was thinking about doing all these in 10 batches or so, with corresponding batches of cases, one for AP and one for AP90. This is because there is no obvious way to guess whether the error for a given case is in the AP spelling or in the AP90 spelling. The procedure would then be to open up and work at the same time on two batches: batch1-AP and batch1-AP90, batch2-AP and batch2-AP90, etc.
Does this sound like a reasonably efficient approach ?
@SergeA Here is a crude computation relating to the 'k/kr' issue .
'kr' occurs in 2801 lines (out of 267898 lines) in AP.txt (1.05%)
'kr' occurs in 2167 lines (out of 199968 lines) in AP90.txt (1.08%)
The lower percentage in AP.txt might be taken as (crude) evidence that about 0.03% of its lines still have k/kr errors, or about 80 lines. If this computation is not completely bogus, that's a fairly small number, so the worst of this problem with AP.txt is behind us.
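The crude arithmetic can be reproduced directly. Using the unrounded rates, the estimate comes out near 102 lines; the ~80 figure comes from rounding the rates to 1.05% and 1.08% first:

```python
# Reproduce the crude k/kr estimate: the difference in the per-line
# rate of 'kr', scaled by the size of AP.txt.

ap_lines, ap_kr     = 267898, 2801
ap90_lines, ap90_kr = 199968, 2167

rate_ap   = ap_kr / ap_lines      # ~0.01046 (1.05% rounded)
rate_ap90 = ap90_kr / ap90_lines  # ~0.01084 (1.08% rounded)

est_missing = (rate_ap90 - rate_ap) * ap_lines
print(round(est_missing))  # prints 102
```

Either way, the order of magnitude (around a hundred lines, not thousands) is what matters for planning the remaining correction work.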
0.03% k/kr errors remaining in AP.txt, or about 80 lines
Adorable stats. Yeah, the approach is what we can only dream of.
Then, procedure would be to open up and work at the same time on two batches: batch1-AP and batch1-AP90, batch2-AP and batch2-AP90 , etc. Does this sound like a reasonably efficient approach ?
Maybe yes. But I see one problem here: both batches will use the same scan tab, and we need to keep both scans open simultaneously. Is it possible to separate them as scan_tab_1 and scan_tab_2?
there are 0.03% k/kr errors remaining in AP.txt, or about 80 lines
Is it right to count lines and not occurrences? Looks too good to be true. I was afraid there were thousands of them.
We've corrected several hundred headwords spellings in AP and AP90 in issues #332 and #334.
The comparison program has been rerun and I've started going through the cases where the program marks the comparison with a question mark. There are about 1000 of these.
At the moment, here's my procedure for comparing.
- In a browser, open two copies of the list display, one for AP and one for AP90.
- Open the ap90_ap_hw2_short.txt file in a text editor.
- Open a scratch text file to keep results.
- Go through the ap90_ap_hw2_short file, searching for '?'.
- When you've done a session, post the error notes to a comment in this issue.
So far, I've worked through line 2930 (?1?apaTAsa...) and have come up with 31 corrections.
The next comment has the first batch of corrections. Anyone who helps can use this simple format. Sticking to this format will allow me to write a simple program to parse the corrections and autogenerate most of the change transactions. Note, I'm keeping the AP corrections separate from the AP90.