paul-shannon / slexil

Software Linking Elan Xml to Illuminated Language

MIT License

unit tests failing #37

Closed paul-shannon closed 5 years ago

paul-shannon commented 5 years ago

@davidjamesbeck Hi David, I did a fresh pull just now, and all of the unit tests seem to fail!

cd tests
make -i
python3 test_MorphemeGloss.py
--- test_constructor
--- test_parse
Traceback (most recent call last):
  File "test_MorphemeGloss.py", line 348, in <module>
    runTests()
  File "test_MorphemeGloss.py", line 31, in runTests
    test_parse()
  File "test_MorphemeGloss.py", line 57, in test_parse
    assert(mg.getParts() == ['hab', '=', '3A', '=', 'mouth', '•', 'cry'])
AssertionError
make: [morphemeGloss] Error 1 (ignored)

Many more failures follow. What shall we do?

davidjamesbeck commented 5 years ago

Hi, Paul

I was in the process of going back through the unit tests to fix them up to be consistent with the changes that needed to be made to accommodate continuous playback and the new logging feature that collects non-fatal errors and puts them in a file for the user. The most crucial changes (the ones that are going to cause most of the tests to fail) have to do with the input to the Text class, which now a) requires a startStopTable listing the beginning/end times for all the audio phrases, and b) requires the project directory from webapp.py (so it knows where to put the error log file).

As for testing MorphemeGloss, the problem is that we have some methods to clean up the grammaticalTerms file, add “1”, “2”, “3”, and do a few other formatting things. It doesn’t make sense to run them for every instance of MorphemeGloss since it creates a single master list for the text, so the methods are now called in text.py. So if we run a test on morphemeGloss.py now, it’s not going to run those methods (below, for instance, I think that the method that converts abbreviations in all caps to lower case hasn’t applied, so “HAB” and “hab” aren’t being matched by the assertion).
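The mismatch described above can be sketched as follows. This is only an illustration: the helper name `lowercase_abbreviations`, its rule (lower-case any token made up entirely of capital letters), and the raw token list are assumptions, not the actual methods or output in text.py or morphemeGloss.py.

```python
def lowercase_abbreviations(parts):
    """Hypothetical stand-in for the clean-up step that now lives in text.py.

    Assumed rule: a token consisting only of upper-case letters is treated
    as an abbreviation and lower-cased; every other token is left alone.
    """
    return [p.lower() if p.isalpha() and p.isupper() else p for p in parts]


# What a bare MorphemeGloss might return when the clean-up has not run
# (assumed for illustration; the real output may differ).
raw_parts = ['HAB', '=', '3A', '=', 'mouth', '•', 'cry']

# The list the existing assertion in test_parse compares against.
expected = ['hab', '=', '3A', '=', 'mouth', '•', 'cry']

assert raw_parts != expected                            # the failing comparison
assert lowercase_abbreviations(raw_parts) == expected   # passes once normalized
```

A test for morphemeGloss.py could either call the clean-up step explicitly before asserting, as here, or compare against the un-normalized tokens and leave the normalization to a test of text.py.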

I was working yesterday on making some of these tests pass again, at least the ones in the testTextPy directory. I wasn’t sure if it was worth fixing up all the others (especially the morpheme glossing tests, a lot of which pass constructed data to methods we’ve refined quite a bit since), but I can if you think it is a good idea. I’m heading off to Mexico to do fieldwork for a couple of weeks, but I can put it on my list of things to do when I get back. I’ve also started trying slexil out on some new texts and the results have been good (see Aymara in testTextPyData and the test for it in test_TextPy.py).

Cheers,

David

paul-shannon commented 5 years ago

Hi David,

I will sound like a fuss-budget, a martinet, a scold! Running that risk…

In test-driven development, all the tests need to pass all the time. Working on a shared code base is nigh impossible if we don’t stick to that. The rule of thumb should be: don’t commit code until all the tests pass.

Can we adopt that practice for the future?

Field work in Mexico! Very fine!

Paul

davidjamesbeck commented 5 years ago

Okay, no problem. I’ll continue with the testTextPy ones, then work backwards. I’m not sure how to fix the MorphemeGloss tests—I guess I could add the methods that now live in text.py into the test file and run them there?

David

paul-shannon commented 5 years ago

Hi David,

I hope your field work in Mexico went well!

I am happily engaged in getting all of the Harry Moses Daylight story into EAF. I am using your version in “Two syəyəhub from Harry Moses”.

I have a barebones “toEAF.py” which reads a yaml file, creates the xml, and validates it against http://www.mpi.nl/tools/elan/EAFv3.0.xsd. I am eager to try out the result in slexil. Will you let me know when you get a chance to fix the unit tests, so that I will know how to call the current version?

I noticed that you now require a startStopTable as a parameter to the Text class constructor. I am not sure that is really needed: the eaf file is the first argument to the constructor. Code to extract the start and stop times can be found here:

slexil/audioExtractor.py:determineStartAndEndTimes

Using this, you could save the user the trouble of separately building that table. Which would be convenient for me!
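As a rough sketch of that idea, the start and stop times can be read straight from the eaf XML using only the standard library. This is not the code in audioExtractor.py; the function name `start_stop_rows`, the row keys, and the tiny sample document are assumptions for illustration. The element and attribute names (TIME_SLOT, ALIGNABLE_ANNOTATION, TIME_SLOT_REF1, TIME_SLOT_REF2) are those of the EAF format.

```python
import xml.etree.ElementTree as ET

# A tiny hand-made EAF fragment, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first phrase</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a46"
                            TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second phrase</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def start_stop_rows(eaf_root):
    """Return one {id, start, end} row per time-aligned annotation.

    Assumes every referenced TIME_SLOT carries a TIME_VALUE (milliseconds).
    """
    # Map each time slot ID to its time in milliseconds.
    times = {slot.attrib["TIME_SLOT_ID"]: int(slot.attrib["TIME_VALUE"])
             for slot in eaf_root.iter("TIME_SLOT")}
    rows = []
    for ann in eaf_root.iter("ALIGNABLE_ANNOTATION"):
        rows.append({
            "id": ann.attrib["ANNOTATION_ID"],
            "start": times[ann.attrib["TIME_SLOT_REF1"]],
            "end": times[ann.attrib["TIME_SLOT_REF2"]],
        })
    return rows


rows = start_stop_rows(ET.fromstring(SAMPLE_EAF))
```

With something along these lines called inside the Text constructor, the caller would only need to supply the eaf file.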

Here’s a sample of the yaml format I am currently using. It makes it really easy for me to create the eaf xml file: easier for me than ELAN, and easier than hand-building valid xml. The tier names can probably be improved. I need to add a yaml field for the audio file (and/or the extracted audio phrases). This may never be useful for anyone else but works well for me.

Let me know if you need help with getting all the unit tests working again.

Paul

paul-shannon commented 5 years ago

@davidjamesbeck Hi David,

I'll be grateful if you could sync up tests/test_AudioExtractor.py and audioExtractor.py.
The latter includes this recent change in the determineStartAndEndTimes method:

    # audioIDs = [x.attrib["ANNOTATION_ID"] for x in audioTiers]
    audioIDs = list(range(1, len(audioTiers)+1))

The test expects audio tier ids to be from the eaf file.

I wonder: isn't that association still important? Perhaps you needed an ordinal index into the startAndStopTimes table. Wouldn't the row number of the table implicitly provide that?

(I know I will seem like a broken record, stuck in a vinyl groove endlessly and annoyingly repeating the same plea. But if you will indulge me, may I try again to persuade you that unit tests are central to the software development process? Not a follow-on step, but an indispensable part of the moment-by-moment work of writing code?)

davidjamesbeck commented 5 years ago

Hi, Paul

I made that change because the audio tier IDs don’t necessarily come in sequential order in the file, so the row number doesn’t always match (e.g., the seventh line in sequence in one of the texts I was testing was “a46” and there was no “a7”). There may be a way to add an ordinal index over and above the tier ID to the Pandas dataframe, but I still haven’t gotten the hang of Pandas so I didn’t know how to do that. As far as I can see, the actual audio tier ID isn’t used anywhere, though we could certainly preserve that association by either 1) adding an ordinal index to the tbl dataframe (which I’d need you to do, or tell me how to do) and tweaking the way Text reads audioExtractor.startStopTable, or 2) by creating a separate dataframe and assigning that to audioExtractor.startStopTable. Otherwise, I can do as you asked and fix the tests in tests/test_AudioExtractor.py (or, rather, prioritize fixing those tests). Whatever you think makes the most sense.
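Option 1 above might look something like the following in Pandas. This is a minimal sketch, not the project's code: the column names `id`, `start`, `end` and `lineNumber`, and the sample values, are assumptions for illustration.

```python
import pandas as pd

# Assumed shape of the table: one row per audio phrase, keyed by the
# annotation ID from the eaf file. IDs need not be sequential.
tbl = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a46"],
    "start": [0, 1500, 4200, 6100],
    "end": [1500, 4200, 6100, 9000],
})

# Sort by start time so that row order follows playback order, then
# number the rows 1..n while leaving the original IDs untouched.
tbl = tbl.sort_values("start").reset_index(drop=True)
tbl.insert(0, "lineNumber", list(range(1, len(tbl) + 1)))

# Look up a phrase by its position in the text; the eaf ID is still there.
fourth = tbl.loc[tbl["lineNumber"] == 4].iloc[0]
assert fourth["id"] == "a46"
```

Keeping both columns means sequential playback can use `lineNumber` while the existing test can still check the IDs taken from the eaf file.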

Sorry about the tests/test_AudioExtractor.py not being kept up to date. I hadn’t really noticed it there and as I mentioned before I hadn’t actually run the makefile after working out the changes to text.py because I knew it would just throw up a lot of errors. I was in the process of cleaning those up but ran out of time and had to go off to the field. This should probably all have been on its own branch in git, in retrospect.

David

paul-shannon commented 5 years ago

Hi David,

Could you let me know when the unit tests run again? I find them really useful as I create and test the ELAN-style eaf file for the Harry Moses story.

If you approve, I propose to follow up that task with a similar treatment of any of the other Lushootseed texts you think will benefit from it.

Thanks!

Paul

davidjamesbeck commented 5 years ago

Will do. Just to be clear, I will make the unit tests run again using the current signature of the Text() constructor, with the understanding that at some point we’ll find a way to rejig webapp.py to avoid having to pass the startStopTable to Text() as an argument.

After mulling it over for a while, I also propose to revert AudioExtractor to its earlier state and add some code to create a startStopTable in addition to the Pandas dataframe based on the audio tier IDs. That may be a bit clunky, but it preserves the associations of ID to times (which might prove useful someday) and means less tinkering with the code that allows sequential playback. It will also mean that the audioExtractor tests should run as normal right away, once I make this change.

I can do the second fix today, I hope. Working my way through the other tests will take a bit longer—maybe by the end of the weekend. Sound okay?

David

PS Any of the Lushootseed texts would benefit from this—my ambition (which I am realistic enough to know is unlikely to ever be realized) is to get all the texts into ELAN or otherwise time-aligned and add them to the T.M. Hess Collection here at the UofA. What you are proposing is good and useful work.

paul-shannon commented 5 years ago

Next week for some fixes is in plenty of time.

I’d be really happy to assist with your ambition regarding the TM Hess Collection stories!

Paul

On Jun 25, 2019, at 8:43 AM, David Beck notifications@github.com wrote:

Will do. Just to be clear, I will make the unit tests run again using the current signature of the Text() constructor, with the understanding that at some point we’ll find a way to rejig webapp.py to avoid having to pass the startStopTable to Text() as an argument.

After mulling it over for a while, I also propose to revert AudioExtractor to its earlier state and add some code to create a startStopTable in addition to the Pandas dataframe based on the audio tier IDs. That may be a bit clunky, but it preserves the associations of ID to times (which might prove useful someday) and means less tinkering with the code that allows sequential playback. It will also mean that the audioExtractor texts should run as normal right away, once I make this change.

I can do the second fix today, I hope. Working my way through the other tests will take a bit longer—maybe by the end of the weekend. Sound okay?

David

PS Any of the Lushootseed texts would benefit from this—my ambition (which I am realistic enough to know is unlikely to ever be realized) is to get all the texts into ELAN or otherwise time-aligned and add them to the T.M. Hess Collection here at the UofA. What you are proposing is good and useful work.

On Jun 25, 2019, at 9:07 AM, Paul Shannon notifications@github.com wrote:

Hi David,

Could you let me know when the unit tests run again? I find them really useful as I create and test the ELAN-style eaf file for the Harry Moses story.

If you approve, I propose to follow up that task with a similar treatment of any of the other Lushootseed texts you think will benefit from it.

Thanks!

Paul

On Jun 25, 2019, at 7:15 AM, David Beck notifications@github.com wrote:

Hi, Paul

I made that change because the audio tier IDs don’t necessarily come in sequential order in the file, so the row number doesn’t always match (e.g., the seventh line in sequence in one of the texts I was testing was “a46” and there was no “a7”). There may be a way to add an ordinal index over and above the tier ID to the Pandas dataframe, but I still haven’t gotten the hang of Pandas so I didn’t know how to do that. As far as I can see, the actual audio tier ID isn’t used anywhere, though we could certainly preserve that association by either 1) adding an ordinal index to the tbl dataframe (which I’d need you to do, or tell me how to do) and tweaking the way Text reads audioExtractor.startStopTable, or 2) creating a separate dataframe and assigning that to audioExtractor.startStopTable. Otherwise, I can do as you asked and fix the tests in tests/test_AudioExtractor.py (or, rather, prioritize fixing those tests). Whatever you think makes the most sense.
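For illustration only, here is a minimal sketch of keeping both an ordinal position and the original tier ID for each audio line. It uses only the standard library rather than Pandas, the EAF fragment is hand-written, and the function name `start_stop_rows` and the row keys are hypothetical; none of this is taken from audioExtractor.py.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written EAF fragment; the annotation IDs are deliberately
# non-sequential ("a1" followed by "a46"), as in the case described above.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="audio">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2"/>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a46"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3"/>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def start_stop_rows(eaf_xml):
    """Return one row per time-aligned annotation, in file order.

    Each row carries a 1-based ordinal (for sequential playback) as well
    as the original ANNOTATION_ID, so neither association is lost.
    """
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its time in milliseconds.
    times = {
        slot.attrib["TIME_SLOT_ID"]: int(slot.attrib["TIME_VALUE"])
        for slot in root.iter("TIME_SLOT")
    }
    rows = []
    for ordinal, ann in enumerate(root.iter("ALIGNABLE_ANNOTATION"), start=1):
        rows.append({
            "ordinal": ordinal,
            "id": ann.attrib["ANNOTATION_ID"],
            "start": times[ann.attrib["TIME_SLOT_REF1"]],
            "end": times[ann.attrib["TIME_SLOT_REF2"]],
        })
    return rows


rows = start_stop_rows(EAF)
```

The second row here keeps the ID "a46" alongside ordinal 2, so code that needs the playback order and code that needs the tier ID can both read the same table.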

Sorry about the tests/test_AudioExtractor.py not being kept up to date. I hadn’t really noticed it there and as I mentioned before I hadn’t actually run the makefile after working out the changes to text.py because I knew it would just throw up a lot of errors. I was in the process of cleaning those up but ran out of time and had to go off to the field. This should probably all have been on its own branch in git, in retrospect.

David

On Jun 25, 2019, at 6:20 AM, Paul Shannon notifications@github.com wrote:

@davidjamesbeck Hi David,

I'll be grateful if you could sync up tests/test_AudioExtractor.py and audioExtractor.py. The latter includes this recent change in the determineStartAndEndTimes method:

audioIDs = [x.attrib["ANNOTATION_ID"] for x in audioTiers]

audioIDs = list(range(1, len(audioTiers)+1))

The test expects audio tier ids to be from the eaf file.

I wonder: isn't that association still important? Perhaps you needed an ordinal index into the startAndStopTimes table. Wouldn't the row number of the table implicitly provide that?

(I know I will seem like a broken record, stuck in a vinyl groove endlessly and annoyingly repeating the same plea. But if you will indulge me, may I try again to persuade you that unit tests are central to the software development process? Not a follow-on step, but an indispensable part of the moment-by-moment work of writing code?)

davidjamesbeck commented 5 years ago

FYI, tests/test_AudioExtractor.py now seems to be working. For some reason it started choking on the hidden files in the directory, but that’s fixed now, too.
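In case it helps anyone hitting the same thing, here is a minimal sketch of one way to skip hidden files when listing a directory. The function name `visible_files` is hypothetical and not taken from the slexil code, and it assumes the offending entries are dotfiles (for example .DS_Store on a Mac).

```python
import os


def visible_files(directory):
    """Return the sorted names of regular files in directory, skipping dotfiles."""
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )
```

Sorting the names also makes the order independent of whatever order the file system happens to return.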

David

davidjamesbeck commented 5 years ago

All the unit tests are now working (amazing what a little insomnia can do to keep a project moving along). I also added back in some of the tests that didn’t appear in the makefile, just to see what happens. These are good, too, except for

test_text_inferno
test_text_aktzini

I think that the second one never worked and isn’t supposed to, since the .eaf file it is based on is an impermissible type (the time-aligned line ≠ the transcription line), but I’m not sure what is up with the first one. It generates the following output:

iMac-2:tests David$ python3 test_text_inferno.py
--- test_constructor
Traceback (most recent call last):
  File "test_text_inferno.py", line 70, in <module>
    runTests()
  File "test_text_inferno.py", line 37, in runTests
    test_constructor()
  File "test_text_inferno.py", line 47, in test_constructor
    assert(tbl.shape == (4,3))
AssertionError

I investigated it a bit and there seems to be something up with either the way the .eaf file is being parsed or the way the output from the parser interacts with Pandas in text.py. You might want to take a look at this if you think it is worthwhile, but I’m not entirely sure what this was supposed to be testing for. The inferno text passes all the other tests and can be run through webapp.py successfully, so this may have been set up to test the output from older modules that doesn’t match up to what we’re doing now(?).

I’ll commit and push all this so you can have a look at it.

David

paul-shannon commented 5 years ago

@davidjamesbeck - great work, David! It's beautiful to see all the tests pass.

I corrected two mistakes I had made in the inferno 3-line demo:

the tierGuide omitted the "morphemePacking" line
the translation tier had a fourth line, left over from the daylight eaf file which I had cribbed from

Now test_text_inferno.py passes and is in the makefile.
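A check along the following lines might flag a stray extra line like that one before the table shape assertion fails. This is only a sketch: the function name `check_line_counts` and the tier names are hypothetical, and it assumes each tier is available as a simple list of lines, which is not how slexil itself represents them.

```python
def check_line_counts(tiers):
    """Return the common line count, or raise ValueError if tiers disagree.

    tiers maps a tier name to the list of lines found in that tier.
    """
    counts = {name: len(lines) for name, lines in tiers.items()}
    if len(set(counts.values())) > 1:
        # Include the per-tier counts so the odd one out is easy to spot.
        raise ValueError("tier line counts differ: %s" % counts)
    return next(iter(counts.values()), 0)


# Three speech lines but four translation lines, as in the inferno demo
# before the fix.
demo = {
    "speech": ["line 1", "line 2", "line 3"],
    "translation": ["line 1", "line 2", "line 3", "leftover line"],
}
try:
    check_line_counts(demo)
    message = ""
except ValueError as err:
    message = str(err)
```

The error message names each tier with its count, which says more than a bare AssertionError on the table shape.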

This sets me up nicely to build html from the command line for How Daylight Was Stolen. You may recall that I start with a simple yaml file

explorations/generateEAF/pythonDemos/daylight1/daylight.yaml

which I find much easier to use than hand-edited eaf or interactive ELAN. From that yaml file I generate an eaf file at the command line, using toEAF.py, and now I can create the finished webpage as well. Thank you!

davidjamesbeck commented 5 years ago

What is the input to this? You have a yaml file, but what format is the linguistic data in? It must be some sort of time-aligned transcript. Not .eaf as well?

David

paul-shannon commented 5 years ago

davidjamesbeck commented 5 years ago

So you build this manually? That might be slow. Check out Transcriber (http://trans.sourceforge.net/en/presentation.php). People used to use it a lot, though their page looks a bit old. Anyway, Transcriber lets you select stretches of speech and output the timecodes to a .txt file, which at least avoids having to type out each and every timecode. I can ask around if there is something newer that people use (maybe Saymore, https://software.sil.org/saymore/ ?)

David

On Jun 27, 2019, at 9:03 AM, Paul Shannon notifications@github.com wrote:

https://user-images.githubusercontent.com/2480712/60277243-0959ee00-98b2-11e9-89ca-ae25ea67b993.png

paul-shannon commented 5 years ago

I do build it manually. It goes really fast - at least that is the case when I have the interlinear document on screen, as I do with your and Thom’s 2007 Daylight. Just cut and paste, find the end of the next audio line.

As for finding those times, I have a simple audio editing tool, Amadeus, which I have long used to decipher fiddle tunes. It reports times rather like Saymore does. Quick and easy.

The broader context here might be worth explaining. As a programmer I pretty much spend all day typing. I use the venerable programmer’s editor, emacs, for almost everything I do - and almost everything I do is typing text. I produce interactive visualizations, but I mostly write text.

So this yaml file - which I subsequently transform to ELAN xml - grows at the rate of about a line every minute or two. Of course my yaml-to-eaf python script is only a work in progress, and fixing its bugs slows me down a bit. But for the glorified typist I am, this approach should permit me to create ELAN eaf versions of Lushootseed texts at a pretty good clip.

Paul

davidjamesbeck commented 5 years ago

Fair enough. We’ll see how you feel after doing Susie Sampson Peter’s “Starchild” story :-). I hate typing, which is probably the reason I started to learn coding as much as anything: coding is typing, but a program to generate a dictionary from a database is less typing than doing it by hand.

But do let me know if you suspect that there might be tools linguists use for some of the tasks you’re taking on.

David
